The horizontal approach is a traditional way of storing data in relational databases. Each new data record is inserted as a row and table columns run horizontally. Predetermined structure and simple access to data are among the numerous advantages of this design. Possibly the most difficult problem this structure faces is appropriate storing of hierarchical data. In a hierarchical data model, data is organized into a tree-like structure. The perfect example is a questionnaire, such as the one shown in Figure 1, widely used in virology, oncology, ophthalmology and neuroscience. Representing this type of data in standard horizontal format causes it to lose flexibility and logical purity.
The typical questionnaire contains categories, main questions, secondary questions and numerous sub-questions. Representing this kind of data using vertical data structure will allow it to have only two core columns (Figure 2). So, data stores as key/value pairs vertically. In the case of questionnaires, two core columns would be: question and answer. Other columns are considered to be attributes for these two primary ones.
A drawback of the vertical table is that it is difficult to work with. To get the same data from a vertical table, one would need to do a self-join and a number of nested subqueries. Some developers write special functions or stored procedures to convert from vertical to horizontal structure so that they can do certain reports.
Tom Haughey, in his work Modeling Hierarchies, suggests adding the following optional attribute columns:
- Level number , indicating at what level this occurrence is.
- Depth attribute, indicating the distance of this entry from the top (or bottom).
- Max levels attribute, indicating the total number of levels for any branch of the tree.
- Sequence number, indicating the preferred order of retrieval.
While working on database design for clinical studies, it is essential to remember that data constantly changes. Databases should be designed to accommodate these changes in a timely fashion without compromising data quality and integrity. New categories and subquestions might be added at any time during the study. That is why it is hard and useless to determine max level and depth attributes in clinical studies. Even though any minor changes and updates in clinical trials require proper documentation, it is very time-consuming and costly to maintain unnecessary extra data columns. On the other hand, sequence and level number are especially important attribute columns. A sequence number within level number allows the data to be pulled off sequentially within the level.
From the example shown in Figure 2 you can see that using a horizontal database structure for questionnaire annotation will not allow to query questions by category as well as by question level. While the Vertical structure allows the output data related to a certain category only, e.g., self-maintenance:
Select * from Table where Q_categ = 2
Almost the same result from a horizontal structure can be obtained by using the following query:
Select PatientID, Visit, A3, A3_d, A4, A4_d, A5 from Table
If new questions will be added to the self-maintenance category, the first query will not require any changes to get the correct output. However, the horizontal structure will require updates in the database as well as in the select statement. The horizontal structure also will not allow a to differentiate main questions from subquestions automatically that might be critical for data manipulation and reporting.
The Clinical Data Interchange Standards Consortium (CDISC) implemented alternative representation of ECG and vitals as a vertical format in order to support FDA activities to pilot new database and data-viewing technologies starting from version 2.0. The main reason for this was greater flexibility in terms of data storage and manipulation. Another good example of attribute (non-core) columns to be used in clinical studies is a variable added by CDISC for the LOINC codes to lab, ECG and vitals measurements. LOINC is used by most laboratories and diagnostic systems. It follows a good coding system and has been adopted widely for delivering laboratory results that identify cases or reportable conditions to public health departments.
The option of adding almost any attribute columns in vertical format provides endless opportunities in terms of automatic data manipulation, reporting and analysis for complex studies. A perfect example is question 4 from the Quality of Life Questionnaire shown in Figure 1:
What else do you do yourself? 1. Cooking 2. Cleaning 3. Bathing 4. Other _________________
These type of questions are designed to allow multiple answers to the same question. Respondents might select just one or any combination of options as their answer. To accommodate all possible answers in horizontal format, you should add additional columns for each optional answer. In the case of a single positive answer, the other three fields will just be blank or answered no. When data is loaded into the database, it requires data cleansing efforts to determine if blank fields happen due to data transfer errors or to lack of an answer by the respondent. Another challenge one faces is proper data storage and documenting. If the answer for this question is supposed to be stored in the database as A4, (see Figure 2) what is the best way to name the following three columns to preserve data integrity?
This is when, one starts thinking of vertical database structure, which has only two core data columns (Figure 3). A vertical format will provide you with a simple and elegant solution. Multiple answers to the same questions will be stored as additional rows, which will be differentiated only by answer code: no other changes required.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access