Subtyping is a powerful communication tool on the logical data model because it allows the modeler to represent similarities that may exist between distinct business concepts to improve integration and data quality. As an example, refer to Figure 1 where the supertype Course contains the common data elements and relationships for the Lecture and Workshop subtypes:
Figure 1: Logical Course Structure
Briefly walking through this model, we learn that a Course such as Data Modeling 101 can be a Lecture or a Workshop. Data Modeling 101 can be taught at one or more Locations. A Learning Track, such as the Data Track, must consist of one or more Lectures, yet only one Workshop. A Workshop, such as the Advanced Design Workshop, can require certain Courses as prerequisites, such as Data Modeling 101 and Data Modeling 101 Workshop.
This is a logical data model (with data elements hidden to keep this example manageable). On the physical data model, we can replace this subtype symbol with one of three options:
- Rolling down: Remove the supertype entity and copy all of the data elements and relationships from the supertype to each of the subtypes.
- Rolling up: Remove the subtypes and copy all of the data elements and relationships from each subtype to the supertype. Also add a type code to distinguish the subtypes.
- Identity: Convert the subtype symbol into a series of one-to-one relationships, connecting the supertype to each of the subtypes.
Using this Course structure as example, what are the pros and cons of each of these three options?
Figure 2 shows the rolling down technique applied to the Course structure, Figure 3 the rolling up technique and Figure 4 the identity technique. Challengers identified six factors to consider in selecting a technique. Figure 5 indicates the pros and cons for each factor for each technique. A discussion for each factor follows.
Figure 2: Rolling Down
Figure 3: Rolling Up
Referential integrity (database enforcement of rules) was mentioned frequently in Challenger comments. The logical business rules are compromised to a degree in each technique, with identity most closely mimicking the business rules. However, the one-to-one relationships on the identity model capture that a Course can be a Lecture, a Workshop or both. Therefore, as long as the logical model contains an overlapping subtype, the translation from logical to physical preserves the rules. The rolling down technique can enforce only those rules present in the subtypes. Monica Oliver, consultant, says rolling down and identity minimize null columns. If Workshop Difficulty Code is a mandatory data element in Workshop, we can enforce that it be populated. Rolling up, on the other hand, retains the supertype-level business rules. This structure retains the business rules from Course to Location. However, as Doug Jones, IT manager, states, "The rolled up relationship from Learning Track loses much business rule information regarding the required mix of Lectures and Workshops."
Figure 4: Identity
Balance refers to the quantity of data elements and relationships that are present in each of the subtypes and the supertype. The rolling down technique is an effective choice when the subtypes are weighted down with their own unique data elements and relationships. That is, the subtypes are more different than they are similar. The rolling up technique on the opposite end is a good choice when the subtypes share most of their properties. The identity technique is effective regardless of the balance of elements and relationships.
Usability means the model is easy to understand. An analyst generally finds it easier to interpret or navigate a data model when the model contains concrete business terms. Both the rolling down and identity techniques preserve the subtypes Lecture and Workshop. The rolling up technique, on the other hand, only contains the generic term Course and, therefore, could impact the communication ability of the model.
Performance in retrieving data is a major concern on the physical design. Doug Jones commented that the rolling down structure reduces joins across tables. "Workshop and Lecture both stand on their own. It saves the work of building and maintaining a third table." However, if there is a need to create a Course concept, joining across these tables could be complex and error prone. Data modeler Michelle Viljoen says, "There is no single table (Course) where a thin slice of all common course data is stored." The rolling up technique would have a similar issue if most queries were performed on Lecture and Workshop. The identity approach can lead to a performance hit, as you have to join across tables to create a single Lecture or Workshop view.
Figure 5: Factors to Consider for Each Technique
Scalability refers to how easily the data model can accommodate changing requirements. Abstract supertype structures, such as Course, are fairly scalable in the roll up and identity approaches, as new course types can leverage the existing course structure. Georgia Prothero, data modeler, points out, "The same application code can be used to process both workshops and lectures." The rolling down technique isn't poised for future growth. Abel Mendez, business intelligence specialist, comments, "Similar attributes and/or relationships added later may be implemented differently, further distancing the subtypes from their original logical model."
Storage space is usually less of an issue than performance. The identity technique is the most storage efficient. Tim Klein, data warehouse architect, says, "Identity has fewer null values, making for more efficient storage." The rolling down technique could lead to repeating the same data element values if the same Course is both a Lecture and Workshop. The rolling up technique also is not ideal in terms of storage space as some data elements and relationships will be left empty (null) because they only apply to one of the subtypes. Deb Kesse, senior modeler, describes the many empty data elements at the supertype level as the "Swiss cheese effect." This is a great visual, but it does make me hungry!
When deciding which of the three techniques are most appropriate for your model, you will need to prioritize the six factors in terms of model impact and give the highest-priority factors the most weight in making your decision.
If you would like to become a Design Challenger or submit a modeling challenge or solution, please add your email address at www.stevehoberman.com and submit a description of the scenario.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access