Free Site Registration

As Low as You Can Go

Subset//data

Information Management Magazine, March 2009

Steve Hoberman

A modeler on your team is building a consumer complaint dimensional model. The business requirement is to allow users to view the number of complaints at different levels of detail, the lowest level required being Region, Month and Brand. The modeler, however, is unsure whether more detail might be required in the future and has built the model at the lowest level which is Consumer, Day and Product (see Figure 1). An obvious advantage to this model is that it can answer more detailed questions. What are the other advantages to a dimensional model being at its lowest level? What are the drawbacks?

Taking it to the Next Level

The main advantage to adding more detail is that the users will be able to answer more specific questions. Database Engineer Steve Turnock summarizes: "User questions are unpredictable, and if you apply business rules to summarize the fact table, you have lost the ability to answer new user questions." For example, after Mary learns that there are 50 complaints on the CUSoon Sunglass Brand in December 2008 from the northwest, she might drill in to more details to learn which product within this brand is generating the most complaints, or which day within December 2008, or which city within the northwest.

Advertisement

Other advantages of providing the lowest level are:

  • Greater consistency with other applications. Data Architect Mark Taylor explains this advantage: "The dimension tables will be more reusable with other fact tables. The three dimensions in the example are all highly likely to be conforming dimensions within a wider data model, so modeling them at the lowest level now will save amending their design later."
  • Faster load time. Data Architect Bob Mosscrop III mentions that the data load time could be faster because the loading process is doing more extract and load, and less transforming. That is, there is little or no processing time spent summarizing and sorting.
  • Greater query freedom. Rakesh Nandish, information architect, and Don McMunn, consultant, both consider the ability for users to play in the detailed data to be a large advantage. Don says it "supports better coverage for information mining and ad hoc data analysis." 
  • Fresher data. Data Modeler Jan Cohen thinks more granular data usually means fresher data because of daily updates.
  • Easier restatements. Senior Technical Architect Sai Koduri emphasizes a very big advantage is having the ability to restate and recreate history if business rules change, because we will be storing all of the atomic building blocks.

However, just like there is really no such thing as a free lunch, there are also drawbacks to providing the lowest level:

  • More data quality issues. The more data we bring in, the greater the chance becomes we'll get bad data. Phrasing this in a more positive tone, Don says, "More quality assurance steps and processing will be required to keep the additional detail synchronized." 
  • More complexity. A dimensional model should be user-friendly, and with more detail comes more complexity in design.
  • Retrieval performance concerns. More data could lead to more time spent retrieving data. Data Architect Mark Taylor says, "The lowest level of detail demands the highest level of storage space. Additionally, assuming we are not providing any summary tables, all summarized reporting will require on-the-fly aggregation calculations, which will be more processor intensive, and therefore, query performance could be an issue."
  • Political hurdles. Solutions Architect Murali Vishnuvajhala comments, "At times it can be challenging to present to the IT manager why I need more resources when the current business questions can be answered with less resources."

Senior Database Architect Wade Baskin recommends you consider the reasonable level of granularity that reflects your business situation. "In retail for example, does it make sense to track the transaction down to the individual product, checkout lane and price point, when all your business needs is the amount of a particular product sold, at a store, for a particular day?"

Steve Hoberman is one of the world's most well-known data modeling gurus. He taught his first data modeling class in 1992 and has educated more than 10,000 people about data modeling and business intelligence techniques since then. Steve is known for his entertaining, interactive teaching and lecture style (watch out for flying candy!), and organizations around the globe have brought Steve in to teach his Data Modeling Master Class, which is recognized as the most comprehensive data modeling course in the industry. Steve is the author of "Data Modeling Made Simple," "Data Modeler’s Workbench" and "Data Modeling for the Business (Technics Publications). He is the founder of the Design Challenges group and inventor of the Data Model Scorecard.

For more information on related topics, visit the following channels:

Advertisement

Advertisement