Dimensional modeling is an old discipline, dating from the late 1970s when ACNielsen introduced its In*Fact syndicated data reporting service, organized around dimensions and facts. It is, therefore, surprising that some consultants and industry pundits consistently state myths and misrepresentations about dimensional modeling that have been debunked multiple times. It is time (once again) to address these myths.
Myth: A dimensional view could be missing key relationships that exist only in a true relational view.
Myth buster: This is perhaps the best place to start debunking dimensional modeling misrepresentations and myths. A dimensional model contains all the data relationships that a normalized model contains. There is no data relationship expressible in a normalized model that cannot be expressed in a dimensional model. Note that dimensional models are fully relational. Fact tables are generally in third normal form and dimension tables are generally in second normal form. The major difference between the two approaches is that the many-to-one relationships in the dimensions have been denormalized to flatten the dimensions for user understandability and query performance. But all the data relationships and data content are otherwise identical.
Myth: A very real issue with a dimensional enterprise data model (EDM) is the possibility that the model may not be extensible and easily accommodate changing business needs. And also: Although a logical representation of the business can be achieved using dimensional structures, using these structures could have negative effects on extensibility and industry data integration.
Myth buster: This myth about extensibility is a strange one; dimensional models are significantly more robust than normalized models when data relationships change. For more than 10 years, we have been teaching the graceful extensibility of dimensional models. There are five types of change that have no effect on applications running on dimensional models: 1) adding a new dimension to a fact table; 2) adding a new fact to a fact table; 3) adding a new dimension attribute to a dimension table; 4) altering the relationship of two dimension attributes to form a hierarchy (many-to-one relationship); and 5) increasing the granularity of a dimension. In the normalized world, such changes often involve altering the relationship between separate tables. Altering the relationship between tables exposed to the BI tools forces recoding of applications. With dimensional models, the applications keep on running without the need to recode because the dimensional schemas are inherently more robust when confronted with new content and new business rules.
Myth: A dimensional model by its definition is built to address a very specific business need. And also: Relational modeling mimics business processes, and dimensional modeling captures how people monitor their business.
Myth buster: A dimensional model is built in response to a measurement process, never a specific business need or a desired final report for a specific department. A fact record in a dimensional model is created as a 1:1 response to a measurement event in a specific business process. Fact tables are defined by the physics of the real world. Our job as modelers is to carefully understand the grain of the physical measurement event and to faithfully attach facts and dimensions to that event that are true to the grain. A dimensional model satisfies a business need only if the business happens to need the measurements events represented in the fact table. The format and content of a dimensional model has no dependence on a final report desired by the end users because it is determined only by the physics of the measurement process. A dimensional model is never crafted to meet the needs of a specific department, but rather is a single representation of a business process that looks the same to all observers.
Myth: In a dimensional model, usually only one date is associated with time. The other dates (e.g., in an order) are not captured, and, therefore, valuable data can be lost.
Myth buster: If you understand the previous myth buster, then you can appreciate that a measurement involving a line item on an order will naturally expose many dates. Each of these dates is represented by a foreign key to a copy or view of the date dimension. I first described this technique of using dimension roles in a May 1997 DBMS magazine article, listed at http://www.kimballgroup.com/. Role-playing dimensions are an old standard dimensional modeling technique we have described hundreds of times.
Myth: Relational is preferred because an EDM should capture data at a very low granular level - preferably individual transactions.
Myth buster: From the very beginning, I have urged designers to capture measurement events in fact tables at the lowest possible (e.g., transaction grain). In my 1996 book, The Data Warehouse Toolkit, I wrote, A data warehouse almost always demands data expressed at the lowest possible grain of each dimension, not because queries want to see individual records, but because queries need to cut through the database in very precise ways. If I have consistently urged dimensional models to be built at the most expressive granular grain for the past 11 years, through 250,000 books, more than 100 articles and 10,000 students in my classes, where do people come up with myths like this?
Stepping back from these specific myths, I urge you to think critically. When you read or hear strong statements, circle around the issues and educate yourselves. Challenge the assumptions. Look for defendable positions, detailed logic and clear thinking. I expect to be held to such high standards, and I hope you will do the same to others.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access