Dimensional modeling is an old discipline, dating from the late 1970s when
ACNielsen introduced its In*Fact syndicated data reporting service, organized
around dimensions and facts. It is, therefore, surprising that some
consultants and industry pundits consistently state myths and misrepresentations
about dimensional modeling that have been debunked multiple times. It is time
(once again) to address these myths. Myth: A dimensional view could be missing key relationships that exist
only in a true relational view. Myth buster: This is perhaps the best place to start debunking dimensional
modeling misrepresentations and myths. A dimensional model contains all the
data relationships that a normalized model contains. There is no data
relationship expressible in a normalized model that cannot be expressed in a
dimensional model. Note that dimensional models are fully relational. Fact
tables are generally in third normal form and dimension tables are generally in
second normal form. The major difference between the two approaches is that the
many-to-one relationships in the dimensions have been denormalized to flatten
the dimensions for user understandability and query performance. But all the
data relationships and data content are otherwise identical. Myth: A very real issue with a dimensional enterprise data model (EDM) is
the possibility that the model may not be extensible and easily accommodate
changing business needs. And also: Although a logical representation of
the business can be achieved using dimensional structures, using these
structures could have negative effects on extensibility and industry data
integration. Myth buster: This myth about extensibility is a strange one; dimensional
models are significantly more robust than normalized models when data
relationships change. For more than 10 years, we have been teaching the
graceful extensibility of dimensional models. There are five types of change
that have no effect on applications running on dimensional models: 1) adding a
new dimension to a fact table; 2) adding a new fact to a fact table; 3) adding a
new dimension attribute to a dimension table; 4) altering the relationship of
two dimension attributes to form a hierarchy (many-to-one relationship); and 5)
increasing the granularity of a dimension. In the normalized world, such changes
often involve altering the relationship between separate tables. Altering the
relationship between tables exposed to the BI tools forces recoding of
applications. With dimensional models, the applications keep on running without
the need to recode because the dimensional schemas are inherently more robust
when confronted with new content and new business rules. Myth: A dimensional model by its definition is built to address a very
specific business need. And also: Relational modeling mimics business
processes, and dimensional modeling captures how people monitor their
business. Myth buster: A dimensional model is built in response to a measurement
process, never a specific business need or a desired final report for a specific
department. A fact record in a dimensional model is created as a 1:1 response to
a measurement event in a specific business process. Fact tables are defined by
the physics of the real world. Our job as modelers is to carefully understand
the grain of the physical measurement event and to faithfully attach facts and
dimensions to that event that are true to the grain. A dimensional model
satisfies a business need only if the business happens to need the measurements
events represented in the fact table. The format and content of a dimensional
model has no dependence on a final report desired by the end users because it is
determined only by the physics of the measurement process. A dimensional model
is never crafted to meet the needs of a specific department, but rather is a
single representation of a business process that looks the same to all
observers. Myth: In a dimensional model, usually only one date is associated with
time. The other dates (e.g., in an order) are not captured, and, therefore,
valuable data can be lost. Myth buster: If you understand the previous myth buster, then you can
appreciate that a measurement involving a line item on an order will naturally
expose many dates. Each of these dates is represented by a foreign key to a copy
or view of the date dimension. I first described this technique of using
dimension roles in a May 1997 DBMS magazine article, listed at http://www.kimballgroup.com/.
Role-playing dimensions are an old standard dimensional modeling technique we
have described hundreds of times. Myth: Relational is preferred because an EDM should capture data at a
very low granular level - preferably individual transactions. Myth buster: From the very beginning, I have urged designers to capture
measurement events in fact tables at the lowest possible (e.g., transaction
grain). In my 1996 book, The Data Warehouse Toolkit, I wrote, A data warehouse
almost always demands data expressed at the lowest possible grain of each
dimension, not because queries want to see individual records, but because
queries need to cut through the database in very precise ways. If I have
consistently urged dimensional models to be built at the most expressive
granular grain for the past 11 years, through 250,000 books, more than 100
articles and 10,000 students in my classes, where do people come up with myths
like this? Stepping back from these specific myths, I urge you to think critically. When
you read or hear strong statements, circle around the issues and educate
yourselves. Challenge the assumptions. Look for defendable positions, detailed
logic and clear thinking. I expect to be held to such high standards, and I hope
you will do the same to others.
Ralph Kimball is the founder of the Kimball Group and Kimball University where he has taught data warehouse design to more than 10,000 students. He is known for the best selling series of data warehouse "Toolkit" books. He started with a Ph.D. in man-machine systems from Stanford in 1973 and has spent the last 34 years designing systems for end users that are simple and fast. You can reach him at ralph@kimballgroup.com.















Be the first to comment on this post using the section below.