Slowly Changing Dimensions
Dimensional Perspectives
Information Management Magazine, September 1, 2008
The notion of time pervades every corner of the data warehouse. Most of the fundamental measurements we store in our fact tables are time series, which we carefully annotate with time stamps and foreign keys connecting to calendar date dimensions. But the effects of time are not isolated just to these activity-based time stamps. All of the other dimensions that connect to fact tables, including fundamental entities such as Customer, Product, Service, Terms, Location and Employee, are also affected by the passage of time. As data warehouse managers, we are routinely confronted with revised descriptions of these entities. Sometimes the revised description merely corrects an error in the data. But many times the revised description represents a true change at a point in time of the description of a particular dimension member, such as Customer or Product. Because these changes arrive unexpectedly, sporadically and far less frequently than fact table measurements, we call this topic slowly changing dimensions (SCDs). In this months column and the next, Ill show how to implement the three basic types of SCDs. Advertisement The Three Types In more than 30 years of studying the time variance of dimensions, amazingly I have found that the data warehouse only needs three basic responses when confronted with a revised or updated description of a dimension member. I call these, appropriately, Types 1, 2 and 3. Ill start with Type 1 this month, and I will use the Employee dimension to keep the discussion from being too abstract. Type 1: Overwrite Suppose we are notified that the Home City field for Ralph Kimball in the Employee dimension has changed from Santa Cruz to Boulder Creek as of today. Furthermore, we are advised that this is an error correction, not an actual change of location. In this case, we may decide to overwrite the Home City field in the Employee dimension with the new value. This is a classic Type 1 change. Type 1 changes are appropriate for correcting errors and for situations where a conscious choice is made not to track history. And of course, most data warehouses start out with Type 1 as the default. While the Type 1 SCD is the simplest and seemingly cleanest change, there are a number of fine points to think about: In next months column, instead of responding to a change by overwriting, Ill carefully keep track of the change by issuing a new dimension record. This is the classic Type 2 SCD. And finally, Ill show how to handle a requested change that establishes an alternate reality that coexists with the current truth. This is the Type 3 SCD.
Ralph Kimball is the founder of the Kimball Group and Kimball University where he has taught data warehouse design to more than 10,000 students. He is known for the best selling series of data warehouse "Toolkit" books. He started with a Ph.D. in man-machine systems from Stanford in 1973 and has spent the last 34 years designing systems for end users that are simple and fast. You can reach him at ralph@kimballgroup.com.
For more information on related topics, visit the following channels:






