History requirements need to be addressed when designing reporting systems.
For example, if Bob, our customer, moves five times in the last year, we can choose to capture only the most current information where Bob lives today, which is known as a Type 1 slowly changing dimension. Or we can store all history and, therefore, all of Bob's address changes, which is known as a Type 2 SCD. Or we can store some of the history, such as Bob's original or previous address, which is known as a Type 3 SCD. Type 3 is frequently chosen for purely technological reasons, such as simplifying development, saving storage space or increasing query performance. Is the decision to store some history (Type 3) over all history (Type 2) ever driven by business needs, or is it always dictated by technology? In other words, what business reasons are there for modeling a Type 3?
Most Design Challengers agree that choosing a Type 3 is a technology-driven decision by IT. Business requirements, however, can also favor Type 3 for these reasons:
Choose Type 3 when Type 2 can violate legal regulations or privacy rights. Within certain industries, storing substantial history for some subject areas can violate laws. Often these regulations and privacy rights have to do with knowing too much about an individual. Doug Jones, information architect, gives credit history as one example where too much history may not be required or allowed. Jeff Dallara, data management lead, includes mortgage applications as an area where only the current and previous addresses may be permitted in gauging the mortgagee's financial stability.
Choose Type 3 when Type 2 can produce meaningless results. Sometimes too much history can be confusing or worthless to the business. If Type 3 has been chosen for a particular subject area, it is possible that other subject areas related to the one chosen will also need Type 3 because Type 2 would produce meaningless results. Building on the performance rating example mentioned earlier, if we are only storing three years of performance ratings, it may not be of value to store full history on all of the assignments the employee has had. Architect Vibha Shrivastava provides this manufacturing example: "In one situation when I was creating a data warehouse for a steel manufacturing company, they needed the original heat of the material and the current heat of the material. It was immaterial what other heats the steel would have had in the past." Jennifer Prichard, data modeler, provides this HR example: "The only business reason for this that I can think of in general terms is a piece of data that only exists in two states: current and past. One I can think of is marriage status - single, married, etc. A current/past value would be useful for seeing if someone was ever married at a glance, and dates could be used to track multiple events."
Choose Type 3 if the values never change more than once. Building upon the previous example of HR marriage status indicators, the person's last name may change only one time (though exceptions exist), and therefore, would be good for Type 3. Chan Beauvais, database analyst, provides another example: "At our firm, we need to know if a location has relocated, and if so, what was the previous location number. This is used for comparative tracking of the previous and new location sales data. So we only need the current and previous location numbers."
It is rare to choose Type 3 for business reasons, yet the examples offered are proof that it can happen and that there are exceptions to every rule.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access