This is the third in a series of articles on things to look for in evaluating a data warehouse data model. A good data warehouse data model needs to accommodate data that does not age gracefully.

Data Warehouse Design

A data warehouse designed for analyzing sales information is often called upon to provide trends by sales area. The problem that the business faces is that the sales area boundaries and hierarchies are often volatile, and the data warehouse is tasked with retaining historical information. The solution to this problem requires consideration of both the granularity level at which data is captured and the hierarchical relationships that are retained.

The granularity level in the data warehouse is the lowest level of detail that is needed to support the strategic decisions. The lowest level of analysis and the level of granularity are not necessarily the same. The level of granularity needs to consider the analysis being performed as well as the volatility of the data. If the sales areas and their hierarchical relationships change often, data cannot be maintained only at the sales area level. Information must be available to rebuild the sales areas based on the analysis being performed.

For illustration purposes, let's assume that, even though the sales area boundaries are volatile, they always align with state borders. When data is summarized, the summary basis needs to be at the state ­ not the sales area ­ level. Often, data is retained at the sales transaction level, with attributes such as the customer address being omitted in the data warehouse. (Customer address is not typically needed for strategic analysis.) Even if the customer address is left out, the state should be retained to enable the recreation of sales areas. In addition, an associative entity, relating the state and the sales area also needs to be created. This entity would include the effective date within its primary key.

The approach described can be used to support different types of requests. If a company needs to analyze sales trends by sales area, using the current view of the sales area, the sales are allocated to each sales area based on the current relationship between the state and the sales area. Sometimes sales need to be allocated to the sales area in effect at the time of the sale. To determine the best design for the data warehouse, the analyst needs to determine the frequency of these requests. If the requests are rare, then the approach described earlier should be used. If, however, the requests are common, then in addition to the associative table, the analyst should consider storing the sales area as an attribute of the sale or of the sale summary.

The associative entity can also be used to support analysis aimed at determining the sales area boundaries or at anticipating the sales volume for future sales areas. To accommodate this, a subsystem can provide future relationships between the states and the sales area for loading into the associative table. By using the future effective date as the basis of analysis, information becomes available to project sales for the new sales areas.

Data Mart Design

The associative table within the data warehouse provides the needed linkages. Structuring requests that use the table, however, exceeds the capabilities of most casual users. Their needs are better served by a data mart that considers how they need to view the information.

A data mart is designed to meet the functional needs of a business area. Several data marts may be used to perform the analysis. In designing the data mart, a key question to be answered is which view of the sales area boundaries is needed. If the current view is always used, and no analysis is done at the state level, the dimension can consist of the current sales hierarchy, with the lowest level sales area identifier serving as its primary key. The data in the warehouse supports populating both this dimension table and a fact table summarized at this level.

If the analysis requires a historical view of the information, two approaches are feasible. The traditional approach entails a date within the sales area dimension table primary key. With this date, the sales area hierarchy that existed at the time of the sale would be joined with the fact table. Another approach is to populate the hierarchy at the time of the sale as non-fact, non-key attributes of the fact table. The audience of the data mart, the frequency of the different types of requests, the volatility of the sales area hierarchy and the impact the design would have on reusing the dimension table are all factors that should be taken into consideration. As with the earlier example, the data warehouse supports populating the data mart with the required data.

The data warehouse and data mart must be able to support business queries even if data does not age gracefully due to changes in hierarchical relationships. The data analyst needs to understand the business rules affecting the hierarchies and must incorporate these within the data warehouse data model using relational modeling techniques. This may involve an associative entity and the inclusion of additional attributes. The complexity created in the data warehouse model needs to be eliminated in the transition to the data mart. In reviewing a data warehouse or data mart model, we strive to be sure that the analyst correctly interpreted the business rules and that the design considers both the type and frequency of the anticipated business questions.

Part 1 from the November 1998 issue
Part 2 from the December 1998 issue

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access