When you are developing your fact groups, it is critical that you make sure you have clean, single grain fact groups. This means that there is only one level of detail represented from each dimension that is applicable to these facts. If you will have fact table rows that represent different levels of detail, such as daily and weekly, you need to split these into separate fact groups.
One situation that you may come across is that you have some fact rows in which one or more dimensions do not have corresponding values. This is a violation of the referential integrity between the dimension and the fact table. Your first response should be to conduct further detailed research. Ask the following questions:
- Do missing values indicate an error that needs to be addressed?
- Will there be missing data only until adjustments and corrections flow through the staging process over the next several days?
- Will the dimension values be missing because the field has optional input in the operational system?
- Is the dimension not relevant for this fact from a business perspective?
- What percent of the rows do not have a value for this dimension?
- Are all of these fact rows used together?
- Do you need all of these fact rows to represent the total business picture?
Let's review a simple sales example to highlight the design alternatives. This is a basic retail sales scenario looking at customer sales. Some items are purchased on promotion, while others are not. There are two design options to consider.
Your first thought should be that these fact rows must be split into two separate fact tables because the dimensionality is different. Therefore, we would have a table containing all the items that were sold on promotion and a separate table containing all the items that were not sold on a promotion (see Figure 1).
Figure 1: Table of Items not Sold on a Promotion
However, we must consider that most users are interested in all sales together. This requires that regular and promo facts be summed to get the total sales. Using a powerful business intelligence tool, it is relatively easy to create business measures in advance for the users. This would keep them from forgetting to add the two different sales figures together. Unfortunately, you are still using two fact tables, and this could impact performance. Using multiple fact tables is not something you should avoid, but if you can reduce the number of tables needed in common queries (without compromising the grain), you should do so. This should also improve query performance (which is what we all do when using relational database technology for any application).
In this case, it would be beneficial to merge these two fact tables together. This can be done ONLY if you use placeholders or proxy values in the Promotion Dimension. We include a special row in the Promotion Dimension to represent the non-promoted sales. An excerpt of the promotion dimension is shown in Figure 2.
Figure 2: Promotion Dimension
Because you are using a surrogate key (nonmeaningful keys assigned in the staging process that are independent of any operational system keys), you can set aside one value that you will use as the placeholder, where there is no promotion. You see that the row with PROMO KEY of 4000 is used as the placeholder or proxy. This same promotion key is used for all fact table rows that were regular, non-promoted sales. The combined structure is shown in Figure 3.
Figure 3: Combined Structure
Also note that the fact columns are also simplified. This keeps us from having rows with mutually exclusive fact columns. To get all promoted sales, simply constrain on the Promotion Dimension where the Promo Name is not "No Promo" or Promotion Type is not "None." Likewise, all regular sales are where the Promo Type is "None." You must include the placeholder values. Do not leave a null in the foreign key for the promotion dimension in the fact table! Rows are included and excluded in nonobvious ways if you allow nulls in any dimension key on the fact table (more details about this would require an entire column - perhaps in the future).
The grain for each of the facts remains the same. We only have one level of detail from each dimension. This allows us to simplify the overall structure and keep data together that will be used together frequently. Remember, we are dealing with whether or not an entire dimension applies, not a difference in grain within a dimension.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access