As I have discussed in the past, it is critical that you design and build single grain fact tables. This requires that there is one and only one level of detail represented in the fact table for each dimension. This also means that there must be a value in every row for every dimension key - no nulls are allowed! Also, the logical primary key of a fact table is the collection of foreign keys from your dimensions. A physical primary key is the collection of columns that will uniquely identify a row in a table. Most of the time, the logical and physical primary keys will be the same. However, they do not need to be.
Consider a basic purchase order (P.O.) scenario. The purchase order is described by customer, order date, requested ship date and sales representative. These are the dimensions. To uniquely track the individual order, the purchase order number is also a dimension. There would be several fact columns as well. See Figure 1 for an example of the purchase order fact table.
The purchase order number is considered a degenerate dimension because all of the interesting attributes about an order have already been placed into another dimension. This leaves only the purchase order number itself. This should be handled in the business dimensional model as a full-fledged dimension, but does not require the creation of a separate dimension table. You may still create a physical degenerate dimension table if your data access tool does not recognize it.
There are additional facts associated with a purchase order, but they exist at a different grain or level of detail. There are facts at the line item detail, which describe the products that are being purchased. Looking simply at the relational tables, we can uniquely identify a line item with the purchase order number and product (see Figure 2).
This would allow unique identification of a row in the purchase order line item table, but it would not easily support the business users for analysis. If you wanted facts for a specific product that was ordered in January, you could get the answer, but not directly. You would need to access both fact tables and join the results on the purchase order number. Most analyses would require using both fact tables. This is not an optimal situation.
You can eliminate that layer of complexity by carrying all of the purchase order dimensions down to the line item fact table. This is illustrated in Figure 3.
Usually, adding dimension keys to a fact table changes the grain of the table and increases the number of rows. In this instance, it does not change the number of rows or the grain. Suppose that you have 5 million rows in the line item fact table in Figure 2. After we add the dimensions in Figure 3, we still only have 5 million line item rows. Each row is fully described with these added dimensions. The size of the fact table does increase. We have added four additional columns. At four bytes each, we have an increase of 16 bytes for each row in the table.
Figure 3 illustrates the difference of the physical primary key of a fact table from the dimensional logical primary key. The physical primary key is comprised of purchase order number and product key. The logical primary key (how we view and use the table dimensionally) is comprised of the order date key, requested ship date key, customer key, sales rep key, P.O. number and product key.
We should also consider the need to retain the purchase order fact table itself. Review each of the facts to determine:
- Does this fact represent an aggregate that can be created by summing facts from the purchase order line item table? If so, simply eliminate the fact from the purchase order fact table.
- Can this fact be allocated or distributed across each of the individual line items? If so, then create the new allocated fact during the transformation process. The allocated fact is placed on the line item fact table and eliminated from the purchase order table itself.
If all of the facts can be taken down to the line item level of detail, then the purchase order fact table is no longer needed!
There are many legitimate instances where the purchase order level facts cannot be allocated down. Consider the case where a hardware store offers a customer a coupon for $10.00 off a total purchase of $100.00 or more. There is a big sale on drills, and the retailer is selling them with a very low profit margin. The customer purchases:
Garden Gloves $9.00
Gallon of Paint $18.00
Flower Seeds $1.50
Vegetable Seeds $1.50
Total Goods $104.00
Total Due $94.00
Therefore, the customer pays $94.00. To allocate the coupon across the products, a simple average would result in $2 off the vegetable seeds, which is more than the item cost. A weighted average seems the next logical step but can impact the profitability measurements of the retailer's merchandising staff. If the drills were on sale close to the retailer's purchase price, the additional $7.12 allocated to the drill could drive the price below what the retailer paid for the item. Total transaction coupons are often accounted for and analyzed separately. If this is the business case, then you must retain the purchase order fact table. How would you know if you can allocate these facts? Ask the business!
As you have seen, there can be a difference between the physical primary key and the logical dimensional primary key of a fact table. Including additional dimensions that further describe the facts can expand the types of analyses that are possible.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access