Translating Your Fact Groups into Tables - The Basics
Last month, we looked at the basics of translating a business dimension into a logical dimension table. This month, we will look at the basics of translating your fact groups into tables. We know that every fact included in a fact group has identical dimensionality and grain for each of those dimensions. We now need to design the logical fact table structure.
Each fact group becomes one or more fact tables. You should always begin with the assumption that each fact group will translate to a single fact table.
Create a primary-foreign key relationship between dimensions and the fact table. As we learned last month, each dimension will have a single column surrogate key. This is considered the primary key for the dimension. For the fact table, we need to include this dimension primary key as part of the key structure for the fact table. Beginner's note: When we use a primary key as a link to another table, it is called a foreign key in the second table.
The logical primary key of the fact table is comprised of all the foreign keys from the dimension tables included in that fact group. Using the sales example shown in Figure 1, the logical primary key for the fact table is the composite of the foreign keys from the Date, Customer, Store, Product and Promo dimensions..
Figure 1: Sales Fact Group
Each business fact becomes a fact column. Now that we know the key structure for our fact table, we simply create a column for every fact that is in the fact group (see Figure 2).
Figure 2: Sales Fact Table
You will need to repeat this design process for every fact group in your business dimensional model.
Evaluate storing derived facts as columns. The fact group design must capture the base facts that are captured by the underlying systems. While modeling, you may also identify additional business measures or calculations. These are sometimes called derived facts. You will need to decide if you want to store these calculated values on the fact table itself. The factors to consider include:
- How complex is the calculation? (How difficult would this be to calculate on the fly? How long would it take to calculate on the fly?)
- How often is the derived fact used and by how wide an audience?
- Is the resulting calculation semi- or non-additive?
Many calculations are simple and do not need to be stored. Often, there are three facts that work together, such as sales units, unit price and sales amount. You only need to store two of these. Choose the two that are in the format most frequently used (in our example, sales units and sales amount are used most frequently). You can then quickly calculate unit price.
Very lengthy, complex calculations are good candidates for storing as fact columns. If the resulting calculation is semi- or non-additive, consider precalculating the numerator and denominators for the calculation. These tend to be additive and are used as stepping-stones to facilitate calculations performed on the fly.
When would a fact group become more than one fact table? First, you need to keep in mind that you should never stray from the basic design principles without a sound reason. The fact that you have always designed tables a certain way is not a valid reason. In this instance, the factors you need to consider include:
- How often are the facts used together for analysis?
- Do the facts come from a single data source?
- When are the facts available?
- Are the facts useful if loaded at different times? For example, if five of the facts are available daily, but the other 10 are not available until the end of the week, are the first five facts of any business value until the end of the week?
- How easily does your data access or business intelligence tool support multifact table access?
- What are the parameters of your load window?
When you have answers to these questions, you can make some decisions. If these facts are not used together, then it will not matter if you create one, two or three separate fact tables. If the data comes from different sources, consider creating one fact table from each source.
If the facts are used together, then you still need to consider alternatives. If your data access tool does not handle multiple fact table access easily, you should try as hard as you can to keep facts together.
If the facts are only useful if they are all available, design one fact table and load it only when all of the facts are available.
If the facts are needed as soon as they are available and are ultimately used together, you must weigh the pros and cons. You must achieve a balance between user query performance, ease of use and load manageability. If you can't get the data loaded, query performance does not matter!
The benefits of a single fact table come from usability and query performance (because only one fact table is accessed), and disk space is conserved because the fact table keys are stored only once. The benefit of splitting the fact group into multiple tables is that the load process does not need to perform updates to existing rows in a fact table.
Since starting this column, I have shared the basics of the business dimensional modeling technique and how to translate the business dimensions and fact groups into logical table structures. In future columns, I will begin to address many of the interesting situations that arise and how they make modeling so much fun!
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access