The logical dimensional model shown in the figure below was built for a book publisher to track sales. There is a need to track sales by calendar, by book and by location.
Sales is the fact table, and Calendar, Book and Location are the dimensions.
Here is the design challenge. The model in the figure can answer questions such as, "What was our sales on the book Data Modeling for the Business in May 2009 for the Northeast?" Notice in the Book dimension there is a data element called Book Cost Amount. Should Book Cost Amount reside in the fact table instead of a dimension? Why or why not?
Design Challengers State the Facts
In a logical dimensional model, facts, such as Sales Amount in this example, need to be able to be mathematically manipulated, such as being summed. Although facts in a dimensional model need to be summed, not all data elements that can be summed are facts. Office Square Footage in an Office Dimension, for example, describes Office and can be mathematically manipulated, yet Office Square Footage is not a data element that I would consider a fact in an office space rental sales dimensional model. Similarly, we cannot automatically assign Book Cost Amount to the fact table based solely on its property that it can be mathematically manipulated. To know whether to place Book Cost Amount in the fact table or leave it in the dimension, we need to understand both how the business determines book cost as well as how cost is used in queries.
How does the business determine book cost? There is always a tight link between business process and business usage. Therefore, by understanding the process of setting the cost, we can determine whether the cost is solely a property of book or also requires knowledge of other factors such as region. If other factors play a role in the calculation of cost, then cost needs to reside in the fact table. Eric Nielsen, enterprise architect, suggests normalizing to determine how book cost is identified: "If the value of Book Cost Amount is always the same for a given instance of Book regardless of when or where that Book is sold, then Book Cost Amount belongs in the Book dimension table."
How does the business use book cost? If business questions only reference Book Cost Amount to help describe the books along with other book information such as title, then Book Cost Amount can remain in the dimension. If, however, Book Cost Amount needs to be viewed at similar levels as that of Sales Amount, moving Book Cost Amount to the fact table can enable the user to answer more questions such as this one posed by data architect David Ness: "What is the total cost of all books sold in the Northwest region in April?" If we expand usage to include profit as well as sales, Jan M. Kamil, enterprise data architect, says that profit calculations that require cost information can obtain that information from the dimension table. Data modeler/data warehouse developer Tom Faulkner suggests also expanding our discussion on usage to include whether book cost needs to be viewed as it changes along with other book data elements: "The cost of a book or any inventory item can change over time, so a slowly changing dimension is a good place to keep the cost of an item to track the history of an item's cost."
It is also possible that, based upon cost calculation and usage, we store cost in both the fact table and dimension. Tim Huckabay, data architect, says that having cost in both places could allow cost at the time of sale in the fact table and current standard cost in the dimension. "Publishers are generally taking orders for a book before the final bills for printing and promotions have been received, so sales are made before knowing the actual cost per book. In this case, cost changes may be viewed as a correction rather than a change that needs to be tracked in the dimension; users probably want to use the most recent cost value in reporting. It may be occasionally useful to be able to report the cost as it was at the time of sale, leading once again to a design with cost at time of sale in the fact table and current standard cost in the Book dimension."
Asking where Book Cost Amount belongs also raises more fundamental questions on our model in terms of scope and granularity. Senior system engineer Lee LeClair first questions why Book Cost Amount is even in the model at all. "The problem statement includes nothing about book cost or profit on sales." Gordon Everest, professor emeritus, says, "If you want to incorporate cost, then we must go down to individual sales for each day in each region, and add cost, quantity and price to the fact table. From that we can always aggregate up to the level of granularity implied by the given dimensional model."
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access