Q:

I am building a data mart for a P&C insurance company. Having reviewed the requirements and discussed them with the users, I built a dimensional model comprised of a dozen dimensions. The fact table ended up to be about twenty-four facts wide! Each row is a "sum of monthly policy activity" for a given policy. Notice: it is not a snapshot of policy' state at the end of the month, but a sum of monthly transactions. Something about it doesn't feel right. First, the width of a row is suspicious. Shouldn't I break the table into smaller subsets of facts, i.e., "written premium," "earned premium," "coverage," etc.? Also, should I assume that users will need a "policy transaction" data mart and just build one, even though it doesn't seem as if they need one, at the moment?

A:

Les Barbusinki’s Answer: It’s difficult to tell without seeing the model itself. Having a lot of metrics in a single fact table – in itself – is not a sin. In fact, it’s fairly commonplace. I do, however, find it unusual that a single fact has a dozen dimensions. Most metrics are analyzed within the context of three to six dimensions (a few more if we throw in some snowflakes). You may want to reconsider your design from that perspective. As for building a "policy transaction" mart – I wouldn’t. It smacks too much of "build it and they will come." Deliver on the original requirement first to establish credibility, then suggest enhancements for a future phase. If your users don’t want it, don’t build it.

Joe Oates’ Answer: It is not unusual to have many facts in such a table. I would call it an aggregate table rather than a fact table because calculations and summarizations must be done in order to populate the non-dimension foreign key columns. Even though your instinct is to break these out into several separate fact/aggregation tables, doing so will cause difficulties for the end users (I assume that some tool will be provided for ad hoc queries). Breaking the single table into many will require the end user to know the data structures – something most authors caution against. Additionally, there will be queries that will be difficult to do such as comparing written premium to earned premium based on type of coverage. So, if the information in the columns in the fact table are really important related items of information needed by the end users, a single fact table is what you need.

Clay Rehm’s Answer: Facts should only be combined in the same fact table if they are of the same type. It is not the width that should be suspicious but what relationship does each fact have with each other, other than being a monthly sum? You would be wise to store the transaction detail in your data warehouse, which would allow you to sum up these transactions in any way possible, and in any time frame possible. What if your users want to start viewing the data weekly or daily?

Scott Howard’s Answer: Don't confuse the logical model with the physical. If the fact occurrences are indeed unique single occurrences, don't be too concerned with width of the entity. Just ensure that the fact table withstands the test of temporal normalization and be confident that you have the correct logical model.

What concerns you is likely the question: can you really implement such a model on your relational system? You may indeed need to split the physical fact table into several related tables for performance or maintenance reasons or to overcome a physical limitation of your relational database management system. The separate tables can always be joined back together into a single entity should the need arise.

Great tip: Continue to work with your DBAs once you complete the logical model and are in your project's implementation or construction phase. I've seen too many good DW models destroyed by well-meaning and very competent DBAs because they normalize the DW model as if it was an OLTP model. You can't blame these DBAs because they are just doing what they are good at. Be sure to stay involved in the physical implementation. It's also best if your production DBAs are schooled in at least the basics of temporal and DW modeling.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access