Continue in 2 seconds

In a purely dimensional modeling effort we have a need to obtain data from multiple fact tables to complete the report as the clients need it.

By
  • Larissa Moss, Les Barbusinski
Published
  • June 10 2004, 1:00am EDT
More in

Q:  

In a purely dimensional modeling effort we have a need to obtain data from multiple fact tables to complete the report as the clients need it. One would generally say well just join on common dimensions. The problems that I see with this approach is:

  1. Our fact tables are in the 100s of millions of rows, in some instances, which could prove to be a very intensive join process. Furthermore in order to get the information we need we would be creating joins and selfjoins to fact tables with a minimum of 6 fact tables in the query.
  2. The join will not be simply on a common dimension surrogate key it will include as well non-equal join criteria. i.e.,A.DIM_KEY = B.DIM_KEY AND A.DATE_KEY <= B.DATE_KEY AND B.TYPE_KEY IN ('A','B')
  3. The granularity of the facts is not one-to-one providing for potentially inaccurate Aggregation unless subqueries are used.

I see potentially two approaches to solve this issue.

  1. Denormalize everything out and create a new fact table that has the combined fact tables' information even though this duplicates some data existing in other fact tables.
  2. Compromise and create bridge tables between the fact tables that would be populated by ETL and do the necessary logic such that a purely equal-join could be performed or at least ease the pain somewhat of joining the fact tables. However this means another large bridge table gets added to the mix and I am not sure if there would be a DBMS cost savings after doing that.

What is your opinion?

A:  

Larissa Moss' Answer: This is a classic example of why a data warehouse environment is not - and should not be - all about multidimensional designs only! I rest my case.

Les Barbusinski's Answer: Option 1 is the way to go. It's common - and, often, desirable - to have similar fact tables in a data mart which a) have different perspectives, and/or b) are designed to answer slightly different questions in an efficient manner. Dynamically joining six fact tables to calculate a metric serves no purpose if the report takes forever to generate and/or degrades the overall performance of the mart.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access