Q:

Industry standards state that the typical star schema will have about six dimension tables and the associated fact tables depending on grain, etc. Standards also say the dimensions will have a surrogate key which becomes one of the set of keys in the fact table(s). However, is there any reason that these keys cannot be foreign keys through an indirect relationship, and thereby produce a fact table with no actual key? RI between the single dimension table and the fact table is maintained, and with a good ETL tool RI with all dimensional tables should never really be an issue. In addition, the overhead of a unique index for the fact table is eliminated.

A:

Mike Jennings’ Answer: If your ETL processes take on the responsibility of maintaining referential integrity (RI) between the fact and dimensions tables then a unique index is not required on the fact table (or dimension tables). This can provide a substantial increase in load time performance per table. Indexes for reporting purposes can be created as a post processing activity. Look at the article, "Strategies for Custom Data Warehouse ETL Processing" (http://www.dmreview.com/master.cfm? NavID=198&EdID=3573), for some ideas on how to check RI during ETL processing.

Chuck Kelley’s Answer: I don’t know what standards you are speaking of. If there are standards, I have not seen them. As for surrogate keys, I use them primarily for speed of retrieval and flexibility of the data. The definition of the fact table then becomes the intersection of the dimensions. If I read what you are saying correctly, you want to put a table between the dimensions and the fact. The dimensions would point to the middle table and the middle table would point to the fact. That would add another join in my retrieval process with no benefit that I can see (wouldn’t be less storage, because the middle table could not have duplicates; wouldn’t be faster queries due to the extra join; you would still need the same unique index on the middle table; need more CPU and storage). Sorry, but I must be missing something.

Clay Rehm’s Answer: The design you choose must stand the test of time. If your source systems change and your keys are dependent on those source systems, you will have a problem maintaining the foreign keys.

Joe Oates’ Answer: I, for one, am not aware of any industry standards on the number of dimensions in fact tables. In projects that I have worked on, most fact tables had more than six foreign keys. But that is not what is important about your question.

One of the main reasons that dimension foreign keys are inserted into the original transaction record to produce a data warehouse fact table that has more information than the original transaction (called "enriching the transaction"), is to provide acceptable performance. Most of my experience has been with very large databases (VLDBs). One of the principles to get acceptable performance in a VLDB environment is to avoid any join possible. The scenario that is suggested by the question would increase the number of joins required to give an answer.

I will relate an experience with just such a situation. A company had invested millions of dollars in a data warehouse that used designed principles like that suggested in the question. There were several fact tables with more than four billion rows. There were 40 million customers, many with more than one address.

The company was not happy with the data warehouse because even simple queries ran for hours. Several reports took days to run. When our team was brought in to redesign the data warehouse, we came up with a multidimensional design that imbedded the dimension foreign keys directly into the fact tables. We also replaced natural keys with a single surrogate key column. The result was that the reports that had formerly taken days to run, ran in just a few minutes. I knew that performance would be much better than the previous design, but I was really surprised at how much better the performance actually turned out to be.

So in a nutshell, unless you have only a trivial amount of data, taking the dimension foreign keys out of the fact table and putting them into an associative table between the fact table and the dimension tables is not a good idea.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access