Question: My company wishes to study changes to its customer file. We usually have between four and six million customers, and several attributes of interest can change often. A customer may be ranked a "10" one day and only a "9" the next. Management would like to ask questions such as, "Which customers went from a 10 rank to 9 rank last week?" I have been considering a factless fact table to track these re-rankings, rather than a Type 2 SCD solution, because of the size of the dimension. Is this what you would recommend? In the fact table, should I include a old_rank and a new_rank? How can I best build a structure to support these kinds of queries? Thank you.


Doug Hackney’s Answer: I prefer a dimension history table that contains the members that change, the change date and other relevant elements. In your case, prior ranking would be a good column to carry.

Scott Howard’s Answer: If the scope of analysis is as you state – that is the analyst only considers the current and prior values – then a practical transient data approach in which you only model for and store the current and last prior values is very acceptable. However, if a true time-series type of analysis is required in which users may want to analyze a customers ranking history over indiscriminate time periods, you had better fully model the complete history.

I do advocate a combined approach in which you model the most common requirements in a transient manner for good performance for the majority of your queries and fall back to querying the detailed models when more exhaustive analysis is required.

Chuck Kelley’s Answer: I think that since you are calculating a ranking on a daily basis, based on some proprietary algorithm, creating a factless fact table may be the right way to go.  However, I might be inclined to have a time_dimension, customer_dimension, rank_dimension, and fact table instead of an old_rank/new_rank.  That might be a better way that using a Type 2 slowly changing dimension (SCD) (Note to the Readers: Ralph Kimball in his book the Data Warehouse Toolkit defines 3 types of slowly changing dimensions named Type 1, Type 2 and Type 3).

Joe Oates’ Answer: You are correct in deciding to implement a fact table.  I would think that you should include “old_rank” and “new_rank” because without this information, the “Which customers changed from a 10 rank to a 9 rank cannot be answered easily.  However, if you include “old_rank”  and “new_rank,” these are actual facts; so you will have a real fact table instead of a factless fact table.  You can still include a “ranking_changed” indicator that you can index for fast queries.   You should also determine if there are other changes that the company wants to track.  If so, they should also be included in the new fact table.

Clay Rehm’s Answer: I would build a fact table that contains a row for each day (either with a date dimension key or a date column) and with a rank for each attribute of interest.  The customer dimension would also carry a column for the current ranking for each attribute of interest.  The fact table would provide the history needed, and the customer dimension could provide the most up to date ranking of any attribute of interest.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access