Q:

In a data warehouse for the financial or telecom industry, the customers are of two main categories: corporate and retail. Some of the customer dimension attributes for these two categories would be common. Would it be better to have two different dimension tables for corporate and retail customers or have a single dimension with no values where the fields are not applicable to the category of the customer? If we have two dimension tables (one for corporate and one for retail) and have two key fields in the fact table, when we populate a record of a corporate customer, the key field for a retail customer will have no value to be populated. How do you handle such situations?

A:

Les Barbusinski’s Answer: This is a common dilemma in star schemas. Another fairly typical example of this would be company-owned stores vs. franchise stores. In my experience, using a single dimension is usually best. That way a generic search will identify all customers that meet the selection criteria regardless of type. You can then do a more focused analysis on either type (retail or corporate), if you encounter a particularly interesting metric or pattern. Chuck Kelley’s Answer: I have done this both ways that you have given as solutions (two separate dimensions or blank attributes) and a third way (create what Ralph Kimball calls a mini- dimension). I believe it depends how many attributes are “applicable,” how well your users understand data and what type of product your end users are using. Two separate dimensions work well if your query asks for data from either dimension. If you query from both, you will either get no data or data with no values – neither a good solution (this assumes you are using a tool like Business Objects, Oracle Discoverer, etc., that creates SQL and passes it to the data engine). Blank attributes (or I prefer populating them with “zzNot Applicable” – the “zz” makes it sort to the bottom of lookup lists!) are a problem with tools that don’t display null valued rows that the key is being sorted on (like Business Objects, etc.). Mini-dimensions work well if the data is truly separable.

Joe Oates’ Answer: I have developed commercial data warehouse and transaction processing systems that use a common “super class” dimension table to represent Person, Organization, Organization Unit, Household, etc. This is the only approach that works for an enterprise- wide data warehouse as opposed to a data mart. This “super class” table has a generic name such as Party or Involved Party and the columns have generic names as well. Since relational databases cannot do true sub-classing, a view is used to change the names of the columns to names that fit the role such as Retail Customer or Business Customer. In cases where both retail customers and business customers can be involved in the same transaction fact table, create another view to represent both, maybe just called Customer. Of course, the definition of each of these views must contain a good definition of the role and state that it is a view of the “super class” table. Additionally, if there are specific details about the role that are important to the BI analysis, create a separate table containing these details, along with the necessary foreign keys to relate the role to the “super class” table and include this table in the view created for the role.

This approach allows you to have one fact table for call detail records (CDRs) in a telco data warehouse that makes analyses combining the different roles very easy, while at the same time allowing analysis of each of the roles.

Unfortunately, all of the data modeling and design tools that I am familiar with have trouble with this approach, so you have to “cheat” them. This problem is that you cannot draw a relationship line between a table and a view in these tools. A way to get around this is to create a “fake table” for every view with the same name using some naming convention so that when you look at the diagram with the real and fake tables, you know which are real tables and which are views. Make sure the “generate table” option for the “fake tables” is set to false.

For further insight into this design approach, you should read Len Silverston’s excellent recent article in DM Review entitled “A Universal Data Model for Relationship Development.”

Clay Rehm’s Answer: I would recommend a Customer table that holds common attributes across any type of Customer with snowflaked Retail Customer and Corporate Customer tables. The “Customer” dimension shares attributes across all types of customers and would contain a “type” column in the table that signifies if it is a corporate, retail or any other type of customer you may have in the future.

This way, there is always and only one primary Customer key, and you never have to worry if it is invalid or null. The Customer table would contain the Customer Key, the Customer Type and any other common customer attributes. The Retail Customer table would only contain attributes specific to a retail customer, and the Corporate Customer table would only contain attributes to a specific corporate customer. If another customer type came along in the future, it could easily be snowflaked off of the Customer table.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access