Question: What is a junk dimension and could you provide an example?


Chuck Kelley's Answer: Here is a pointer to an article by Ralph Kimball/Margy Ross concerning junk dimensions: http://www.kimballgroup.com/html/designtipsPDF/DesignTips2003/KimballDT48DeClutter.pdf.

Les Barbusinski's Answer: A "junk" dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes. A good example would be a trade fact in a company that brokers equity trades.

The fact would contain several metrics (principal amount, net amount, price per share, commission, margin amount, etc.) and would be related to several dimensions such as account, date, rep, office, exchange, etc. This fact would also contain several codes and flags that were related to the transaction rather than any of the dimensions ... such as origin code (that indicates whether the trade was initiated with a phone call or via the Web), a reinvest flag (that indicates whether or not this trade as was the result of the reinvestment of a dividend payout) and a comment field for storing special instructions from the customer. These three attributes would normally be removed from the fact table and stored in a junk dimension ... perhaps called the trade dimension. In this way, the number of indexes on the fact table would be reduced, and performance (not to mention ease of use) would be enhanced. Hope this helps.

Joe Oates' Answer: First of all, let's review what kinds of dimensions you may have and then we'll discuss whether or not to have a junk dimension. There are two kinds of dimensions: non-hierarchical and hierarchical with hierarchical being by far the most numerous.

The first kind of dimension relates to customers, vendors and other legal entities (people and organizations, organizational units and households). These dimensions contain data that identifies the legal entity and typically contains other data that is useful for segmentation.

Hierarchical dimensions usually encapsulate a series of one-to-many relationships in a single dimensional table. Therefore they are often referred to as "multilevel hierarchies." These one-to-many relationships are usually normalized tables in a source system. For example, a Geography Dimension usually has Country as the first column. This is the highest level in the hierarchy. A country can contain of many states or provinces so the second column is StateProvince. A state or province can contain many cities, so the third column is City. A city can contain many postal codes so the fourth column is PostalCode. Other hierarchical dimensions include Product and SalesReportingHierarchy.

A subset of the hierarchical dimension is the "single level hierarchy." A single level hierarchy typically contains a primary key, a code and a description of the code. Most single level hierarchies are the equivalent of yes/no flags. Examples of single level hierarchies include TaxableTransaction, PromotableEmployee, etc.

The only problem with the single level hierarchies is that you may have a lot of them in any given dimensional model. Ideally, in my opinion, the concatenated primary key of a fact table should consist of fewer than 10 foreign keys. Sometimes, if all of the yes/no flags are represented as single level hierarchy dimensions, you may end up with 30 or more. Clearly, this is an overly complex design.

A technique that allows reduction of the number of foreign keys in a fact table is the creation of "junk" dimensions. These are just "made up" dimensions where you can put several of these single level hierarchies. This cuts down the number of foreign keys in the fact table dramatically.

As to the number of flags before creating a junk dimension, if there are more than 15 dimensions, where five or more are single level hierarchies, I start seriously thinking about combining them into one or more junk dimensions. I don't like to indiscriminately combine 20 or 30 or 80 single level hierarchies. I prefer to organize sets of single level hierarchies into "theme-based" junk dimension tables based on some common way to classify them.