There are some "lively" discussions going on regarding the appropriate use of surrogate keys versus natural keys? What is your opinion? When is each of these most appropriate? What are the advantages and disadvantages of each?
Les Barbusinskis Answer: Surrogate keys (a.k.a. token IDs) are most often used when an entity has no natural key or when the natural key consists of five or more data attributes. The primary advantage of surrogate keys is that they facilitate n-way joins between fact and dimension tables. However, they can be problematic in that they can mask business rules and relationships, and make enforcing UNIQUE constraints difficult. Generally speaking, using a surrogate key is not recommended when a simple, natural key is available for an entity. Doug Hackneys Answer: If your users are being asked to build ad hoc queries using the product IDs, customer IDs, etc. that they know, then you should stick with natural keys. Otherwise, it is not an issue for users and limited to a sometimes obscure and arcane debate among DBAs and data architects.
Scott Howards Answer: Lively discussion may be an understatement. I've seen this debate blossom into a philosophical war. Some claim that surrogate keys must always be used while other claim they never be used. I advocate their cautious use when appropriate.
First let's discuss why one may need them. Deriving a data warehouse occurrence's uniqueness solely relying on the natural keys can often lead to primary keys made up from the composite of every dimensional attribute. This is further exacerbated in the multidimensional data mart in deriving the primary key for a fact table. In its simplest form, a fact table's primary key is the composite of all dimensional keys. A star schema with time, geography, product, customer and business scenario dimensions would lead to a fact tables primary key being a concatenation of five attributes. If you have slow varying dimensions where time must be added to the dimension key, you wind up with a concatenation of 10 attributes. Take this to the physical implementation layer where some RDBMSs have problems joining tables based on concatenated keys, that is the number of concatenated attributes prevent appropriate index use, and you have a performance horror (we've seen this behavior with Oracle). These RDBMSs will perform admirably if you simplify the composite key with the assignment of a surrogate. Another reason for surrogate key assignment is the possibility of key reuse in the operational system.
Surrogate key assignment 101: I generally recommend surrogate keys for all slow varying dimensions. Also consider them when the dimension will be combined with many other dimensions in support of a single star schema many dimension converging on a single fact table. Don't, however, assign surrogate keys for assignment sake. Surrogate key assignment and then resolution back to the natural key when needed is expensive in both storage costs and ETL processing. In other words, justify all surrogate key assignments by validating they save a problem that I just described from hindering your DW.
Michael Jennings Answer: As always the answer is, it depends. Surrogate key use offers several advantages from a dimensional modeling perspective. It provides a layer of insulation in environments where frequent changes to source systems or natural keys can occur. Since the dimensional model uses surrogate keys, only the mapping (natural keys=> surrogate key) is affected not the key structure of your database. A single fixed length surrogate key on a table offers database performance advantages in cases where numerous natural keys are needed to establish uniqueness in a dimension and/or where the natural key has a large or highly variable length (e.g., URs).
Why would you ever not want to use surrogate keys in a dimensional model?
Chuck Kelleys Answer: I always (well, 99.999999999999999 percent of the time) use surrogate keys because the allow the optimum flexibility without the loss of anything. I see no disadvantages to using them.
Joe Oates' Answer: I can only give some examples of what made me supporter of surrogate keys.
Clay Rehms Answer: I prefer surrogate keys. However, whichever route you go, be consistent. The primary benefit of the surrogate key is that if your source system changes, your keys dont have to change.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access