Do you suggest surrogate keys in data marts and/or EDW solutions?


Les Barbusinsk’s Answer: Just the opposite: use natural keys wherever it’s feasible to do so. Limit the use of "token IDs" to situations where a) no natural key exists, b) the natural key consists of too many elements, c) the natural key is too long, or d) the data is being sourced from multiple systems and each system has its own identifier for the entity (i.e., you need a common identifier).

Surrogate (or "tokenized") keys have a lot of drawbacks. They typically increase the complexity of ETL processing logic (thereby increasing runtimes), are prone to "double-posting" errors (i.e., they do not enforce "uniqueness"), etc. Their one-and-only benefit is that they facilitate joins. Use them where it makes sense but use them sparingly.

Mike Jennings’ Answer: As always the answer is, it depends. Surrogate key use offers several advantages from a dimensional modeling perspective. It provides a layer of insulation in environments where frequent changes to source systems or natural keys can occur. Since the dimensional model uses surrogate keys, only the mapping (natural keys=> surrogate key) is affected not the key structure of your database. A single fixed length surrogate key on a table offers database performance advantages in cases where numerous natural keys are needed to establish uniqueness in a dimension and/or where the natural key has a large or highly variable length (e.g., URLs ).

Why would you ever not want to use surrogate keys in a dimensional model?

  • Your particular data warehouse environment’s size and/or the number of sources is small and stable.
  • Your window of time for processing updates to your data warehouse is very small or near real time. The mapping of a natural key (s) to the surrogate key in the dimensional model requires additional processing and complexity to the ETL process.
  • Your particular dimensional database is very large and to increase the efficiency of ETL processing and access the RDBMS’s database fragmentation method is used. Database fragmentation strategies are most efficient when segmented using natural keys versus surrogate keys.
  • Your particular reporting tool may not provide sufficient navigation of dimensional hierarchies, filtering, aggregation or other capabilities to produce the required output needs of your users when the dimensional model uses surrogate keys.

Chuck Kelley’s Answer: Absolutely, in (almost) all cases, with very few exceptions. They provide for more flexibility and (generally) better performance when joining with other tables in a relational database.

Joe Oates’ Answer: Yes. I can only give some examples of what made me supporter of surrogate keys.

  • Another operational system was brought into the warehouse and had duplicate keys.
  • Joins are much faster on integer keys than on a long, mixed alpha and numeric, concatenated primary key. I was once forced to change from these kind of "natural" keys in order get query performance that the customer would accept.
  • New records came along with the same natural key as records that had been logically deleted.

Clay Rehm’s Answer: Absolutely!

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access