A surrogate key is a unique identifier for a table, often a simple counter created by developers or database administrators. In my classes, I strongly recommend tagging what the “real” primary key is in the table – in other words, the one that a business professional would consider the unique identifier. We tag by using the concept of an alternate key.
For example, if Employee Identifier is the surrogate key of the Employee table, then an alternate key could be Employee Social Security Number. Order Line may have Order Line Identifier as its surrogate key, and both Product Code and Order Number as its alternate key.
There are times however, when a surrogate key may not have a corresponding alternate key, such as when there is no real primary key. That is, there is no way of identifying records in this table other than through the use of a surrogate key. For example, I worked on a data warehouse project where we had no way of identifying a single record for a certain type of transaction, and therefore we created a surrogate key on this table without having an alternate key.
I asked the Design Challengers: Are there other situations where we may not have or need an alternate key when using a surrogate key?
There are three main situations where it can make sense to create a surrogate key without also creating an alternate key: integrating, auditing and when it is not required to see just a single record.
As an aside, the term “surrogate” implies “substitute” for something, and if there is nothing there to substitute for, then we cannot use the term surrogate. I often use the term “virtual” in these situations, yet for the sake of consistent terminology within this article we will continue to use the word surrogate.
Integration Stepping Stone
When multiple sources are brought together during a large integration project such as for a data warehouse, data is often loaded into tables even though there will be duplicates or poor data quality. In such situations, having a surrogate key allows for all of the data to be brought into a single table so that we can start the data cleanup process. Several practitioners provide more insight into such situations.
Colin Davies, data wrangler, says, “I just dealt with exactly this problem. We have a situation that is similar to a data warehouse, in that we have multiple data sources. However, the same transaction can be sent to us from more than one source -don't ask! -, and we have to identify duplicates. We opted to generate a surrogate key. It actually works quite well.”
DW/BI Solution Architect Ralitza Iankova says, “In mergers and acquisitions, there may be colliding natural account or product keys, for example.”
Data Architect Mark de Kiewit shares, “We had a requirement for the business to maintain target margins (at which account managers should sell a product to customers), which they would then upload into the system via spreadsheet. To make the maintenance of these target margins more flexible and easier for the business, the format of the spreadsheet is such that the business can populate it at one of various levels of the product hierarchy. So we would have to look at each record to determine which fields were populated and as such have implemented a surrogate key to identify records.”
Pon Prabakar, data architect, says, “A surrogate key will be helpful in a lookup table which we need, but really don’t know the source of the data or what it will contain. Recently, I required a business segment for which I don't know which source is the right one and what value will exist. So I created 100 surrogate key values in the table with null code and description. We used this table to populate our main entity but started updating the minor entity (lookup table) when we received the golden list.”
Data Modeler Sam Moeller explains, “We've had instances where the key consists of four or five attributes, but there will be times the fifth attribute is not applicable -- asset classes, for example – in which some have three and others have four attributes to define an asset class. To avoid nulls in the key, we create a surrogate key.”
Audit and Log
Frequently, the project team needs to record that certain events took place in the processing of data, such as when records are loaded or modified. Often this auditing log is stored in the database with a surrogate key added so that all of the records can exist within the same table. Practitioners tell us more:
Data Architect Forrest Carter states, “For instances where you merely want to continue to add events, such as for a log file, you would probably just want the surrogate key only, not caring about uniqueness of the non-surrogate key columns.”
Tim Bowders, data modeler, says, “Data is logged in arrival sequence. The only unique information on each row may be a timestamp or sequential counter. In this case, the timestamp or sequential counter (a surrogate key) is the only key that is available.”
Data Modeler Ed Cavaness says, “Tables like journaling entries, or rows of a purely audit table entry, have no alternate or ‘real key’.”
No Requirement to See a Single Record
Sometimes even though it might be possible to define an alternate key, there may not be the requirement to do so. More here:
Georgia Prothero, data modeler, says, “The need for an alternate key may be challenged in situations where there is no requirement to retrieve a single row using a logical value or set of values. For example, if you have a table where there are only inserts and bulk updates/deletes, or if you have a parent/child setup where for each parent record you always want to retrieve the entire set of children associated with it."
Data Architect Rich Kier identifies the instance of comments or notes. He explains, “Many times notes are tied to a ’thing‘ such as a loan, account or customer, and you can set an alternate key on the foreign key to that and the timestamp. People aren't going to retrieve a record by timestamp, although they'll likely sort them by that data. People look for notes based on the thing they're tied to (all notes on a customer) content (keywords), and any application categorization or tags.”
John Giles, IT consultant, says, “In the real world, we encounter what I might call ‘naturally occurring’ keys i.e., we don't have to create in the IT system. Some are unique e.g., car registration numbers (at least unique within a state). Others are not unique but still useful e.g., my name. We use these ‘keys’ in our day-to-day speech. I'm at a conference, and someone announces that the owner of car ABC-123 has left their lights on. Blast - that's my car. But we also identify things by association. Someone tells me my car's front left tire is flat. My car has a natural key; my tire does not, but by association I manage quite well to identify the tire. Similarly in IT, some things can be identified by association/relationship more easily than by a key.”
(Author Note: If you’re interested in chiming in on the discussion by becoming a Design Challenger, sign up at stevehoberman.com)
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access