A GUID is a globally unique identifier for a database. There is much interest lately in using the GUID for a tables surrogate key. For example, Bob the customer could be uniquely identified by the GUID value 6A603133-2280-4D24-83CB-59AFCDF52297.
What are the pros and cons of using a GUID for a surrogate key?
Surrogate keys are useful for integration and efficiency, yet they can come with coding and retrieval complexities. GUIDs are like surrogate keys on steroids. The domain of possible values is immense, leading to additional benefits as well as additional costs. GUIDs have several important benefits:
- Broad scope of uniqueness. The GUID is 16 bytes in length, allowing an incredible 2128 unique values. This is a very large number, and therefore multiple databases within an organization can all share one central GUID generator and maintain unique values across databases.
- Standard and easy automatic process to create. Steve Roper, database administrator (DBA), says GUIDs are very easy to generate on the fly. In fact, there are a number of database vendors as well as Web content managers (e.g., Google) that offer programs to generate GUID values. Joe Tschida, data warehouse architect, says GUIDs can simplify coding: You wouldnt need to rely on an auto-counter or a sequence for the key generation.
- No intelligence at all. Many organizations use counters as surrogate key values (e.g., NextVal). Some argue that counters contain intelligence because their values are generated sequentially as they are needed. Therefore, chronology is built into the key. For example, the month January might be assigned 1, February assigned 2, and there is a good chance that March would be 3. The latest versions of GUID generators, on the other hand, produce almost completely random values. (I say almost because the version number of the GUID generator is embedded in each GUID value.)
A number of cons also exist when using GUIDs:
- Size. GUIDs are big - really big. Steve Turnock, database engineer, says on average GUIDs use four times more storage and display space. Chris Strasser, CIO/CTO, says, For many types of applications, 128 bits devoted to a key is overkill. For example, a cell phone provider with 20 million customers could get by with only 32 bit keys. On this many records, the 96 additional bits would amount to nearly one-quarter GB of wasted space. This problem gets worse as foreign keys are added.
- Cross-environment concerns. Kevin Heinsey, data management lead, neatly summarizes this concern: GUIDs require care when replicating a system across like environments. Unlike sequential keys, the same value will not be created via the same transaction in multiple physical databases such as production and test environments. So, a single source of the key must be established, and then it must be pushed to other locations. And contrary to some arguments, there are reasons that the key value matters. Back to the production and test example; if you want to refresh your test database with recent customer order transactions for bug testing, your lookup table for order source better be the same in both environments. Mohammad Meimandi, senior data analyst, warns us to be careful when recovering from a broken server because the GUIDs produce unique values across servers.
- Performance. Declan Moss says, Something needs to create the GUID. This could slow down your extract, transform and load [ETL], depending on when the call is made and where the GUID generator is located.
- Not guaranteed to be unique. Because GUIDs are generated randomly, there is a chance (a very remote chance) duplicate GUID values can be produced. Therefore, some organizations take a performance hit by building collision detection in the application to prevent inserting two GUIDs with the same value.
- User-unfriendly. GUID values can be difficult for IT developers and support folks to work with for coding and debugging. And in the unlikely event that business folks actually view GUIDs, cover your ears to minimize the sounds of their screams.
Wally Zaremba, solution architect, reminds us that with the GUID we must still build and maintain references to the real business natural. I am a firm believer in this, as every surrogate key must also have a natural key. Mehmet Orun, senior manager, Data Services, says with centralized processes and standards, much of the complexity of GUIDs can remain behind the scenes. Orun explains, While there is definite complexity to GUID use, for system integration, it can ultimately provide the simplicity iPhones single button brings to the customer - which is inherently complex in its own software and hardware, but end users wouldnt know it.
If you would like to become a design challenger and have the opportunity to submit modeling solutions, please add your email address at http://www.stevehoberman.com/. There is also an overview on how to read a data model at Steves Web site.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access