JAN 17, 2008 11:52am ET

Related Links

The Case for Text Mining in Payment Integrity
July 2, 2014
5 Methods for Visualizing Unstructured Data
January 13, 2014
New Product News - January 10, 2014
January 9, 2014

Web Seminars

Why Data Virtualization Can Save the Data Warehouse
September 17, 2014
Essential Guide to Using Data Virtualization for Big Data Analytics
September 24, 2014

The Artist Formerly Known as Prince

Print
Reprints
Email

Eventually, you see master data management (MDM) problems everywhere. A few years ago, I was turning off the Garden State Parkway in New Jersey and noticed an advertisement for an upcoming concert at a local venue. The performer was identified as “The artist formerly known as Prince.”

Prince, full name Prince Rogers Nelson, is by all accounts a talented and creative rocker. In 1995, he somehow got into a dispute with his record label, Warner Bros., and decided to change his name. He felt that they had controlled his creative output for too long and wanted a completely new identity. To achieve this goal, Prince changed his name to a symbol that he invented and which he deliberately chose not to call anything. This meant that Prince’s new name could never be pronounced or written down as words. Anyone can invent a surrogate key that has no meaning and describes no characteristic of what it is identifying. To create one that cannot be communicated is a mark of true genius.

Obviously, Prince is in a business where marketing and communication are important, and where a huge number of publications have to create content about popular music on a regular basis. The name change disconcerted the trade media, so they came up with the identifier “The artist formerly known as Prince” which soon got shortened to “TAFKAP.” True to form in MDM, Prince ended up being the subject of several identifiers. Eventually, he patched up his differences with Warner Bros. and went back to being Prince, except that some in the media now termed him “The artist formerly known as the artist formerly known as Prince” or “TAFKATAFKAP” for short.

Surrogate keys are basically made-up data elements that identify things in databases. They are not a characteristic of the real-world thing they are identifying, but simply serve to make instances of data about the thing unique.

The opposite of a surrogate key is a “natural key.” This is one or more pieces of data that uniquely identify a real-world instance of a thing, and which are actual properties of that thing. Here we can run into some problems, however. Exactly what characteristics identify a person? Are we talking about fingerprints, DNA profiles or iris patterns? No doubt we are for some specialized databases, but for most business applications, we have an identifier that is in reality a surrogate key, but which is managed externally to our business databases. A good example is Social Security number. We tend to think of these external surrogate keys as the equivalent of natural keys, but we are actually relying on the mechanisms that manage them to work perfectly. As Prince showed, that expectation may sometimes be misplaced.

Irrespective of the problems in trying to define natural and surrogate keys, they generate an awful lot of emotion in the world of data management. It is easier to talk to a group of data modelers about religion and politics than it is to discuss the relative merits of natural and surrogate keys.

Programmers in particular love surrogate keys. If you see a data model where every table has a single attribute primary key with the term “ID” in its name and no identifying relationships in sight, then there is a strong chance that it was made by a programmer. What is it about the programmer mind-set that makes them gravitate to surrogate keys? Having thought about this, I now believe that it is connected with the nearness of programmers to the technical objects they manipulate and their remoteness from the business world. This perspective can make surrogate keys seem vital and natural keys irrelevant, which inverts the real-world perspective of more business-oriented data analysts, leading to the clashes over keys that we have all witnessed.

A programmer has to manipulate data. The data represents real-world things, but not to a programmer. To a programmer, the data itself is a real-world thing. The data sits in databases, where it is organized into tables and columns. But it is also organized by one other concept that is the Cinderella of the data world. It is organized by records.

Pretend for just a moment that we are programmers. We know that programmers have to deal with the data stored in databases. SQL has always carried the promise of “set at a time” processing in terms of records, and this is probably where the demise of the prestige of the record originated. However, a great number of data manipulation problems are not soluble by “set at a time” processing with SQL. Individual records have to be processed, even if they are part of a set extracted from a database via SQL. And here comes a very unpleasant conclusion: a record is a thing of interest to a programmer. Therefore, a programmer, albeit operating in a different plane of abstraction, is in exactly the same position as a business user manipulating business concepts and instances. The programmer feels, and justifiably so, that a record is as vital to his or her work as a customer is to the manager of a call center. A record, then, must have a unique identifier. The surrogate key is not intended to identify the business thing that is abstracted into a database table – it is intended to identify the records that actually live and move and have their being in this table. Nothing outside of this table corresponds to the concept of a record, and so the key must be assigned within the context of the application that manages the table.

Until we get a good handle on this problem, programmers are likely to regard data analysts with the same disfavor as Prince viewed Warner Bros. - that is, as a group out to prevent them from doing what they are supposed to do. Likewise, data analysts are likely to regard programmers as little “Princes” who cannot follow the rules that are there to make things work.

Get access to this article and thousands more...

All Information Management articles are archived after 7 days. REGISTER NOW for unlimited access to all recently archived articles, as well as thousands of searchable stories. Registered Members also gain access to:

  • Full access to information-management.com including all searchable archived content
  • Exclusive E-Newsletters delivering the latest headlines to your inbox
  • Access to White Papers, Web Seminars, and Blog Discussions
  • Discounts to upcoming conferences & events
  • Uninterrupted access to all sponsored content, and MORE!

Already Registered?

Advertisement

Comments (0)

Be the first to comment on this post using the section below.

Add Your Comments:
You must be registered to post a comment.
Not Registered?
You must be registered to post a comment. Click here to register.
Already registered? Log in here
Please note you must now log in with your email address and password.
Twitter
Facebook
LinkedIn
Login  |  My Account  |  White Papers  |  Web Seminars  |  Events |  Newsletters |  eBooks
FOLLOW US
Please note you must now log in with your email address and password.