FEB 1, 2000 1:00am ET

Related Links

What’s Hot and Not: Trends in Data Warehousing
October 20, 2009
Netezza Opens Architecture
August 4, 2009
Teradata Launches Data Warehouse Packages
November 21, 2008

Web Seminars

6 Key Things to Fast Track your Mobility Strategy
February 23, 2012
Why Getting Started in MDM Doesn't Have to Be Difficult
February 29, 2012
Dashboards: How's Business? Ask your Data!
March 15, 2012

Primary Key Reengineering Projects

Print
Reprints
Email

Many enterprises are undertaking primary key reengineering projects for their core business tables. These tables keep information on an enterprise's suppliers, partners, customers, orders, locations, employees, and so forth. The purpose of these projects is to replace the "intelligent keys" of these tables with "unintelligent" keys.

The cost of these reengineering projects is often very high, and for one "key" reason: when the primary keys of a table are changed, all their foreign key instances must be changed, too. And the foreign keys of an enterprise's core tables are everywhere! They occur in other tables in the same database and also in tables in other databases such as the enterprise data warehouse, downstream data marts, operational data stores and online or batch transaction processing systems. They may even occur in the databases of other companies with which the enterprise does business.

Referential integrity is usually enforced only within the physical database that contains the core table itself. But the instances in tables in other databases are also foreign keys, and they must be changed, too. Moreover, foreign keys exist not just in these online databases, but also in off-line electronic archives ­ not to mention microfiche and hard copy where, of course, they cannot be changed.

Consider the effort to propagate changes in just one table's primary keys to their foreign keys in that table's own database and to their foreign keys in physically distinct databases. Ideally, a change to a primary key value should be done as one atomic transaction; but this usually isn't possible when large, physically distinct databases are involved. So there is a period of time ­ often weeks and months, if many databases are involved ­ over which the changes take place. During this time, joins will fail for the foreign key instances which haven't yet been changed. Consequently, the databases will report incorrect information when queried.

If we want to avoid reporting incorrect information on important business objects, we must use some kind of cross-reference table until all foreign key instances have been changed to the new values. Cross-reference tables track which foreign keys have been changed to the new values and which ones have not. They are useful only if all queries that contain joins based on the foreign keys are altered to add an intermediate join to the cross-reference table; and this can be a very expensive process. What's more, after all of the foreign key instances have been changed, the cross-reference table must be discarded. Consequently, the queries have to be changed back to make the direct join and not use the cross- reference intermediary. Again and again, with one database after another, this lengthy and expensive process must take place until all the foreign key instances of all the primary keys of that core table have been changed.

Thus, reengineering the primary keys of just one core table, for a large enterprise, can take a year or more and millions of dollars. I'll call the factor that accounts for this tremendous time and cost "the foreign key ripple effect." On the bottom line, it outweighs all other costs of reengineering primary keys by perhaps an order of magnitude.

With this kind of cost for these reengineering efforts, we'd better do it right the first time! However, every primary key reengineering project that I have seen or heard of is doing it the wrong way, by reengineering to unintelligent keys only. These reengineering projects will only solve part of the problem because they are based on an incomplete understanding of the semantic requirements for unique identifiers in databases. If these projects, whose objective is to reengineer from intelligent keys to unintelligent keys, are allowed to continue, they will have to be followed by a second round of equally costly projects. The second time, the objective will be to reengineer from unintelligent keys to what I call "enterprise keys."

The good news is that we can reengineer from intelligent keys directly to enterprise keys. We don't have to first reengineer to unintelligent keys and then reengineer a second time to enterprise keys. In this series, I will show that the problem is indeed as serious as I have said and that it is indeed avoidable. I will also show how to avoid it.

Entity and Object Integrity

Basically, the problem is that both intelligent and unintelligent keys satisfy the entity integrity constraint, but neither satisfy the constraint of "object integrity." Entity integrity is the constraint that the primary key values in a relational table must be unique across all of its rows. Object integrity, as this article will explain, subsumes entity integrity under a stronger requirement. It requires that every primary key instance must be different from the other primary key instances of its own table and also from all primary key instances of an entire set of other tables.

That entire set of tables, across which a primary key value must be unique, forms a "namespace." For our purposes, a namespace is simply a set of tables across which the identifier for every row must be unique. This namespace might be a single physical database; but the larger the namespace, the greater the benefits of object integrity. So an enterprise would do well to include all of its principal databases in its initial namespace. Later, a namespace shared with the enterprise's principal suppliers, partners and customers could be created at little incremental cost.

Today's RDBMSs do not enforce object integrity. Consequently, a database can contain two or more rows (each in a different table) which happen to have the same value for their primary keys. It happens this way. Sometimes, the primary keys of two or more tables in a database have the same "syntax" (i.e., the same data type and length) and also identical or overlapping domains (permissible values). When we have these two conditions, the primary key of a row from any of these tables may have the same value as a primary key from a row of any of the other tables. For example, if a purchase order table and an invoice table both have primary keys consisting of a date plus a sequence number, then every day there will be n pairs of orders and invoices with identical primary keys, where n is the smaller of the number of orders or of the number of invoices produced that day.

Filed under:

Advertisement

Comments (0)

Be the first to comment on this post using the section below.

Add Your Comments:
You must be registered to post a comment.
Not Registered?
You must be registered to post a comment. Click here to register.
Already registered? Log in here
Please note you must now log in with your email address and password.
Twitter
Facebook
LinkedIn
Login  |  My Account  |  White Papers  |  Web Seminars  |  Events |  Newsletters |  eBooks
FOLLOW US
Please note you must now log in with your email address and password.