Free Site RegistrationFree Site Registration

Sign up today and access Information Management on the web!
Your FREE registration entitles you to:

FREE email newsletters

FREE access to all Information Management content

FREE access to web seminars, resource portals, our white paper library and more!

Time and Time Again: Managing Time in Relational Databases, Part 12: Foreign Keys in Version Tables

InfoManagement Direct, October 2007

Tom Johnston, Randall Weis

Advertisement

In this article, and in Part 10 and Part 11, we have been discussing the issue of foreign keys and referential integrity in versioned tables. The issue arises because most modelers and database administrators (DBAs) seem to believe that if a change is made in a versioned table, and there are foreign keys pointing to that table, those foreign keys must also be changed. If they themselves are in a versioned table, and foreign keys in yet other versioned tables point to the rows they are in, versioned updates will ripple out to those other tables, creating a cascade update problem.

In Part 10, we decided that our options are the options we always have for any problem. We can ignore the problem, avoid the problem or fix the problem. We also determined that we cannot ignore the problem because we are looking for an enterprise solution, one applicable to all databases. For databases containing small and/or infrequently updated versioned tables, we might be able to ignore the problem. But an enterprise solution must apply to all databases, regardless of size or frequency of update.

 

In Part 11, we suggested that the most perspicuous line of investigation was to begin by asking "What do foreign keys to versioned tables relate?" One answer is that they relate rows to rows. The "related-to" rows are the rows in versioned tables. The "related-from" rows may be either versions or non-versions. On the assumption that foreign keys relate rows to rows, whether or not any of those rows are in versioned tables, we developed one solution to the versioned foreign key problem. Here in Part 12, we develop a second solution to the versioned foreign key problem, also based on the assumption that those foreign keys still relate rows to rows. We will then develop a solution based on the assumption that foreign keys that point into versioned tables do not relate rows to rows, i.e., versions to versions, but rather relate objects to objects.

 

We will conclude our discussion of referential integrity involving versioned tables by distinguishing between the machinery of versioning and its semantics. The machinery is complex, and should be as invisible as possible to developers and users. But the semantics are the semantics of temporal databases, and once such databases are supported, those who write queries must understand those additional semantics in order to write unambiguous queries against those databases.

 

Chart of Version Patterns

 

Before continuing, we insert here the current chart of version patterns. References back to earlier patterns are happening pretty frequently now, and we hope this will help the reader recall what those prior version patterns are. They also give a hint of version patterns not yet discussed.

 

Figure 1: Chart of Version Patterns

 

Chart of This Series

 

Before continuing, we insert here a chart listing the installments to date in this series, and a brief description of what each one is about. As this series grows longer, the back references to earlier installments will only increase, and we think that having a chart like this at hand will be helpful.

 

Figure 2: Chart of Installments to Date in this Series

 

Solution 2. Don't Update Foreign Keys to Versions

 

As explained in Part 10, this is the second solution to the cascade problem, given the assumption that foreign keys to versioned tables relate versions to versions, if the foreign-key containing table is also a versioned table, or else relate objects to versions, if the foreign-key containing table is not a versioned table. The first solution is to update foreign keys in versioned tables, but to do so as updates in place, updates which overwrite the foreign key but which do not produce new versions in the process. The second way, discussed now, is to not update foreign keys that point to versions.

 

Figure 3 shows the example we introduced in Parts 10 and 11. Both the Client and Policy tables are versioned tables.

 

Figure 3: The Client Version and Policy Version Tables

 

Let's assume that a new version of client C882 has just been created. This is shown in Figure 4. With this “don’t update” solution, no further physical activity against the database is required. In particular, we do not update the foreign key that points to [C882, 5/23/04].

 

Figure 4: Client C882 is Versioned

 

Now let's insert a new policy and see what happens. The result of this insert is shown in Figure 5.

 

Figure 5: Client C882 Gets Another Policy

 

At first sight, Figure 5 is puzzling. Policy P138 points to the first version of client C882, but policy P704 points to the second version. Why is there this difference?

 

The difference is in the appearance of things only. It is, in fact, the result applying the following rule.

 

Figure 6: Rule for Managing Foreign Keys in Versioned Tables

 

The version of the client that was current when policy P138 was created is the first version. The version current when policy P704 was created is the second version. So with this rule, every foreign key pointing to a versioned table points to the version of the related object that was current when the foreign-key containing row was created. This rule applies both when the foreign key is a column in a versioned table, and also when it is a column in a nonversioned table.

 

Joins using such versioned foreign keys show what things look like, across joins, at the moment in time that the row from which we are joining was created. Therefore, using this rule, we can always follow a chain of foreign keys out from a particular row of interest, and bring together versions of the related objects as they were when that original row was created. We can, in other words, bring back data which is equivalent to a snapshot of that data taken at the moment the original row was created.

 

But sometimes that isn’t what we want. Sometimes, we want to join to the most current version of the related object. For example, consider a query which takes policy version [P138, 6/1/04], and joins it to the client who owns that policy. If we use the foreign key in that version, it will join to client version [C882, 5/23/04].

 

If this query was executed anytime between 5/23/04 and 9/30/04, it returned an upsell policy type of PPO. And during that period of time, this is the correct upsell policy type. But suppose this same query was executed on or after 10/1/04. In that case, the query returned an incorrect result, because from 10/1/04 forward, the current upsell policy type was HMO.

 

What are we to make of this? First of all, it is clear that this way of managing foreign keys in versioned tables will always produce correct results provided that, implicit in every join is the assumption that joined-to versions will be the version current when the joined-from row was created. But if this assumption is not valid, and in particular when what is desired is joins that return the then current versions of all joined-to objects, then using the foreign keys in versioned tables to make those joins is not guaranteed to return correct results.

 

How, then, can we write these other queries, the ones that want the current version of all joined-to versioned objects? The way we can do this is to write a query which specifies the joined-to object identifier – client number in this case – together with a clause that specifies the greatest version begin date not greater than the current date, and containing a {null} object end date. The “not greater than the current date” qualifier is needed to filter out future versions, versions that are not yet in effect at the time the query is executed. The "containing a {null} object end date" qualifier is needed to filter out an otherwise qualifying version that is for an object that has been deleted.

 

Something else that we must be aware of is that even though foreign keys to versioned objects point to rows current at the time the containing row is created, this does not guarantee that using these foreign keys will always return versions current when the joined-from row was first created.

 

Here's how this apparent anomaly can happen. Suppose that we also have an Employer table, which is also versioned. This table contains the employers of our clients. Now suppose we have a query which selects a policy, and then joins first to the client version current when it was created, and from there to the employer version current when that client version was created. This query may return employer data that was not current when that policy version was created. Figure 7 illustrates this situation.

 

Figure 7: “Current When” Joins Do Not Always Retrieve Contemporary Versions

 

This join will return [E721, 11/15/01], even though the version of employer E721 that was current on 6/1/04 was [E721, 5/15/04]. Supposing that the third column is employer data retrieved in the query, that query will show “abc” as the "current" employer value, whereas “xyz” was the value current when the policy version was created. These situations can arise because “current when” is not a transitive relationship. This is illustrated in Figure 8.

 

Figure 8: “Current When” is not a Transitive Relationship

 

Assumption 2. Foreign Keys in Versioned Tables Relate Objects to Objects

 

The first answer to the question, "What do foreign keys in versioned tables relate?” was that they relate rows to rows. On that assumption, we found that there were two solutions to the cascade update problem.

 

The second answer to that question is, “Foreign keys in versioned tables relate objects, not versions.” But we have to understand what this means since, in a physical database sense, it is obvious that foreign keys relate rows to rows. And in the case of our examples, since both the Policy and Client tables are versioned tables, foreign keys relating these two tables must relate rows to rows – in other words, versions to versions.

 

But in a semantic sense, our ability to relate versions to versions, given current RDBMS technology, is limited. If we don’t update foreign keys when the versions they point to are superseded, then the relationships implemented by those foreign keys slowly “age.” On the other hand, if we do update (overwrite) foreign keys when the versions they point to are superseded, then the relationships established when those rows were first created are no longer stored in the database. Our only other option is to create a new version each time a foreign key changes. But that option creates the cascade update problem we are trying to avoid.

 

On assumption 2, we may as well store only the object identifier component of a foreign key to a versioned table. In our example, we may as well store only client number, and not bother with version begin date. If we want to join a policy to the client version current when the policy was created, the WHERE clause will ask for the version of that client whose begin date is the greatest date less then or equal to the begin date of the policy version, and whose object end date is {null}.

 

On the other hand, if we want to join a policy version to the most current version of the related client, the WHERE clause will ask for the version of that client whose begin date is the greatest date less then or equal to [now], and whose object end date is {null}.

 

One would hope, of course, that a versioned client would not be version-deleted (have a non-null object end date) if it still had at least one non-deleted versioned object related to it. If we were dealing with non-versioned objects, the DBMS would guarantee this by either carrying out a cascade delete, or by denying the delete until all dependent objects were first deleted.

 

But with versioned objects, the DBMS cannot guarantee this because such temporal semantics are not part of any current RDBMS. So since we must guarantee it ourselves, there is always a chance of error. To guard against that error, the WHERE clause we are discussing must also specify that the client version whose begin date is the greatest date less then or equal to [now] also has an object end-date of {null}.

 

Hiding the Complex Semantics of Time

 

We have already mentioned that managing time in relational databases is perhaps the most difficult challenge we have ever encountered as analysts, modelers and DBAs. This discussion of the semantics of versioned foreign keys certainly emphasizes this point.

 

It is clear that the mechanisms which manage time in relational databases must be hidden from developers and users alike. One reason is the complexity of those mechanisms. Another reason is that the management of time is not an applications- or subject area-specific requirement any more than the management of dynamic balanced trees is an application- or subject area-specific requirement.

 

The optimal solution will be one provided by vendors and specified by standards committees. The mechanics may differ from one vendor to another, as long as the syntax and semantics of both the SQL DML and SQL DDL are supported. But a SQL with temporal data management capabilities must first be proposed and then accepted by the standards committees. The proposal was made, almost a decade ago, but the acceptance seems no closer today than it was back then.

 

So in the meantime, we must provide our own encapsulation of this complexity, our own enterprise-level solution to temporal database management. That is what we are doing in this series of articles.

 

Being Aware of the Complex Semantics of Time

 

There is one thing about temporal database management that cannot be hidden, however. It is the fact that with such databases, new temporal semantics are available. This means that anyone who writes queries must understand these semantics, and code his queries accordingly. For example, a query asking for a join of a policy version to a client version raises the question, “Which client version?” A query asking for a join of a policy to a client raises no such questions.

 

If the query writer does not understand the possibilities, and therefore does not code for them, then either the DBMS will tell him that his query is ambiguous, or it will execute the query using whatever default behavior has been defined. In the former case, the query writer needs to understand what the error message means, and then write a new query that makes his temporal assumptions explicit. In the latter case, the query writer had better understand what the issues are, and what the default behavior is, or else he may ask one question with his query, but receive an answer to a different question in the result set returned by the DBMS.

 

The basic point is this. The mechanics of temporal database management can and should be as invisible to developers and other query authors as possible. But it is still necessary for any query author to understand temporal semantics, because he cannot unambiguously say what he means unless he does understand those semantics. No matter how elegant and expressive temporal SQL will prove to be, it cannot relieve query authors from the obligation to understand the options involved, and to be careful to code queries which say exactly what they mean.

 

The Versioned Table Foreign Key Position, Going Forward

 

In Part 13, we will take up Version Pattern 6, the bi-temporal versioning of recurring objects. When we do, we will handle foreign keys to versioned tables by capturing the full foreign key of the version current as of the time the row in question is inserted, both object identifier and version begin date. This will be done whether the row being inserted is being inserted into a versioned or a non-versioned table. The issue is the table pointed to, and what to do when that table is versioned. Thus, in terms of our example, we will continue to record both client number (the object identifier) and version begin date as a foreign key in the Policy table.

 

Tom Johnston is an independent consultant specializing in enterprise data architecture, and in relational, object-oriented and data warehouse modeling in various industries, including telecommunications, health care, banking, retailing and transportation. He can be reached at tjohnston@acm.org, and his Web site is www.MindfulData.org.

Randall Weis, founder and CEO of InBase, Inc, has more than 24 years of experience in IT and IT management, specializing in enterprise data architecture. Weis' technical expertise is in sophisticated, multitiered systems. He has designed logical and physical data models and implemented several high profile, very large database (VLDB) systems in the financial and insurance industries. These systems have had very stringent performance and real-time history requirements. His software development company, InBase, Inc., has developed software and Web sites used by some of the nations largest companies. Weis has been a presenter at various user groups, including Guide, Share, Midwest Database Users Group and Camp IT Expo. His technique for modeling history, retro activity and future dating has been reviewed and approved for the physical implementation of IBM's Insurance Application Architecture (IAA). He may be reached via email at randyw@inbaseinc.com.

For more information on related topics, visit the following channels:

Advertisement

Advertisement