Time and Time Again: Managing Time in Relational Databases, Part 5: Version Pattern 3
InfoManagement Direct, June 2007
The business requirement for Version Pattern 3 is to be able to show what an object was like at any point in its lifetime. To do this, we must retain the before-update state of an object, not just the after-update state. We must retain a versioned history of changes to the state of an object. This means that in place of the Policy table we have been using thus far, we will need a Policy Version table.
{BR-3}. Support {BR-2}, and also retain a queryable history of changes to objects.
Tables and Keys
Advertisement
The primary key of our original Policy table is policy-nbr. So the primary key of our Policy Version table must be policy-nbr plus something else. The purpose of that something else is to distinguish each version of a policy from all other versions of that same policy.
Pretty much anything will do to distinguish policy versions. In fact, any value guaranteed not to recur for the same policy will do. The first thing that comes to mind is a sequence number. The initial version is 1, and the subsequent n versions are 2, 3, ...... , n.
This isn't a bad idea, just as the logical delete flag, discussed in Version Pattern 2, was not a bad idea. But just as we preferred a delete date to a delete flag, we prefer to use a date as our version-discriminating part of the primary key for policy versions, and thus the combination of policy-nbr and this date to uniquely identify each row in the Policy Version table.
Business Dates and System Dates
Version Pattern 3 does not distinguish business effectivity dates from physical database activity dates. We must have a business effectivity date, of course. But for the purposes of this analysis, we can consider this date as either the business effectivity start date for the version, or that date and also the date the version was physically inserted. In the former case, we simply fail to record the date of physical database activity. In the latter case, the two dates are identical, with complications that we will examine later.
The former case is simpler, but it is also almost never seen in production databases. Physical insertion dates (and physical last update dates, in the case of updates in place) are almost always included. So for Version Pattern 3, we will assume that version begin date is both the date the inserted row became business effective and also the date the row was physically inserted.
Let's now recreate the series of events described in Scenario 1. We begin with the event that created a new policy for Mary Jaspers. This event takes place on 1/12/04.
Scenario 3.1: A Policy Insert in the Policy Version Table
As we have already pointed out, the physical action of inserting the initial version for a policy is also the logical action of inserting a new policy. Here is the result of inserting Mary's policy effective 1/12/04:
.gif)
Figure 1: Result of Inserting Mary Jaspers's Policy Effective 1/12/04
The primary key of the policy version is policy-nbr (the unique identifier for a policy) plus version begin date. All versions for the same policy have the same policy number and are distinguished from one another by their begin date. Because Version Pattern 3 is based on the assumption that objects cannot recur (as described in Part 4), there will be no temporal gaps among versions for the same object. Consequently, we also know the end date for all versions but the current one. It is one day (one clock tick) prior to the begin date for the chronologically next version.
If the above row were inserted on any date later than 1/12/04, it would be a retroactive insert. If it were inserted on any date earlier than 1/12/04, it would be a proactive insert. But as long as rows must be inserted on their business effectivity dates, then neither retroactive nor proactive inserts are possible. In Version Pattern 4, we will distinguish these two kinds of dates and consider retroactive and proactive database activity.
Using this primary key presupposes that we can never have two versions of the same policy that begin on the same date, i.e., on the same tick of whatever clock we are using. For now, we'll stick with this assumption. Later on, we will examine an exception to it.
.gif)
Figure 2: Timeline Diagram for Scenario 3.1
By inserting this initial version, we have created policy P138, effective 1/12/2004. Now let's see how the first update to this policy is represented in the Policy Version table.
Scenario 3.2: A Policy Update in the Policy Version Table
On 3/14/05, Mary changes her policy type from PPO to HMO (Health Maintenance Organization). After that change is recorded in the Policy Version table, the table looks like this:
.gif)
Figure 3: Policy Version Table After Policy Type Change
Because of our no-gap assumption, we know from this data that this policy was a PPO policy from 1/12/04 through 3/13/05 - the latter date being one clock tick prior to the begin date of the next version.
Note: One way of representing date ranges is by a pair of dates (one of them inferred, in this example). Another is to specify a begin date and a duration, i.e. a number of clock ticks. These two ways of representing date ranges are semantically equivalent; each can handle all (and only) the business requirements that the other can. They differ only in details of implementation. Throughout these articles, we have chosen to represent date ranges with a pair of dates.
When a pair of dates is used, we must specify whether neither, either or both are included in the range they specify. Again, the options are semantically equivalent. In these articles, we have chosen to include begin dates in the range, but exclude end dates.
The timeline for policy P138, after this update, is shown below.
.gif)
Figure 4: Timeline Diagram for Scenario 3.2
Note: graphics for timeline diagrams permit us to draw the lines indicating the start of a version only in alignment with the start of a vertical bar. Vertical bars represent months. So our convention will be to draw these version start lines at the start of the month of their ver-beg-dt. Thus, a ver-beg-dt of 3/14/05 is drawn aligned with the start of the vertical bar representing March, 2005.
After this change, we do have an end date for our initial version, although it is inferred rather than expressed as an explicit data value. However, we still do not have an end date for Mary's policy itself, as is graphically illustrated by the ellipsis on the above timeline.
Scenario 3.3: A Policy Deletion in the Policy Version Table
On 6/12/06, Mary's policy is terminated. But in the Policy Version table, as shown in Scenario 3.1, we have no way to distinguish a version representing a policy deletion from one representing a policy insert.
To distinguish versions which logically delete a policy, we could add a delete flag. But a delete date carries more information than a delete flag. So while we do not need a second date to manage policy inserts and updates, we do need one to manage policy deletions.
Notice that the version date in the primary key is called version begin date, not version create date or version insert date. By the same token, we will call this second date object end date, not object delete date. And, of course, we do not call it version end date. It is the end date of the object, i.e., the policy, not of a version of the object.
After we terminate Mary's policy, the Policy Version table looks like this:
.gif)
Figure 5: Policy Version Table After Termination of Policy
We can identify the first row as the initial version for P138 because there is no other row for P138 with an earlier ver-beg-dt. We can identify the second row as the terminal version, i.e. the logical delete point, for policy P138 because it is the only version for P138 whose obj-end-dt is not null.
The lifetime of policy P138 is from 1/12/04 to 6/11/06, inclusive. A query asking for what the policy looked like, at any given point during its lifetime, can now be satisfied.
Notice that these two rows are a physically condensed way of expressing the same semantics that would be expressed by having nearly 900 rows in the table, one for each day in the life of this policy. These two rows are semantically equivalent to a set of nearly 900 consecutive daily snapshots. This illustrates how, from a data storage perspective, versions are much more efficient than snapshots.
Notice also that this is the first instance in which we have physically updated a row in the Policy Version table. Prior to 6/12/06, the row with a 3/14/05 ver-beg-dt has a null obj-end-dt. From 6/12/06 forward, it does not. So we have overwritten that state of the row.
In doing so, have we lost information? In fact, we have not. We can infer from the second row above that its obj-end-dt would have been shown as null prior to 6/12/06, and as not null from that date forward. So although we did physically overwrite that row, we lost no information about the policy, or about what we knew about the policy at any point in its lifetime.
This shows why we think Dr. Snodgrass's metaphor of the relationship between nontemporal and temporal tables of the same "things" is a poor one. He encourages us to think of a temporal table as derived from a nontemporal one by adding various dates to it. So he would encourage us to think of the Policy Version table as the Policy table with time management added to it. Indeed, in his examples, he changes nothing in table names as he changes them from nontemporal to temporal. (This metaphor is first encountered in {Snodgrass 2000, p.20}, where a Lot table is still a Lot table after adding two pairs of dates. It is again encountered on p.113 (Adding History), p.117 "...adding valid-time support to a table..." and elsewhere.)
Throughout our discussion of Version Pattern 3, we have seen the profound difference between objects and versions of objects. In temporal tables, we physically manage rows which represent versions of objects. In the process, we logically/semantically manage those objects themselves. But in that process, the table whose rows represent those objects disappears! We have not kept a Policy table and added a Policy Version table. We have replaced the Policy table with a Policy Version table; and that version table should not be thought of as the Policy table with time management added to it.
The complete lifetime of policy P138 is shown on the timeline below.
.gif)
Figure 6: Timeline Diagram for Scenario 3.3
It is only now, when the policy has been logically deleted, that we have full knowledge of its timeline. It is only now that the last version on the timeline is not followed by an ellipsis.
The Semantics of Version Pattern 3
In Part 4, we defined four semantic constraints for Version Pattern 3. These constraints spell out the implications of the single-episode, no-gaps assumptions that are in effect for this version pattern. To repeat, those constraints are as follows:
- {SC 3-1}. If a version for an object is followed by another version for that same object, the (implied) end date of the former version must be one tick of the clock prior to the begin date for the latter version.
- {SC 3-2}. If a version for an object is not followed by another version for that same object and is not itself a logical delete version, the end date will be assumed to be unspecified. (This is not always the case, of course; and in later patterns, we will drop this simplifying assumption.)
- {SC 3-3}. If a version for an object is a delete version, the object will be considered deleted as of that delete date.
- {SC 3-4}. It is semantically illegitimate for any version of an object to follow (to be chronologically later than, based on its version date) a logical delete version of that object.
In this installment, we have defined two dates which are needed to implement Version Pattern 3 - a version begin date and an object end date. Additional semantic constraints specify the rules for populating these dates.
The first group of additional constraints specifies the rules for populating these two dates during the insert of the first version for an object. Semantically, this is the action of inserting a new policy.
- {SC 3-5}. Ver-beg-dt cannot be in the past or in the future, because it is also the system insert date. To set such a date to anything but [now] would be a lie. We can't do anything in the past, or in the future. [Now] is all we have.
- {SC3-6}. Obj-end-dt must be null. We can't end something before it begins. We can't even end something when it begins. We can only end something after it begins.
The second group of additional constraints specify the rules for populating these two dates during the insert of a noninitial, nonlogical delete version for an object. Semantically, this is the action of updating an existing policy.
- {SC 3-7}. For every noninitial version of a policy, its ver-beg-dt must be at least one clock tick past the ver-beg-dt of the immediately prior version for the policy. If it is not, then the two versions will overlap for at least one clock tick. But that is invalid because there cannot be two versions of the truth at any given point in time.
However, for Version Pattern 3, this condition should never arise. Since a ver-beg-dt is also the system insert date, only an incorrect system clock could permit a version to have an earlier begin date than the version which preceded it. It shouldn't be necessary to write code to enforce {SC 3-7} because if the system clock is incorrect, we have a much bigger problem on our hands and will probably have to roll back and reapply updates anyway.
- {SC 3-8}. Obj-end-dt must be null. If it is not, the policy is being deleted; and, by assumption, this is not a delete transaction.
The third group of additional constraints specify the rules for populating object end date during a logical delete. Semantically, this is the action of terminating a policy.
- {SC 3-9}.To terminate a given policy, set the object end date of the current version (which may or may not be the initial version) to [now]. The object end date cannot be in the past or in the future, because it is also the system insert date.
What Comes Next
Version Pattern 3 is a significant achievement. It retains the knowledge of the state of an object at any point during the lifetime of that object. Indeed, what more could there be? What more could businesses want that versioned tables can provide?
The answer is: quite a lot. To begin with, we might want to see what would happen with Version Pattern 3 if we distinguished business dates from system activity dates. This would mean that version begin date and object end date are business dates only. For system activity dates, we would then want to keep an insert date and a version update date. That latter date will be {null} except when the version is updated. And because with versions we are no longer considering updates in place, this update can only be one thing - the date the object end date was specified.
In the academic literature, versioning which has both a business set of dates and a system activity set of dates is said to implement a "bitemporal" pattern.
In providing an annotated list of version patterns, in Part 2, we did not distinguish a bitemporal and nonbitemporal variation of Version Pattern 3. (In addition, the description of Version Pattern 3 was incorrect, and should have stated the opposite of what it did state.) So following is a revised annotated list of the version patterns which we are discussing in these articles.
This list provides a correct description for pattern 3. It "bumps down" the other patterns, and specifies a pattern 4 which is pattern 3 with business and system activity dates distinguished. Finally, it reverses the sequence of the original list's ninth and tenth patterns in order to put the two patterns which describe snapshots next to one another.

Figure 7: Annotated List of Version Patterns - Revision 1
Tom Johnston is an independent consultant specializing in enterprise data architecture, and in relational, object-oriented and data warehouse modeling in various industries, including telecommunications, health care, banking, retailing and transportation. He can be reached at tjohnston@acm.org, and his Web site is www.MindfulData.org.
Randall Weis, founder and CEO of InBase, Inc, has more than 24 years of experience in IT and IT management, specializing in enterprise data architecture. Weis' technical expertise is in sophisticated, multitiered systems. He has designed logical and physical data models and implemented several high profile, very large database (VLDB) systems in the financial and insurance industries. These systems have had very stringent performance and real-time history requirements. His software development company, InBase, Inc., has developed software and Web sites used by some of the nations largest companies. Weis has been a presenter at various user groups, including Guide, Share, Midwest Database Users Group and Camp IT Expo. His technique for modeling history, retro activity and future dating has been reviewed and approved for the physical implementation of IBM's Insurance Application Architecture (IAA). He may be reached via email at randyw@inbaseinc.com.
For more information on related topics, visit the following channels:





