Free Site RegistrationFree Site Registration

Sign up today and access Information Management on the web!
Your FREE registration entitles you to:

FREE email newsletters

FREE access to all Information Management content

FREE access to web seminars, resource portals, our white paper library and more!

Time and Time Again: Managing Time in Relational Databases, Part 6: Version Pattern 4

InfoManagement Direct, June 2007

Tom Johnston, Randall Weis

Version Pattern 3, discussed in Part 5, keeps a record of the state of an object at every point during its lifetime. This results in an unbroken chronological series of versions of the object and guarantees that no versionable updates will be lost.

With Pattern 3 and Pattern 4, each version of an object is distinguished from all other versions of the same object by means of a version begin date.

  • The start of an object's lifetime is marked by the chronologically earliest version of that object.
  • The end of an object's lifetime is marked by a chronologically latest version which contains an object end date.
  • The current version of an object whose lifetime is not yet ended is marked by a chronologically latest version which contains {null} in its object end date.

Advertisement

Notice that these versions have no version end date. This is because Patterns 3 and 4 assume that every non-initial version follows its predecessor immediately, i.e., that when a new version is added, the previous version ends its business effectivity exactly one clock tick prior to the start of business effectivity of the newly added version. In other words, with Patterns 3 and 4, inserting a new version into a table determines both the begin date for that version, and also the (implicit) end date for what had been the current version until that insert took place.

A second assumption that characterizes Patterns 3 and 4 is that after an object has been deleted, it cannot reappear in the database. More technically, and using our Policy table example, given a version with a non-null object end date, no version with a later version begin date can have the same policy number. Equivalently, any version with a later begin date must be a version of a different policy. In Part 4, we introduced the concept of an episode, and characterized this second assumption as the assumption that no object can have more than one episode.

These two assumptions are semantic constraints on Patterns 3 and 4. If your business requirements for versioned history rule out either or both of these assumptions, then neither of these patterns will meet your needs.

Patterns 3 and 4

A version pattern is most precisely defined by the combination of a) its business requirement (noted as {BR-x}, where x is any number) which the pattern must satisfy, and b) its semantic constraints (noted as {SC-x-y}, where x is the number for the pattern, and y is the number for the constraint). In these terms, Pattern 4, which is the pattern we will examine in this article, satisfies the same business requirement and operates under the same semantic constraints as Pattern 3 - with one exception. That exception is semantic constraint {SC 3-5}:

  • {SC 3-5}. Ver-beg-dt cannot be in the past nor in the future because it is also the system insert date. To set such a date to anything but [now] would be a lie. We can't do anything in the past, nor in the future. [Now] is all we have.

With Pattern 4, this constraint is changed. Semantic constraints 1 - 4 and 6 - 7 are identical to their corresponding Pattern 3 constraints. But Pattern 4 has no constraint corresponding to {SC 3-5} because it uses distinct dates to indicate when the version becomes effective (ver-beg-dt) and when the version is inserted into its table (crt-dt).

Bi-Temporality

With Pattern 4, we introduce a second set of dates. The first set is business effectivity dates. For Patterns 3 and 4, they are:

  • Version-begin-date (ver-beg-dt): the clock tick on which that version of the object becomes the currently in effect version.
  • Object-end-date (obj-end-dt): the last clock tick on which the last version of the object is in effect.

The second set, included in Pattern 4 but not Pattern 3, also consists of a pair of dates. They are:

  • Create-date (crt-dt): the clock tick on which that row is physically inserted into its table.
  • Last update date (last-updt-dt): the clock tick on which that row is physically updated.

Normally, versions aren't updated. And indeed, versions are never logically updated. The information they contain is never lost. But as noted in Part 5, there is one case in which a version may be physically updated. That happens when a transaction is received which directs the system to logically delete the object. This logical delete of the object is implemented by physically updating the current version, changing its object end date from {null} to the date of the logical delete. When this physical update takes place, the last update date records the clock tick on which it does. In all other cases, last update date is null.

These two sets of dates are orthogonal to one another. Changes to one set do not necessarily entail changes to the other set.

Computer scientists have called the use of these two sets of dates the "bi-temporal" data pattern. The difference between Patterns 3 and 4, then, is solely that Pattern 3 is not bi-temporal while Pattern 4 is.

To work out the implications of this difference between the two patterns, let's begin by recreating the series of events described in Scenario 1.

Scenario 4.1: a Proactive Insert

With Pattern 4, it is now possible to physically insert a row on a different date than the date it becomes effective. So let's assume that Mary's policy, entered into the database on 1/12/04, does not become effective until 6/1/04. (Correction: In scenario templates, the fourth cell down on the left-hand side specifies the date or dates on which the version could validly have been inserted. In Part 5, Scenario 3.1, that cell contains "1/12/04 or any later date." It should contain "N/A," because for that Version Pattern, there is no choice about when the insert takes place; it must take place on the ver-beg-dt.)

The primary key of the policy version is policy-nbr (the unique identifier for a policy) plus version begin date. All versions for the same policy have the same policy number, and are distinguished from one another by their version begin date. In addition, for all versions but the current one (if one is currently in effect), we also know the dates on which they end their effectivity. If there is a terminal version, it is the clock tick in its object end date. Otherwise, it is one clock tick prior to the begin date of the chronologically next version.

The insert of the row shown above is a proactive insert. It takes place some four and a half months prior to the clock tick on which the inserted row becomes the in-effect row for that object. This insert is shown on the timeline diagram below.

With proactive inserts, there is a period of time between when the row was physically inserted and when the version it represents becomes effective. In the timeline diagram for Scenario 4.1, this period of time is indicated by the shaded block at the front of the version.

One situation in which proactive inserts are valuable is when there are multiple versions that become effective on the same clock tick. If our clock ticks once per day, then we have 24 hours to insert all those versions. Usually, that will be enough time to handle all but the largest volume situations.

But suppose our clock ticks once per second. With Pattern 3, we cannot insert versions rapidly enough to give more than a small number of them the correct ver-beg-dt. This immediately suggests that we should assign the correct ver-beg-dt to versions, rather than reading that date from the system clock once for each version.

Indeed, the more granular our clock ticks are, the less reasonable it is to design a system in which each version's ver-beg-dt is populated from a read of the system clock. So suppose we redesign our system to assign version begin dates rather than base them on a read of the system clock. Since we also need to record the clock tick on which rows are physically inserted - for all the usual reasons, such as backing out an entire batch run - we need to add an additional pair of dates, which we will call create date and last update date. Doing this gets us from Pattern 3 to Pattern 4.

Retroactive Inserts?

By inserting this initial version, on 1/12/04, we have created policy P138, effective 6/1/2004. But with increased expressive power - here the power to physically insert versions on dates other than their effectivity dates - comes an increased possibility for making a mistake.

One such mistake is a retroactive insert. Suppose that instead of inserting Mary's policy on 1/12/04, we instead inserted it on 8/1/04.

Before we explain why this is an error, let's note that there is nothing the database management system (DBMS) can do to prevent it. Put another way, SQL DDL does not let us tell the DBMS that retroactive inserts are invalid. Until DBMSs are upgraded with robust temporal management capabilities, we are stuck with a do-it-yourself approach. In this case, we must write code that prevents an insert whenever the transaction's create date is later than its version begin date. Because this is an integrity rule which applies to all versions in the database, it should not be left to individual applications to enforce. It should be enforced by the database itself, by means of a pre-insert trigger.

Why Not?

The reason that retroactive inserts are not valid is easy to see. They change the past. Prior to the retroactive insert, the past looked one way; afterwards, it looks different.

Consider policy P138's version that has a 6/1/04 effective date but that we didn't insert until 8/1/04. Prior to the retroactive insert, the past from 6/1/04 up to 8/1/04 looked like there was no policy P138 in effect at any point during that period of time. Afterwards, it looks like there was a policy P138 in effect during that period of time. But both things can't be true of the period 6/1/04 - 8/1/04; either policy P138 was in effect, or it wasn't!

It's certainly true that during any period of time, including the 6/1/04 - 8/1/04 period, either P138 was in effect or it wasn't. But suppose that it was. In that case, for those two months, the database was incorrect. Surely there must be a way to correct mistakes?

Well, with mistakes, we can either correct them or we can ignore them. If we correct them, we can either eradicate all traces of the error or we can preserve the evidence. The choice depends on the cost/benefit ratio of each alternative. Ignoring errors, actually doing nothing about them, is usually not an option when we are dealing with production databases, so we won't consider that option any further.

When we are dealing with transactionally updated databases, accountants know that the way to correct an error is to a) create a transaction which offsets the bad transaction, and then b) add the correct transaction. Although our Policy table is not transactionally updated, it is versioned, so something similar should take place. The error should not be erased or overwritten. It should be "sequestered" in some way, so that it is invisible to "normal" queries but nonetheless retrievable with specially written queries. Then the correct version should be inserted in its place.

This is the "preserve the evidence" approach to error correction. But it is not always the best choice. Consider that correcting mistakes while at the same time not removing all traces of them from queryable history (see Part 1 for an explanation of the term "queryable history") requires complex logic for both updating the database and subsequently querying the database after the correction. The cost of developing this logic is not trivial. Moreover, the potential for mistakes in queries against a database containing such corrections - especially mistakes in end-user-written, non-IT-managed queries - is very real and can be as costly as you care to imagine. If there are no legal requirements to retain a queryable record of the mistake, for example if the only legal requirement is to be able to pull archive tapes when auditors ask for them, then the benefit of making both the correction and the error immediately queryable is not very great. In a case like this, the best approach is to snapshot the database (so the error can later be preserved in archival storage), and then proceed with the retroactive insert (or the update in place of an existing version, if that is what is required).

However, suppose our queryable database is a database of record for the data in question (policies, in our case). Suppose, further, that there is either a legal or a customer relations downside to telling one story about whether or not policy P138 was in effect throughout 6/1/04-8/1/04, when asked prior to 8/1/04, and subsequently telling the opposite story when asked.

A legal downside is easy to imagine. If a report on policies in effect was run originally on 7/31/04, and sent to the appropriate regulatory agency, but a week later rerun and sent again to that agency, the two reports are not going to agree about P138, or about any counts or other statistics it might contribute to.

A customer relations downside is equally easy to imagine. If our customer service representative (CSR) tells a customer one story about her policy on one call and a contradictory story on a later call, the customer is not likely to be happy and won't feel increased confidence in the reliability of our company.

Downsides that cannot be contained within the enterprise, such as regulatory, customer relationship and even vendor relationship downsides, are usually taken far more seriously by executive management than are downsides that affect only internal operations. With externally visible downsides, the cost/benefit ratio quickly swings to the "preserve the evidence" approach to error correction.

But neither Version Pattern 3 nor 4 supports correction with error retention. So if your business requirements for versioned history do not require error retention in queryable history and do permit you to manage nonrecurring continuously existing objects, then you may still be able to use one or both of these patterns. Otherwise, neither Pattern 3 nor 4 will meet your needs. We will have to wait for a later Version Pattern to get error correction with error retention.

In Part 7, we will finish our discussion of Version Pattern 4.

Tom Johnston is an independent consultant specializing in enterprise data architecture, and in relational, object-oriented and data warehouse modeling in various industries, including telecommunications, health care, banking, retailing and transportation. He can be reached at tjohnston@acm.org, and his Web site is www.MindfulData.org.

Randall Weis, founder and CEO of InBase, Inc, has more than 24 years of experience in IT and IT management, specializing in enterprise data architecture. Weis' technical expertise is in sophisticated, multitiered systems. He has designed logical and physical data models and implemented several high profile, very large database (VLDB) systems in the financial and insurance industries. These systems have had very stringent performance and real-time history requirements. His software development company, InBase, Inc., has developed software and Web sites used by some of the nations largest companies. Weis has been a presenter at various user groups, including Guide, Share, Midwest Database Users Group and Camp IT Expo. His technique for modeling history, retro activity and future dating has been reviewed and approved for the physical implementation of IBM's Insurance Application Architecture (IAA). He may be reached via email at randyw@inbaseinc.com.

For more information on related topics, visit the following channels:

Advertisement

Advertisement