Free Site RegistrationFree Site Registration

Sign up today and access Information Management on the web!
Your FREE registration entitles you to:

FREE email newsletters

FREE access to all Information Management content

FREE access to web seminars, resource portals, our white paper library and more!

Time and Time Again: Managing Time in Relational Databases, Part 3 - Version Patterns 1 and 2

InfoManagement Direct, May 2007

Tom Johnston, Randall Weis

Version Pattern 1: Updating in Place

Updates in place destroy history by overwriting the previous version of the rows they update. But in most cases, this is the way tables are in fact updated. In these cases, the business does not require that the earlier states of the updated rows be available as queryable history. For this kind of data and these business requirements, the need for history is so infrequent and so nonreal-time critical, that reconstructable history is good enough.

Advertisement

However, even with tables that lack queryable history, it is quite common for developers to include both a create date and a last update date; these two dates provide a minimal level of support for history. Let's include them as business requirements for Version Pattern 1 and note them as follows:

{BR 1}. Retain the date each row was created, and also the date of its last update.

The scenarios below illustrate Version Pattern 1. Our example will be an insurance policy table.

Scenario 1.1: an Insert to the Policy Table

On 1/12/04, a policy for client Mary Jaspers is inserted into the Policy table for the Acme Healthcare Insurance Company (AHIC). For now, we will assume that this is also the action that makes the policy effective. In other words, a policy with AHIC becomes effective on the date that a row representing that policy is inserted into AHIC's Policy table.

Note: although 1/12/04 is in the past, as of the date of publication of this article, the paragraph above said that on "1/12/04, a policy for client Mary Jaspers is inserted into the Policy table" (italics added). It did not say ".... A policy ...... was inserted .......". This "is" should be consider a tenseless "is," not an indication of the present tense. The reason to speak tenselessly whenever possible is that otherwise, the tenses involved in an English language description of these situations simply become too complex. With a tenseless natural language description, we will not be distracted from the focus of these articles, which is the management of time in relational databases, not the management of time and tenses in natural languages.

In later history patterns, we will drop this assumption, and distinguish system activity dates such as dates rows were inserted, updated or deleted, from business effectivity dates such as begin and end dates of business effectivity, or the effectivity dates for a correction to an earlier entry. This distinction between system activity and business effectivity dates is referred to, in the literature, as the "bi-temporal" distinction.

Note: those in the health care insurance industry will recognize that this example about policies is not a typical situation. Typically, health care insurance policies are group policies, which are contracts between a client company and the insurance company, not a contract between the insured individuals and the insurance company.

In the property and casualty industry, on the other hand, policies which cover an individual or family are usually contracts between the insurance company and that individual or family. But health care insurance companies do issue individual policies, and we believe that this will provide a simpler set of examples with which to illustrate our proposal for managing versioned history.

Scenario 1.1 shows the result of entering Mary's policy into the Policy table. (See the previous article - Part 2 - for an explanation of the scenario template used here and throughout the remaining articles in this series.)

Note: the conventions for distinguishing different types of columns in these illustrative tables are as follows:

  • Primary key columns are listed left-most, and the column headings are bold-faced. They contain the suffix (PK).
  • Foreign key columns are listed next, and the column headings are underlined. They contain the suffix (FK).

Note: these articles do not contain examples of foreign keys which are also primary keys. One of us (Johnston) has written several articles explaining why he thinks that foreign keys should never occur in primary keys. See References for links to these articles.

  • Non-primary key and non-foreign key business data columns are listed next.
  • Metadata columns, such as crt-dt and last-updt-dt, are listed last.

The client-nbr foreign key can be ignored for now. It will be used in later scenarios. The same is true for [now], whose template cell contains "n/a" (for "not applicable"). [Now] (the brackets indicate that we are talking about a metadata item, and not merely referring to the present moment) will not become relevant until we move on to more complex scenarios.

The policy type is PPO (preferred provider option). The first metadata column (crt-dt) tells us that the row was inserted on 1/12/04. The {null} in the second metadata column (last-updt-dt) tells us that, so far, this row has not been updated.

Presumably there are many other rows and columns in the Policy table. But we won't show any of them until later on, when they are needed to illustrate discussions of more complex patterns.

Scenario 1.2: an Update to the Policy Table

On 3/14/05, Mary changes her policy type from PPO to HMO (Health Maintenance Organization). After that change is recorded in the Policy table, Mary's Policy row looks like this:

The use of a create date and last update date is common in business IT. And with those two dates, we are not entirely lacking in temporal metadata about this row. Crt-dt tells us that the row was inserted on 1/12/04, last-updt-dt that it was last changed on 3/14/05.

But there is, nonetheless, a lot that we do not know. We do not know how many times this row was changed prior to 3/14/05 (if any), nor what column or columns were changed by any earlier updates. We don't even know what column or columns were changed on 3/14/05. And even if we did know that it was the policy type column that was changed, we still wouldn't know what the previous value was.

So, information has been lost. More precisely, using the taxonomy introduced in Part 1, information in the form of queryable history has been lost. Reconstructable history is always available, of course. In this case, to get Policy P138's reconstructable history, we would scan the logfile of the initial inserts and all subsequent updates to the Policy table, starting on 1/12/04, and arrange the logfile entries for P138 in chronological order.

Scenario 1.3: a Physical Delete to the Policy Table

On 6/1/06, Mary's policy is ended (by contract expiration, explicit termination, revocation, etc. - it doesn't matter). This is recorded, using Version Pattern 1, by physically deleting her policy's row in the Policy table. After that change is recorded in the Policy table, the table looks like this:

After the update shown in Scenario 1.2, as we said above, we have both the current state of Mary's policy, and also a small amount of metadata. Now we have neither. We can't even tell that Mary ever had a policy with AHIC.

Sometimes losing this much information to queryable access doesn't matter. Sometimes all we need to know in real-time is what things are like right now. But frequently, businesses do require a minimal amount of real-time available history, such as, in this case, information about policies no longer in effect.

This requirement can be met by replacing a physical delete with a logical delete. This too is a technique well-known to business IT professionals.

Eventually, logically deleted rows will be archived off the online table. This is usually done based on their age. For example, the business rule might be "at each month-end, archive all logically deleted rows that were deleted more than forty-nine months ago." In terms of the taxonomy introduced in Part 1, this is a rule which states the conditions under which a row changes from queryable to reconstructable.

But for now, we can ignore the archive process. What we want to do is see how a logical delete is typically implemented. This is shown in Version Pattern 2.

Version Pattern 2: Updating in Place with Logical Deletion

Because physical deletes destroy even the little historical information that Version Pattern 1 provides, businesses frequently ask their IT departments for some way to preserve that minimal level of historical information. Usually, the way this is done is to do a logical delete instead of a physical delete.

{BR 2}. Support {BR 1}, and also retain deleted rows as queryable history.

There are variations on this theme, of course, as there are on almost any data pattern. We will explain below why we chose the variation we did.

Scenario 2.1: A Logical Delete to the Policy Table

To support logical deletes, we need an additional column - a delete indicator. Often this is implemented as a delete flag, set to "Y" if the row is logically deleted, and to "N" otherwise. The date of the deletion is then put in last-updt-dt.

Another option is to have a separate delete date column. With this option, a delete flag isn't needed, because logically deleted rows are all and only those rows whose delete dates are not null.

We prefer this second option because it preserves rather than overwrites the metadata information about the last update date (and also because we have a further use for this new column, which we will discuss later on).

On 6/12/06, Mary Franklin's policy ends. Starting on that date, it is no longer in effect. This time, we record that event by logically deleting her policy's row in the Policy table. After that change is recorded, the table looks like this:

Note: to make it possible to show an entire row on one line, we will start doubling up. The two metadata dates, crt-dt and last-updt-dt, will be stacked from this point forward. Later, other such doubling-ups may be required. Keep in mind that in spite of the graphical layout, crt-dt and last-updt-dt are still two distinct columns in this table.

The Semantics of Version Pattern 2

For Version Pattern 2 to satisfy its business requirements, the following constraints must be enforced when updating the table:

  • {SC 2-1}. Always, del-dt must be null if the row is not logically deleted, and must otherwise contain a valid date.
  • {SC 2-2}. When entered, del-dt must be the current date, or else a future date. It cannot be a past date.
  • {SC 2-3}. When entered, foreign key semantics must be enforced.
  • {SC 2-4}. Once entered, del-dt cannot be changed. If del-dt is not null, the date in it must be the date originally put in it.

{SC 2-2} is particularly interesting. What is the reasoning behind it? Why can't we use a past date when logically deleting a row? After all, if we were supposed to delete a row on a particular date, and simply forgot to do it, why can't we just apply the delete retroactively? Wouldn't that produce the same result as if we had done the delete on its originally scheduled date?

The answer is "no." If a past date is used, we are changing the past, and doing so without leaving any indication that we did. In other words, if a past date is used, we were either not telling the truth before the logical delete, or we are not telling the truth after the logical delete. Here's why.

Imagine we were supposed to delete the row seven days ago and that we logically delete the row, today, by setting its del-dt to the date seven days ago. Whether we did this seven days ago, or do it today, the result will be the same. From today forward, the database will show a logical delete on 6/12/06.

However, for the seven days prior to this action, the row was not deleted during those seven days, and so during that period of time, our database says that the policy is active. But as soon as we put a past date into del-dt, our database now says that the row was logically deleted during those seven days, and therefore that the policy was not active during that period of time. Well, either it was active over the past seven days, or it wasn't. We can't have it both ways.

With this retroactive delete, we have changed the past, and left no indication that we have done so. A query counting active policies in the period 6/6/06 - 6/12/06, run sometime during the past week, would include Mary's policy in that count. The identical query, run anytime after that event, would not include her policy.

Even so, the assumption is that we were supposed to delete that row some time in the past. The reason is that, just as a policy becomes effective on the date a row for that policy is inserted into the Policy table, a policy ceases to be effective on the date a row for that policy is deleted from the Policy table. We missed the date to delete the policy, and now we apparently have no way to correct the mistake and show the date on which Mary's policy became no longer in effect.

Our dilemma is this: as long as the date of a physical insert into a table is also a business effectivity start date, and the date of a physical delete is also a business effectivity end date, we better be sure we can insert and delete on time! For if we miss either an insert or a delete date, the only way we can correct the mistake is to lie about the past!

The dilemma is resolved, not by lying about the past, but rather by eliminating the homonyms that are the source of the dilemma. Crt-dt is a homonym because it means both "the date the row is inserted" and also "the date that the object represented by the row becomes effective." Del-dt is a homonym for analogous reasons.

As we said at the beginning of this article, distinguishing database activity dates from business effectivity dates creates what computer scientists call a "bi-temporal" pattern. We will have more to say about bi-temporality in later installments, starting with the discussion of Version Pattern 5.

As for {SC2-3}, it simply says that semantically, a logical deletion must follow the same referential integrity rules as does a physical deletion. So, for example, if there are any foreign key dependencies on P138, then either the logical delete of P138 must be blocked, or else it must trigger a cascade (logical) delete to all rows that refer to it.

As for {SC 2-4}, the reason we cannot change del-dt once it has become non-null is the same. Doing so would change the past. Our database would give different answers, depending on when it was queried. At least one of those answers would have to be false.

This concludes our discussion of Version Patterns 1 and 2. The next installment will discuss Version Pattern 3.

References:

  1. Tom Johnston. "Primary Key Reengineering Projects: the Problem."DM Review. February 2000.
  2. Tom Johnston. "Primary Key Reengineering Projects: the Solution."DMReview.com. March 1, 2000.
  3. Tom Johnston. "The Semantics of Surrogate Keys and Business Keys: Part 1."Datawarehouse.com. December 6, 2002.
  4. Tom Johnston. "The Semantics of Surrogate Keys and Business Keys: Part 2." Datawarehouse.com. January 13, 2003.

Tom Johnston is an independent consultant specializing in enterprise data architecture, and in relational, object-oriented and data warehouse modeling in various industries, including telecommunications, health care, banking, retailing and transportation. He can be reached at tjohnston@acm.org, and his Web site is www.MindfulData.org.

Randall Weis, founder and CEO of InBase, Inc, has more than 24 years of experience in IT and IT management, specializing in enterprise data architecture. Weis' technical expertise is in sophisticated, multitiered systems. He has designed logical and physical data models and implemented several high profile, very large database (VLDB) systems in the financial and insurance industries. These systems have had very stringent performance and real-time history requirements. His software development company, InBase, Inc., has developed software and Web sites used by some of the nations largest companies. Weis has been a presenter at various user groups, including Guide, Share, Midwest Database Users Group and Camp IT Expo. His technique for modeling history, retro activity and future dating has been reviewed and approved for the physical implementation of IBM's Insurance Application Architecture (IAA). He may be reached via email at randyw@inbaseinc.com.

For more information on related topics, visit the following channels:

Advertisement

Advertisement