Note: a glossary of technical terms used in these articles can be found on the Web sites of the authors. The URLs are MindfulData.com and InbaseInc.com.
Last time, we said that we had completed the split transformation. But we were wrong. There is one more thing that needs to be done. So this time, we really will complete that transformation. At the same time, we can also learn something from this mistake: why it has been worthwhile to develop a taxonomy of all possible state transformations of asserted version tables and how to work through each of those transformations.
For it may seem, to some of our readers, that our discussions have become repetitious. After all, from the user's point of view, asserted version tables are maintained much the same way that conventional tables are by means of insert, update and delete transactions.
For example, if the user intends her changes to take effect immediately and has no reason to hold those changes in reserve by means of deferred assertion begin dates, then her original asserted versioning transactions are completely equivalent to conventional transactions. On the other hand, if she wants her changes to take effect at some time other than immediately, all she has to do is to append a date or two to her transactions. Why, then, do we need to work through eight different scenarios in which this is what happens, every time?
Of course these original transactions, as we have seen, are not the transactions that are submitted to the database management system. Instead, asserted versioning code translates them into one, or usually several, physical transactions. They are the transactions that are submitted to the DBMS. But the pattern of translation for each of these three types of original transaction is also clear. An original insert always results in one physical insert being applied to the database. Original updates and deletes always involve withdrawing a assertion and replacing it with other assertions. So why, again, do we need to work through eight different scenarios in which this is what happens, every time?
Figure 1 shows the state of the database, which we said last time represented the completion of a split transaction. But this is an invalid database state. The thoughtful reader might want to look at Figure 1 now and try to find what is invalid about it. (We will provide the answer later on in this column, but first we need the background for that answer.)
Figure 1: After the Split: an Invalid Database State
One Episode or Two?
In Figure 1, the timeline graphic shows two episodes of policy P861. Each is a continuous period of effectivity time. And there is a four-month gap between them. The table under the timeline seems to tell the same story. It shows one single-version episode of P861 that is in effect from January to April of 2008. It also shows another single-version episode of P861, this one in effect from August of 2008 until further notice. They are rows 5 and 4, respectively.
But look at the episode begin date. It is the same on both of them. That says that they are both part of the same episode. If they are part of the same episode, then they are part of a non-overlapping set of versions of the same object which constitute a continuous period of effective time. Yet there is a four-month gap between them, and therefore they do not cover a continuous period of time. So we have a contradiction.
Logicians often say that from a contradiction, anything follows. They mean that any statement can be derived, by means of logically valid arguments, from a contradiction. We can see how this works by examining the contradiction we have in the Policy table. In the process, we can learn how this contradiction is more than a logical anomaly, and how additional errors will enter the database because of it.
Every row in an asserted version table states that there is a continuous effective time period for the object it represents which stretches from the episode begin date on that row, to the effective end date of that row itself. So row 5 is still a true statement. It says that P861 has been continuously in effect from January (the episode begin date) to April (the version end date of that row).
But row 4 is not a true statement. It says that P861 has been continuously in effect from January (its episode begin date) to 9999 (the version end date of that row). Yet that is clearly not the case. There is no row for P861 which covers the effective time gap between rows 5 and 4. So from the set of all P861 rows, we can derive the conclusion that P861 is not in effect during the months of April, May, June and July of last year. But from row 4, we can derive the conclusion that P861 is in effect, continuously, from January 2008 until further notice.
This is a logical contradiction. And so this is the answer to the question we started with. This is what is invalid about the state of the database shown in Figure 1. Now let's see how this contradiction can infect the entire database, permitting any number of additional errors to be introduced.
Temporal Referential Integrity
The errors which this contradiction permits into the database are quite serious. They have to do with temporal referential integrity. For example, although we have not included a Client table in our examples so far, the client column in the Policy table clearly refers to one. In fact, because that column is a temporal foreign key, and not a conventional foreign key, it means that the Client table is also an asserted version table. It follows that, if row 4 did make a true statement, the Client table would contain an episode for client C882 that went into effect no later than January, 2008, that is currently in effect, and that will remain in effect until further notice. In particular, there would have to be a version of C882 in effect during the months of April, May, June and July of last year.
Yet policy P861 imposes no such requirement. Assuming that it is the only policy that C882 has, there is no reason why C882 needs to be effective time present in the database from April to August. So, for example, if C882 were effective time present during those four months, and the user later decided that that was a mistake, she would issue a delete transaction for C882 for those months. And what would happen is that, although the delete transaction was valid, it would be rejected.
On the other hand, if policies are the parents in a temporal referential integrity relationship, then the consequences are even more serious. For example, suppose there is also a Policy Rider table, referentially dependent on the Policy table. Since the Policy table is an asserted version table, this dependency is a temporal referential integrity dependency.
Let's suppose that the Policy Rider table is also an asserted version table. So suppose we are trying to create a rider for policy P861 by issuing an insert for that rider which is retroactive to June 1st of last year, and specifying that the rider is to remain in effect until further notice. Before it creates this rider, asserted versioning code will require that there is an episode for the indicated client, client C882, which was in effect on June 1st of last year, which is currently still in effect, and which will remain in effect until further notice.
Will the asserted versioning code find such an episode? It depends on how that code goes about looking for the episode. One alternative is to scan the Policy table for P861 rows, and look at their effective begin and end dates. Doing this, the code will not find an episode of P861 in effect in either June or July of last year. And so it would block the insert of the rider.
Another alternative is to scan the Policy table for the one row for P861 that has an effective end date of 9999. This scan will return row 4. The next step is not for the code to check the effective begin date of that row, because it doesn't matter what that effective begin date is. What matters is whether or not the episode to which that row belongs became effective on or prior to that June. So the code needs to find the effective begin date, not of that specific version of P861, but rather the effective begin date of the episode to which that version belongs. That can be found in the episode begin date column of that row.
Now the seriousness of our mistake becomes apparent. The episode begin date on row 4 is January 2008, indicating that an episode of P861 became effective on that date. The effective end date of row 4 is 9999, indicating that the episode to which that row belongs is currently in effect and will remain in effect until further notice.
Given this information, the asserted versioning code will permit the rider to be created. The result is a rider which, for the first two months of its effectivity, has no policy. The result is a violation of temporal referential integrity. Fixing the Mistake
The fix, now that the problem is understand, should be evident. It is to correct row 4. This correction is the last step we need to take to complete the split transformation. Since a split transformation always results in two episodes where there had been one, the versions from the original one episode cannot all retain the same episode begin date. The ones that begin after the split point belong to a new episode, and must take on the begin date of that new episode.
To correct row 4, we must first withdraw it into past assertion time. In this way, we preserve a record of our mistake. In this way, we preserve history instead of overwriting it. The next step is to supercede row 4 with a row identical to it except for having the new episode begin date. So nothing in terms of business data has changed. The result is shown in Figure 2. The split transformation is now complete.
Figure 2: The Split Transformation Completed
The split transformation was triggered by an original delete transaction. To the user who specified the transaction, the only thing remarkable about it is that she had to specify precisely the range of effective time months from which the policy was to be removed. But unlike any other original delete transaction, this one involved an extra step. Because it created a new episode from an old one, it was necessary to change all the rows from the old episode which were shifted into the new episode. In our example, of course, there was only one such row.
We know that our taxonomy is a complete enumeration of all possible relationships among two time periods along one and the same timeline. We know it because, topologically, there are no other possibilities. We also suspect that in working through this taxonomy, some of our examples may have seemed to our readers to be repetitious. And we have acknowledged that many aspects of these examples have been repetitious.
But this mistake, on the part of the authors who have originated the asserted versioning approach to managing bi-temporal data, shows us the value of working through a complete set of cases. Any well-constructed machine, whether logical or physical, will work as its parts and their assemblage determine. In doing so, they have the capacity to surprise even their own designers.
Next time, we will work through the last two transformations in our taxonomy.
Note: the authors are writing a book on temporal data management, based in large part on the material in this series. It will be published by Morgan-Kaufmann and will be available in the second quarter of 2010.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access