Note: A glossary of technical terms used in these columns can be found on the Web sites of the authors. In addition, a listing of earlier articles and/or columns in this series can be found on DMReview.com, MindfulData.com and InbaseInc.com.
Figure 1 presents the side-by-side diagram we will use to illustrate the points we will make in this installment. Notice that there are two identical timeline graphics and two nonidentical schemas.
On the left is the schema for a non-temporal table. The underlined column is the primary key. The italicized column is a foreign key to a table that will appear in future columns. Type and Copay are the two business data columns in the table, which is a table of health care insurance policies. Create and update date are, respectively, the date the row was inserted into the table and the date the row was last updated.
On the right is the schema for a uni-temporal table. We will refer to it as a version table, because that is standard terminology among IT professionals. When the business requires us to track changes to a conventional, non-temporal table, a common approach is to version the table by adding a date to the primary key, as we have done here.
There are many variations on version tables, but the variations make no difference to this discussion. In this particular case, version date is also the date the row was inserted, although in other variations, that is not always the case. There is no last update date, because versions are not physically updated. Instead, each logical update is carried out by copying the current version, applying the update to the copy and inserting the result as the new most current version. Finally, this particular variation on version tables also contains a delete flag, although this too is not present in all variations. When a delete is requested, the most current row is copied, its delete flag is changed from N to Y, and the new row inserted.
Figure 2 shows the results of applying the same insert transaction to each table. The result is a single row, as shown. Above each timeline is a bracket. The non-temporal bracket shows the period of time the row has been in the table. The temporal bracket, as we will see, shows the period of time that each row was the current version of the policy.
After the insert, the brackets show a one-month period of time, despite the fact that, on the date of the insert that one month has not yet elapsed. This is because we assume, throughout these examples, that the tables are updated once a month. So once a transaction has been applied, the results will remain on those tables for at least a month.
Figure 3 shows the results of applying the same update transaction to each table. This transaction changes the copay amount on the policy from $15 to $20. The update was applied in May, so the brackets extend through June.
In the case of the non-temporal table, it is a simple update in place. In the case of the version table, the result is a new row - a new version of the policy. A second bracket has been added to the diagram for the version table to show that there are now two versions.
There are two important things to note. First, the single bracket for the non-temporal table and the adjacent pair of brackets for the version table start at the same point in time and extend out to the same point in time. Second, on the version table there are no temporal gaps between versions.
In fact, there is no way to indicate a temporal gap, because there is only a version begin date. This versioning variation assumes that one version ends when the next one begins, although other variations may include a version end date as well. More precisely, this kind of versioning assumes that all versions whose delete flag is N end when and only when the next version begins, but that a version whose delete flag is Y ends on the version date. An experienced data modeler could point out shortcomings of this approach, but for some business requirements, those shortcomings don't matter. This is a variation of versioning that we have seen used many times.
The bracket for the single row on the non-temporal table shows us when that row appeared in the table and how long it has been there. The contiguous brackets for the version table show us two versions with no temporal gap between them that cover exactly the same period of time. Neither version corresponds to the row in the non-temporal table, but the contiguous set of them does.
This contiguous set - either of a single row or of multiple rows - is what we call an episode. More precisely, an episode is a temporally contiguous set of rows for the same object. In this case, the object is an insurance policy - policy P861.
We can also think of each version as a timeslice of the object. In these terms, an episode is a contiguous set of timeslices of the same object whose covered period of time is identical to the period of time a corresponding row for the object would exist in a non-temporal table.
This may seem obvious to most readers, but it is nonetheless a very important point. Asserted versioning manages episodes of objects. Version tables are made up of versions represented by individual rows. But when we come to discuss the asserted version approach itself and how the temporal forms of entity integrity and referential integrity are enforced in that context, we will see that they are enforced on episodes, not on versions (i.e. not on individual rows). This is not just an important point in its own right. It distinguishes asserted versioning from the bi-temporal models of both the computer science community (as represented in the referenced work by Dr. Richard T. Snodgrass) and of the principal alternative model (as represented in the referenced work by C. J. Date, Hugh Darwen and Nikos Lorentzos).
Figure 4 shows a second update transaction. This time, the transaction updates the policy type from HMO to PPO. There is nothing new here. We simply see both timelines extended, and both timelines cover exactly the same period of time. The episode's timeline has been extended by adding a temporally contiguous third version to the episode.
Figure 5 shows a delete transaction. In the case of the non-temporal table, the result is that the row representing the policy is removed from the table. But in the case of the version table described above, the result is the addition of a new version, one whose delete flag is set to Y. The brackets for the episode now are closed; there is a vertical bar at the right-hand end of the brackets. We will call an episode whose logical delete date is known a closed episode.
In Figure 5, the timelines no longer correspond; there is no bracket for the non-temporal table. So closed episodes whose date of closure is not in the future correspond to rows that once existed in the corresponding non-temporal table but are no longer present in that table. On the other hand, closed episodes whose date of closure has not yet been reached, and also open episodes, correspond to rows that currently exist in the corresponding non-temporal table.
Objects, Episodes and Versions
Objects are important. In most cases, they are what the data in our tables represent. In the example, each insurance policy is an object. In non-temporal tables, objects are represented by single rows. In temporal tables, single rows represent timeslices of objects, each of which is called a version.
So, in temporal tables, do episodes represent objects? They do not. Episodes represent a period of time during which an object is represented in the database. But sometimes objects go away and show up again later on. For example, policy P861 was terminated on 12/01/04, but it might be re-instated several months later.
The uni-temporal table shown here cannot represent the reoccurrence of an object. Or at least, it cannot do so elegantly. The reason is that it has no way to represent a gap in time during which the object was not represented. An inelegant solution might be to create a dummy version to cover that period of time, perhaps using a third value in the delete flag column to indicate dummy versions. But we aren't here to discuss inelegant variations on uni-temporal tables. Our objective has been to clarify the concept of an episode and to indicate its importance for asserted versioning. So what represents an object in an asserted version table? It is a set of one or more episodes, necessarily non-contiguous because if they were contiguous, they would be a single episode and not multiple episodes. For each contiguous period of time during which the object was recognized by our business, there will only be one episode.
An astute reader may already be thinking of difficult issues relating to episodes. For example, can two episodes merge into one? Can one episode split into multiple episodes? Can past episodes be modified? How are future episodes affected by changes to prior episodes? What about entity and referential integrity among episodes? In each of these situations, how does asserted versioning work? We are finally close to presenting an extensive set of asserted version scenarios. These and other questions about asserted versioning will be answered by discussing those scenarios.
It should also be apparent that many of these issues are difficult. That is why we have emphasized that any implementation of asserted versioning should be as fully encapsulated as possible. As we said before, if someone could write an insert, update or delete against a non-temporal table, they should also be able to write an insert, update or delete against an asserted version table.
The articles/columns in the Time and Time Again series should give you the understanding needed to write your own implementation of asserted versioning. As this column continues, we are developing an implementation of asserted versioning ourselves, one which represents an enterprise solution and not merely a one-database-at-a-time solution. We hope, before the end of the year, to have a preliminary release available on our Web sites.
Asserted versioning, as we illustrated in our taxonomy in the previous two columns, is actually a tri-temporal table, but it shares several important features with this typical uni-temporal version table:
If an object is represented in a non-temporal table by a row that is inserted at time X and deleted at time Y, the same object is represented in a temporal table by a contiguous set of rows, the first of which is effective at time X and the last of which ceases to be effective at time Y. We call this contiguous set of rows an episode.
In a non-temporal table, updates modify the contents of the row but do not create new rows. In a temporal table, updates create new rows called versions, but they do not update existing rows.
Next time, we move on to our second new-roadmap topic. We will discuss the origins of asserted versioning, present the schemas that will be used in the scenarios and indicate entries in our glossary that define the relevant terminology. For a discussion of the new roadmap topics, see our July 2, 2008, column.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access