Time and Time Again: The Origins of Asserted Versioning
Note: A glossary of technical terms used in these articles can be found on the Web sites of the authors. In addition, a listing of previous articles and/or columns in this series can also be found on DMReview.com, MindfulData.com and InbaseInc.com.
Versioning is a widely used technique within the IT community. It is used to keep track of a chronological series of changes to objects of interest - in our example, insurance policies. In its simplest form, a non-temporal table is converted into a version table by adding a versioning date or timestamp to the primary key of that table. It is usually added as the low-order part of the primary key, in order to keep all versions for the same object together. In the version table, all rows with primary keys that are identical except for the versioning date or timestamp are rows that represent what was the current state of the object beginning on the versioning date and, usually, continuing until a newer version is added to the table.
Thus, versioning involves two concepts: objects and versions. Each row is a version, but versions do not stand alone; they are versions of objects.
Objects are things that can persist over time and that can change over time.
Every version is a timeslice of an object.
The asserted versioning approach introduces a third concept - an episode.
An episode is a series of one or more contiguous versions representing the same object.
Asserted versioning manages episodes of objects. Version tables are made up of versions represented by individual rows. But when we come to discuss the asserted version approach itself and how the temporal forms of entity integrity and referential integrity are enforced in that context, it will become apparent that they are enforced on episodes, not on versions (i.e., not on individual rows).
This is not just an important point in its own right. It distinguishes asserted versioning from the bi-temporal models of both the computer science community (as represented in the work by Dr. Richard T. Snodgrass) and of the principal alternative model (as represented in the referenced work by C. J. Date, Hugh Darwen and Nikos Lorentzos).
In the computer science community, work on bi-temporal data management has been going on for several decades. A major milestone was the publication of the TSQL2 proposal in 1994, a proposal to include several extensions to the SQL language to support the management of time in relational databases. This proposal was blocked by the British SQL standards committee, and to this day has still not been implemented as part of the SQL standard. Nonetheless, computer science work on temporal databases has continued.
In 2000, Dr. Richard Snodgrass, the leading computer scientist in the field of bi-temporal data management, published a book in which he introduced this work to the community of IT practitioners and explained how we could, using the SQL and relational database management systems (RDBMSs) of the day, implement bi-temporality in real-world databases. This book illustrates these concepts by using several real-world scenarios and contains an extensive set of SQL code fragments that support that implementation.
In the computer science approach to bi-temporality (which we will refer to as the "standard" bi-temporal model), there is no concept of an object represented by multiple versions. There are just rows that contain two pairs of dates. The first pair of dates defines what the standard model calls the "valid time" of those rows, and what we, following standard IT terminology, call the "effective time" of the version. The second pair of dates defines what the standard model calls the "transaction time" of those rows, and what we prefer to call "assertion time," for reasons we will discuss later on.
However, the standard bi-temporal model can represent objects and versions. For example, a standard bi-temporal table of insurance policies can be created, managed and queried. But in the standard model, these two concepts are extraneous. Standard bi-temporal semantics are not enforced on objects or versions. They are enforced on individual rows. Nonetheless, if inserts, updates and deletes against a standard model bi-temporal table are used with an awareness that the table is one of versions of objects, versioning can be supported. But it is users, the ones who maintain those tables with that awareness, who must ensure that the semantics of objects and versions are supported. For this reason, we say that the standard bi-temporal model permits the management of objects and versions, but does not support it.
Two years after Dr. Snodgrass published his book, C. J. Date and two associates published a book in which they describe their own suggestions for adding bi-temporal functionality to the relational model. They offer their work as an alternative to TSQL2 (as well as later work by Snodgrass and others) and claim that their proposal fully conforms to the relational model while the work by Snodgrass and the rest of the computer science community does not.
We will have little more to say about Date's approach, because it is a recommendation on how to modify the SQL standard and not on how to support bi-temporality with today's SQL. However, we will note that our concept of both objects and episodes seems to violate Codd's Information Principle, as those authors understand it. The reason is that both of these concepts are instantiated by sets of rows, not by individual rows. We will discuss this objection to asserted versioning toward the end of this series, once we have a full understanding of what asserted versioning is.
As shown in Figure 1, there are three origins of asserted versioning. One is versioning itself, a family of approaches all of which involve adding one or two dates or timestamps to the primary key of a table. Several variations on this approach were examined early in our series "Time and Time Again" and can be found in the archives of DM Review.
Another is the extensive work done on bi-temporality by computer scientists. To give an idea of what we mean by "extensive," we believe that there was a review of the literature published about a decade ago in which the authors said that they had identified approximately 1,200 articles on the topic. And over the last decade, it does not appear that the pace of work by computer scientists has slowed down.
The third origin of asserted versioning is our own work, which has resulted in two production databases whose tables conform to the schemas described below. Weis developed the concept of versions that can be added to tables prior to being "activated" or, as we will say, prior to being "asserted." This is new temporal semantics, not supported by the standard bi-temporal model.
For example, we could add rows to a table that are versions of insurance policies, all of which will go into effect three months from now. These, of course, are future-dated versions. But perhaps we only have some of the data for these versions, so the rows are incomplete when added. For that reason, we do not want these versions to show up in queries, because even though they describe something in the future, we are nonetheless not ready to reveal them to the world, to assert them. We therefore add these future versions, with an assertion begin date that is also in the future, a date on which we are sure we will be willing to reveal these future-dated versions to the world.
Johnston then added the concept of episodes to the asserted versioning approach, and together we developed a way to enforce the temporal correlates of entity and referential integrity on episodes. And it is to do so "under the wraps," without those who formulate maintenance transactions needing to know anything about temporal constraints. This implements half of the encapsulation of bi-temporal semantics that we think is necessary to hide the complexity of its implementation. The other half of the encapsulation is to make it possible to write queries directly against asserted version tables that, if current data only is to be returned, will be identical to queries against non-temporal ("normal") tables, and if non-current data is to be returned, only requires the specification of one or two dates in the WHERE clause of the queries.
Next time, we will look at a data model of the example we will use to illustrate asserted versioning. This data model is an extension of the model for our sample database of clients and policies. The extension consists of three additional tables. One is a table of wellness programs, the second a table of wellness program categories and the third is a table of enrollments of clients in those programs.
The wellness category table is a non-temporal table. We include it so that our examples will illustrate how bi-temporal and non-temporal tables interact as integrity constraints are applied to the transactions that maintain them.
The enrollment table is an associative table, expressing a many-to-many relationship between clients and programs. With it, our example will cover all three basic types of relational tables - kernel tables, dependent tables and associative tables.
- C. J. Date, Hugh Darwen and Nikos Lorentzos. Temporal Data and the Relational Model. Morgan-Kaufmann, 2002).
- R. T. Snodgrass. Developing Time-Oriented Database Applications in SQL. Morgan-Kaufmann, 2000.
- R. T. Snodgrass et al. TSQL2 Language Specification. SIGMOD Record, March, 1994.