The major, but by no means exclusive, focus of future columns in this series will be on what I have been calling the "ultimate versioning pattern" but will from now on call the "asserted version pattern," or simply "asserted versioning." The following is a brief discussion of the major themes of the asserted versioning approach to managing temporal data.
Seamless access is access to any combination of past, current and future states of persistent objects that are or may be needed, separately or in combination, to support operational decision-making or to complete operational transactions. Expressed in architectural terms, it is access to states of persistent objects that belong in online transaction processing (OLTP) or operational data store (ODS) databases.
Seamless access must be fully encapsulated along three dimensions:
1. Query encapsulation. Queries against bi-temporal tables must be simple enough that if a business user could write a query against a non-temporal table, she could also write a query against the corresponding bi-temporal table. This is essential if we are not to return to the "bad old days" in which, for non-production queries or reports, business users had to submit a request to the IT department and usually wait days or weeks for the result. Specifically:
- If these queries specify current data, they must be identical to queries against the corresponding non-temporal tables except for specifying that now (the current date) falls within both the effective and the assertion date ranges.
- If these queries specify non-current data (data effective and/or asserted, in the past or in the future), they must be identical to queries against corresponding non-temporal tables except for specifying that a past or a future date falls within one or both of those date ranges.
- If these queries specify temporal joins, view tables should be used to encapsulate any such joins that otherwise might be difficult for end users to write.
- To support all existing queries without modification, a current-only view table must exist for every asserted version table.
2. Transaction encapsulation. Updates to these tables must utilize insert, update and delete transactions that are simple enough to be written by anyone who could write updates against corresponding non-temporal tables. Specifically:
- If these updates specify current data, they must be identical to updates against corresponding non-temporal tables.
- If these updates specify non-current data, they must be identical to updates against the corresponding non-temporal tables except for specifying the past or future effective and/or assertion dates to be used for the version about to be created.
3. Design encapsulation. Query encapsulation protects the query writer from the complexities of bi-temporality. Transaction encapsulation protects those who write and manage inserts, updates and deletes from those complexities. But what protects the database designer? What protects the data modeler, specifically, the logical data modeler? To add a bi-temporal table to a database, or to change a non-temporal table to a bi-temporal one, what work must these IT professionals do?
There are several considerations. Taking the more complex case, converting a non-temporal table into an asserted version table, they include the following:
- Adding the columns required to make bi-temporality work.
- Modifying the primary key so the database management system (DBMS) can enforce temporal entity integrity (TEI).
- Changing every foreign key that references a newly bi-temporal table into a temporal foreign key (TFK).
- Declaring the temporal correlates of the restrict, cascade and set null referential actions.
- Enforcing parent-side and child-side temporal referential integrity (TRI) constraints up and down the referential integrity chain for the newly bi-temporal tables.
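For a single table, the structural part of that list can be sketched with Python's standard sqlite3 module. This is a minimal illustration under stated assumptions, not the framework's design: the table names policy and policy_av, the column names, the closed-open [begin, end) periods and the '9999-12-31' end-of-time marker are all hypothetical. The widened primary key only prevents duplicate versions; it does not by itself stop two versions of one policy from overlapping, so full TEI still needs additional checking. Foreign keys are not attempted: once policy_id alone is no longer unique, an ordinary declared foreign key cannot reference it, which is why TFKs are needed.

```python
import sqlite3

END_OF_TIME = "9999-12-31"  # illustrative "until further notice" marker

# Autocommit mode, so the conversion below can manage its own transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE policy (policy_id TEXT PRIMARY KEY, premium INTEGER NOT NULL)")
conn.executemany("INSERT INTO policy VALUES (?, ?)", [("P1", 100), ("P2", 250)])

def convert_to_asserted_version(conn, as_of):
    """Hypothetical conversion of the non-temporal table policy, in one transaction."""
    conn.execute("BEGIN")
    try:
        # Add the effective and assertion period columns and widen the primary
        # key: each row is now one version of an object, so the object
        # identifier alone is no longer unique.
        conn.execute("""CREATE TABLE policy_av (
            policy_id TEXT NOT NULL, premium INTEGER NOT NULL,
            eff_begin TEXT NOT NULL, eff_end TEXT NOT NULL,
            asr_begin TEXT NOT NULL, asr_end TEXT NOT NULL,
            PRIMARY KEY (policy_id, eff_begin, asr_begin))""")
        # Existing rows become versions effective and asserted from as_of onward.
        conn.execute(
            "INSERT INTO policy_av SELECT policy_id, premium, ?, ?, ?, ? FROM policy",
            (as_of, END_OF_TIME, as_of, END_OF_TIME),
        )
        conn.execute("DROP TABLE policy")
        # A current-only view under the old name keeps existing queries working.
        conn.execute("""CREATE VIEW policy AS
            SELECT policy_id, premium FROM policy_av
            WHERE eff_begin <= date('now') AND date('now') < eff_end
              AND asr_begin <= date('now') AND date('now') < asr_end""")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

After convert_to_asserted_version(conn, "2020-01-01"), an unmodified query such as SELECT premium FROM policy WHERE policy_id = 'P2' returns the same result it returned before the conversion.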
Specifying structural changes and constraint options will be the job of the logical modeler. Implementing those changes and constraints will be the job of the physical modeler and the implementation team.
Fortunately, much of this work can be automated, although I don't yet know how far automation can go. I am working on that question as I write: I am developing an asserted version framework, and am initially focusing on automating TEI and TRI constraints.
I will describe this evolving work as the series continues. My objective, in this initial phase of the framework, is to eliminate the need to write any code either to apply inserts, updates and deletes to bi-temporal tables or to enforce temporal integrity constraints on those transactions.
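The two constraints the framework initially targets can be stated as simple checks. The plain-Python sketch below is a simplified reading offered as an assumption, not taken from the framework: TEI is read as "no two versions of the same object overlap in both effective and assertion time," and TRI, checked as of a single assertion date, as "the parent object's versions cover the child version's entire effective period without a gap." The Version class and the function names are hypothetical, and a real implementation would evaluate such checks inside the transaction that changes the data.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations

@dataclass(frozen=True)
class Version:
    oid: str          # object identifier (the non-temporal key)
    eff_begin: date   # effective period [eff_begin, eff_end)
    eff_end: date
    asr_begin: date   # assertion period [asr_begin, asr_end)
    asr_end: date

def overlaps(b1, e1, b2, e2):
    """Closed-open periods [b1, e1) and [b2, e2) share at least one day."""
    return b1 < e2 and b2 < e1

def tei_violations(versions):
    """Pairs of versions of the same object overlapping in both kinds of time."""
    return [
        (v, w) for v, w in combinations(versions, 2)
        if v.oid == w.oid
        and overlaps(v.eff_begin, v.eff_end, w.eff_begin, w.eff_end)
        and overlaps(v.asr_begin, v.asr_end, w.asr_begin, w.asr_end)
    ]

def tri_satisfied(child, parent_versions, as_of):
    """True if, as asserted on as_of, the versions of one parent object
    cover the child version's whole effective period without a gap."""
    asserted = sorted(
        (p for p in parent_versions if p.asr_begin <= as_of < p.asr_end),
        key=lambda p: p.eff_begin,
    )
    covered_to = child.eff_begin
    for p in asserted:
        if p.eff_begin > covered_to:
            break  # gap: part of the child's effective period has no parent
        covered_to = max(covered_to, p.eff_end)
        if covered_to >= child.eff_end:
            return True
    return covered_to >= child.eff_end
```

For example, with a parent customer C1 whose two versions abut on 1 January 2022, a child version effective from 2021 to 2023 passes tri_satisfied; remove the second parent version and the check fails, because nothing covers 2022.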
An enterprise data architecture is essential to maximizing the benefits and minimizing the costs of acquiring, maintaining, using and ultimately retiring data. In Part 1 of this series, after describing a taxonomy of temporal data management methods, I mentioned the importance of situating any implementation of bi-temporality in the context of an enterprise data architecture. I now think that it is important to describe in greater detail how our asserted versioning implementation of bi-temporality will fit into the context of such an architecture, and in particular how such an architecture will incorporate our asserted version framework.
For example, bi-temporality does not make logfiles obsolete, as Dr. Snodgrass pointed out in his book. I may disagree with Dr. Snodgrass, however, in that I believe bi-temporality, or at least our implementation of it, does make physically separate history tables obsolete, or at least less frequently appropriate. Beyond that, I also believe that bi-temporality does not eliminate the need for a historical data warehouse, nor for data marts that contain transactions going back several years. And with these other ways of managing non-current data remaining in play, the kind and volume of non-current data that should be managed bi-temporally is much easier to support than might otherwise be thought. This is another reason that the performance implications of bi-temporality are not as severe as many data management professionals fear. I will discuss these issues when I look more closely at the intersection of temporal data management methods and enterprise data architecture.
Versioning is essential to providing seamless access because only by means of versioning can we write queries and updates that do not require their authors to specify (and thus to know about, and perhaps to join across) databases or tables other than the ones they would specify to access current data only. It is also essential to providing response times, for result sets containing a mixture of past, current and/or future states, that are nearly as good as response times for result sets from non-temporal tables.
As far as response time for updates is concerned, real-time transactional updates should also be almost as fast as transactional updates against non-temporal tables. In particular, the fear of turning a single-row update into a response-time-degrading cascade of integrity checks and possible additional updates, a fear I once shared with many other data management professionals, is one I now believe to be unfounded. As for batch updates, the performance impact should also be negligible. Nor should any source transaction remain semantically incomplete until physical updates have been applied to multiple databases or multiple tables.
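Because every version of an object lives in the same table as its current state, one query shape serves past, current and future states alike. A minimal sketch with Python's sqlite3 module follows; the policy_av table, its columns, the closed-open [begin, end) periods and the sample data are hypothetical, and the sketch illustrates only the single-table access path, not the response-time claim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical asserted version table: every version of every policy lives here.
# Periods are closed-open [begin, end); '9999-12-31' means "until further notice".
conn.execute("""CREATE TABLE policy_av (
    policy_id TEXT NOT NULL, eff_begin TEXT NOT NULL, eff_end TEXT NOT NULL,
    asr_begin TEXT NOT NULL, asr_end TEXT NOT NULL, premium INTEGER NOT NULL,
    PRIMARY KEY (policy_id, eff_begin, asr_begin))""")
conn.executemany(
    "INSERT INTO policy_av VALUES (?, ?, ?, ?, ?, ?)",
    [
        # Original version, asserted from 2020 until it was withdrawn in 2023.
        ("P1", "2020-01-01", "9999-12-31", "2020-01-01", "2023-01-01", 100),
        # From 2023 on, we assert the old premium only up to 2023-01-01...
        ("P1", "2020-01-01", "2023-01-01", "2023-01-01", "9999-12-31", 100),
        # ...and a new premium from then on.
        ("P1", "2023-01-01", "9999-12-31", "2023-01-01", "9999-12-31", 120),
    ],
)

# Non-temporal equivalent: SELECT premium FROM policy WHERE policy_id = :id
AS_OF = """SELECT premium FROM policy_av
           WHERE policy_id = :id
             AND eff_begin <= :e AND :e < eff_end
             AND asr_begin <= :a AND :a < asr_end"""

def premium_as_of(policy_id, eff_date, asr_date):
    """Premium effective on eff_date, as asserted on asr_date (None if no version)."""
    row = conn.execute(AS_OF, {"id": policy_id, "e": eff_date, "a": asr_date}).fetchone()
    return row[0] if row else None
```

With this data, premium_as_of("P1", "2021-06-01", "2024-06-01") returns the past premium of 100, premium_as_of("P1", "2024-06-01", "2024-06-01") returns 120, and premium_as_of("P1", "2024-06-01", "2022-06-01") returns 100, because in 2022 the change had not yet been asserted.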
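To illustrate the point about single transactions, here is a minimal sketch of a current update against a hypothetical policy_av table, using Python's sqlite3 module. It follows one common bi-temporal pattern (withdraw the current version by ending its assertion period, re-assert its unchanged past portion, then assert the new version), which may differ from what the asserted version framework actually does; table, column and function names are illustrative. Everything happens in one table and one atomic transaction, so the source update is never left half-applied.

```python
import sqlite3

END_OF_TIME = "9999-12-31"  # illustrative "until further notice" marker

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE policy_av (
    policy_id TEXT NOT NULL, eff_begin TEXT NOT NULL, eff_end TEXT NOT NULL,
    asr_begin TEXT NOT NULL, asr_end TEXT NOT NULL, premium INTEGER NOT NULL,
    PRIMARY KEY (policy_id, eff_begin, asr_begin))""")
with conn:
    conn.execute(
        "INSERT INTO policy_av VALUES ('P1', '2020-01-01', ?, '2020-01-01', ?, 100)",
        (END_OF_TIME, END_OF_TIME),
    )

def update_current(conn, policy_id, premium, today):
    """Hypothetical encapsulated update. The caller supplies what a
    non-temporal UPDATE needs (key and new value) plus today's date; the
    version bookkeeping happens here, in one atomic transaction.
    Repeating an update on the same day is out of scope for this sketch."""
    with conn:  # commits on success, rolls back if anything raises
        current = conn.execute(
            """SELECT eff_begin, eff_end, asr_begin, premium FROM policy_av
               WHERE policy_id = :id
                 AND eff_begin <= :t AND :t < eff_end
                 AND asr_begin <= :t AND :t < asr_end""",
            {"id": policy_id, "t": today},
        ).fetchone()
        if current is None:
            raise LookupError(f"no current version of {policy_id}")
        eff_begin, eff_end, asr_begin, old_premium = current
        # 1. Withdraw the current version: stop asserting it as of today.
        conn.execute(
            """UPDATE policy_av SET asr_end = ?
               WHERE policy_id = ? AND eff_begin = ? AND asr_begin = ?""",
            (today, policy_id, eff_begin, asr_begin),
        )
        # 2. Re-assert its unchanged part, now ending today.
        if eff_begin < today:
            conn.execute(
                "INSERT INTO policy_av VALUES (?, ?, ?, ?, ?, ?)",
                (policy_id, eff_begin, today, today, END_OF_TIME, old_premium),
            )
        # 3. Assert the new version, effective from today.
        conn.execute(
            "INSERT INTO policy_av VALUES (?, ?, ?, ?, ?, ?)",
            (policy_id, today, eff_end, today, END_OF_TIME, premium),
        )
```

For example, update_current(conn, "P1", 120, "2023-01-01") leaves three rows: the withdrawn original, a re-asserted copy effective until 2023-01-01, and the new version effective from that date. No other table is touched.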
Bi-temporality is the hallmark of the approach to versioning common to the two disputing parties within the SQL standards groups. One of those parties consists of Dr. Richard Snodgrass and most of the computer science community. The other party includes C. J. Date, Hugh Darwen and Dr. Nikos Lorentzos (although Dr. Lorentzos has also co-authored articles with Dr. Snodgrass). Bi-temporality, as understood by both these parties, distinguishes two pairs of dates. One is a pair of dates that define an "effective time period" for the row they belong to. The other is a pair of dates consisting of the transaction's physical insert date and a "temporal delete" date.
My interpretation of bi-temporality, however, is somewhat different. I understand effective dates as they do, but for me the other pair of dates does not record physical database activity. Instead, they are dates that delimit a period of time during which we assert that a specific version of a specific object is correct. This is an important difference between us and them, as the rest of this series will make clear. As I will show, there are important business requirements that can best be met by specifying assertion periods that may or may not begin with the date the row was inserted, and thus that cannot be met by dates which describe only physical database activity.
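A small sketch shows why the distinction matters. The Row class and both predicates are hypothetical: in_db_as_of stands in for a transaction-time reading (ignoring the temporal delete date for brevity), and asserted_as_of for an assertion-time reading. A version physically inserted on 1 March 2024 but asserted only from 1 April 2024 is in the database throughout the second half of March, yet is not yet asserted.

```python
from dataclasses import dataclass
from datetime import date

END_OF_TIME = date(9999, 12, 31)  # illustrative "until further notice" marker

@dataclass(frozen=True)
class Row:
    oid: str
    premium: int
    eff_begin: date        # effective period [eff_begin, eff_end)
    eff_end: date
    row_insert_date: date  # physical database activity
    asr_begin: date        # start of the period during which we assert the row
    asr_end: date = END_OF_TIME

def in_db_as_of(row, d):
    """Transaction-time reading (temporal delete ignored for brevity):
    the row counts from the moment it is physically inserted."""
    return row.row_insert_date <= d

def asserted_as_of(row, d):
    """Assertion-time reading: the row counts only within its assertion period."""
    return row.asr_begin <= d < row.asr_end

# Loaded on 1 March 2024, but not to be asserted until 1 April 2024.
staged = Row("P1", 130, date(2024, 4, 1), END_OF_TIME,
             row_insert_date=date(2024, 3, 1), asr_begin=date(2024, 4, 1))
```

On 15 March, in_db_as_of(staged, ...) is true while asserted_as_of(staged, ...) is false; from 1 April both are true. Dates of physical insertion alone cannot express that gap.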
Asserted versioning is object-based because, deep within the mental model underlying our approach, these bi-temporally managed rows represent timeslices of objects (or of relationships among objects) that persist and are subject to change over time. This concept is not absent from Snodgrass' work, but it plays a much more central role in our work than it does in his.
For one thing, it has led us to the concept of recurring episodes of objects. This concept is central to asserted versioning, but is at best peripheral to, and arguably absent from, Snodgrass' recommended implementations of bi-temporality with current SQL and current DBMSs. As for Date, Darwen and Lorentzos, they are not concerned with describing an implementation that we IT professionals can use right now, with today's technology, and so this comparison, in their case, is moot.
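Assuming an episode is a maximal run of versions of one object whose effective periods abut with no gap (a simplified reading for illustration, not a definition taken from the framework), grouping an object's versions into episodes takes only a few lines. A customer who leaves and later returns then has two episodes. The function name and the tuple representation are hypothetical.

```python
from datetime import date

def episodes(periods):
    """Group one object's effective periods [begin, end) into episodes.

    Periods that abut (or overlap) belong to the same episode; a gap between
    one period's end and the next one's begin starts a new episode.
    Returns one (begin, end) span per episode, in chronological order.
    """
    spans = []
    for begin, end in sorted(periods):
        if spans and begin <= spans[-1][1]:
            # Contiguous with the current episode: extend it.
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            # Gap (or first version): a new episode of the same object.
            spans.append((begin, end))
    return spans

# A customer active 2019 through 2020, gone during 2021, back from March 2022.
history = [
    (date(2019, 1, 1), date(2020, 6, 1)),
    (date(2020, 6, 1), date(2021, 1, 1)),
    (date(2022, 3, 1), date(9999, 12, 31)),
]
```

Here episodes(history) yields two spans: one from 1 January 2019 to 1 January 2021, and a second, open-ended one from 1 March 2022.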