This glossary is being constructed as a controlled vocabulary. By that we mean that any expression involving temporal concepts used in the definition of a glossary entry must itself be a glossary entry.
One reason for being this careful with definitions is that this kind of rigor must be applied to definitions before they can be "ontologized," i.e., expressed in a formal notation like first-order logic. That, in turn, is needed to make definitions available for manipulation by software-realized inferencing engines. As of mid-May, 2008, we have begun the process of developing a formal ontology as part of our Glossary.
Another reason for being this careful with definitions is that this kind of rigor must be applied to any set of technical definitions before we can claim not only that we say what we mean, but also that we know that what we say is truly and accurately what we mean.
As this series continues, context becomes increasingly important so that the thread of the discussion is not lost. Please do a search for "Time and Time Again" on www.dmreview.com for a list of previous articles in this series.
In Part 13, we presented a roadmap to the rest of the series. But like many authors, we have found that in presenting our ideas, we have not merely written down what we already knew. In the process, our own understanding of what we are discussing has been deepened, and one consequence of that has been some changes in the roadmap. So in the next three articles, we will summarize what we've done so far and then proceed to develop an amended roadmap of where we go from here.
Where We've Been
In Part 1, we introduced the topic of managing time in relational databases. Using a taxonomy, we arranged and described various ways in which this has been done. One of those ways is by versioning objects, which is usually implemented by adding a date as the low-order part of a primary key. This permits a chronological sequence of states of an object to be kept in a table.
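The versioning approach described above can be sketched in a few lines. This is an illustrative example only, using hypothetical table and column names, with SQLite standing in for a production DBMS: the version date is the low-order part of a composite primary key, so each row is one chronological state of the object.

```python
import sqlite3

# Hypothetical versioned table: the version date is the low-order part of
# the primary key, so one object can have many rows, one per state.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE policy_version (
        policy_id    INTEGER NOT NULL,
        version_date TEXT    NOT NULL,   -- low-order part of the key
        status       TEXT    NOT NULL,
        PRIMARY KEY (policy_id, version_date)
    )
""")

# Three chronological states of the same policy object.
conn.executemany(
    "INSERT INTO policy_version VALUES (?, ?, ?)",
    [(1, "2007-01-01", "pending"),
     (1, "2007-06-15", "active"),
     (1, "2008-03-01", "lapsed")]
)

# The full chronology of policy 1, in version-date order.
rows = conn.execute(
    "SELECT version_date, status FROM policy_version "
    "WHERE policy_id = 1 ORDER BY version_date"
).fetchall()
```

Because ISO-8601 date strings sort correctly as text, the `ORDER BY` returns the states in true chronological sequence.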
We still believe that this taxonomy is a good classification of the various ways in which temporal data has been managed in real-world databases. First, it is a partitioning of the subject. This means that, at every level, the nodes are mutually exclusive and jointly exhaustive. As for the principal benefit gained by using versioning (our own bi-temporal implementation of it, that is), we now refer to it as "seamless access" to data about the past, current and future states of persistent objects.
In Part 2, we argued that the old association of historical data with analytic databases, and current data with operational databases, is breaking down. Data marts that are expected to contain yesterday's transactions, or even this morning's transactions, are an example of analytic databases that contain current data. Online transaction processing (OLTP) and operational data store (ODS) databases that make use of versioned tables are examples of operational databases that contain historical data.
This is an important distinction to make because there is still a widespread tendency to think of "historical data" as something we don't need in a real-time, transaction-driven environment. We still tend to think of historical data as something that can be physically pulled out of OLTP and ODS databases and put in a data warehouse or in historical data marts. Well, some historical data is like that. For such data, we need neither versioning nor, indeed, bi-temporality. For such data, the work on bi-temporality by Dr. Snodgrass and by C. J. Date and his associates is not relevant.
But we must be clear that not all historical data is like that. Some historical data is just as relevant to real-time ongoing business activity as is current data, and often is needed, together with current data, to provide essential information to decision-makers, or to complete business transactions. This kind of historical data, and this kind of access to it, is the business justification for taking bi-temporality, and especially bi-temporal versioning, seriously.
In Parts 3 through 9, we introduced several "version patterns" that we have seen used by business IT professionals. None of these made use of bi-temporality, but each of them provided some temporal data management functionality. With each pattern, we explained what it could and could not do with respect to managing temporal data.
These precursors of our own bi-temporal versioning pattern are representative examples of the kind of versioning that has been used in business IT environments for many years. One reason for discussing them is to approach bi-temporal versioning one step at a time. First: versioning, something IT professionals are familiar with. Second: bi-temporality, something IT professionals are either not familiar with, or have encountered only as the distinction between a version date and a row insert date. A second reason to discuss these precursor variations on versioning is to have a point of comparison that will enable us to clarify what temporal management functionality our versioning pattern provides that none of these earlier patterns do.
In Parts 10 through 12, we discussed the use of foreign keys with versioned tables. We introduced the notion of an "object foreign key" (now called a "temporal foreign key"), which references versioned tables but does not reference a specific row within those tables. Instead, it references the persistent object of which those rows are versions. We also explained why data management professionals are concerned about the performance implications of enforcing referential integrity on versioned tables, and why, in the full production implementations we have developed, we chose to enforce integrity constraints at query time rather than at update time.
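A temporal foreign key can be illustrated with a small sketch. The table and column names here are hypothetical, and SQLite stands in for a production DBMS: the child row stores only the object's identifier, and the join resolves at query time to whichever version was in effect on the date of interest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE policy_version (
        policy_id INTEGER NOT NULL,
        ver_begin TEXT    NOT NULL,
        ver_end   TEXT    NOT NULL,   -- closed-open: [ver_begin, ver_end)
        status    TEXT    NOT NULL,
        PRIMARY KEY (policy_id, ver_begin));
    CREATE TABLE claim (
        claim_id   INTEGER PRIMARY KEY,
        policy_id  INTEGER NOT NULL,  -- temporal foreign key: names the
                                      -- object, not any specific row
        claim_date TEXT    NOT NULL);
""")
conn.executemany(
    "INSERT INTO policy_version VALUES (?, ?, ?, ?)",
    [(1, "2007-01-01", "2007-06-15", "pending"),
     (1, "2007-06-15", "9999-12-31", "active")])
conn.execute("INSERT INTO claim VALUES (100, 1, '2007-08-01')")

# Resolve the temporal foreign key at query time: find the version of the
# referenced object that was in effect on the claim date.
row = conn.execute("""
    SELECT pv.status
    FROM claim c
    JOIN policy_version pv
      ON pv.policy_id = c.policy_id
     AND pv.ver_begin <= c.claim_date
     AND c.claim_date  < pv.ver_end
    WHERE c.claim_id = 100
""").fetchone()
```

Note that no DBMS-declared foreign key constraint could express this relationship, since the referenced "object" is a set of rows rather than one row; that is why the enforcement question discussed above arises at all.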
This is where the greatest change in our understanding of temporal data management has taken place, as we have written these articles. By developing a form of bi-temporal versioning that makes episodes the core construct, rather than versions themselves, we have been able to enforce all temporal integrity constraints (specifically, temporal entity integrity and temporal referential integrity constraints) right away, when transactions are applied to the database, and not later on, when data is retrieved. We don't wish to say that the material in Parts 10 through 12 is now irrelevant. But more than anything else we have written in this series, those articles describe the early stages of a thought process that led to the more mature consideration of temporal constraints described in Parts 19 through 27.
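What update-time enforcement of temporal entity integrity means can be sketched as follows. This is an illustrative sketch only, not the authors' framework: temporal entity integrity forbids two versions of the same object from having overlapping effective periods, and enforcing it at update time means rejecting the offending transaction rather than filtering at retrieval time.

```python
# Hypothetical in-memory stand-in for a versioned table: each entry is
# (object_id, begin, end, payload), with closed-open periods [begin, end)
# and ISO-8601 date strings, which compare correctly as text.
def insert_version(table, object_id, begin, end, payload):
    for (oid, b, e, _) in table:
        # Closed-open overlap test: each period begins before the other ends.
        if oid == object_id and b < end and begin < e:
            raise ValueError("temporal entity integrity violation")
    table.append((object_id, begin, end, payload))

policies = []
insert_version(policies, 1, "2007-01-01", "2007-06-15", "pending")
insert_version(policies, 1, "2007-06-15", "9999-12-31", "active")
```

The third insert in the test below overlaps both existing versions of object 1 and is therefore rejected at update time; a query-time approach would instead have to reconcile the conflicting rows on every read.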
Part 13 was a roadmap to the rest of this series. Some of the pieces of that roadmap remain intact, but as we said, there will be significant modifications to it. Figure 1 contains the list of eleven topics described in Part 13, with a comment on their status.
Parts 14 and 15 explained why we prefer to make extensive use of surrogate keys in our "ultimate" versioning pattern. Strictly speaking, the use of surrogate keys is orthogonal to versioning; either can be implemented and can provide value without the other. But the extensive use we will be making of such keys, and in particular the global uniqueness property which our special kind of surrogate key possesses, eliminates what has been called the "object/relational impedance mismatch" and enables a high degree of code re-use through generalization.
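The global-uniqueness property can be illustrated briefly. The authors' own key-generation mechanism is not described here; this hedged sketch simply uses a random UUID to show the property itself, namely that a key is unique across every table in the database, not merely within one table.

```python
import uuid

# One way (among many) to obtain a globally unique surrogate key. Because
# no two calls collide across any table, generic code can accept a key
# without also being told which table it identifies a row in -- the basis
# of the code re-use through generalization mentioned above.
def new_surrogate_key() -> str:
    return uuid.uuid4().hex

key_a = new_surrogate_key()
key_b = new_surrogate_key()
```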
In Parts 16 to 18, we presented a SQL Server implementation of the first three of these patterns. Although our focus in this series is on concepts, and not on implementation, we thought it important to show what an implementation would look like. Later, we will do the same for our ultimate versioning pattern, thus showing that it in fact provides all the functionality claimed for it, encapsulates complexity as thoroughly as we have claimed, and works without error to maintain and return data that falls within the specified temporal boundaries, while preserving data outside those boundaries so that a full chronology of persistent objects is kept.
Parts 19 through 27 were motivated by a re-examination of our original decision not to enforce integrity constraints on versioned tables as they were being updated, but instead to enforce those constraints as those tables were being queried. This evolved into a discussion of temporal entity integrity and temporal referential integrity, and how they are enforced on inserts, updates, deletes and upserts. It also included a resolution of the issue of how to use pairs of dates to represent periods of time, and our decision to use the "closed-open" method of representation.
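The closed-open convention is worth a brief illustration. In a closed-open period [begin, end), the begin date is included and the end date excluded, which gives adjacent periods a shared boundary with no gap and no overlap, and makes the overlap test a simple pair of comparisons. The helper names below are ours, for illustration:

```python
# Closed-open periods [begin, end): the begin date is included, the end
# date excluded. Dates are ISO-8601 strings, which compare correctly as text.

def overlaps(b1, e1, b2, e2):
    # Two closed-open periods overlap iff each begins before the other ends.
    return b1 < e2 and b2 < e1

def meets(b1, e1, b2, e2):
    # Adjacent closed-open periods share a boundary exactly: no gap, no
    # overlap, and no "day after the end date" arithmetic is needed.
    return e1 == b2
```

With closed-closed periods, by contrast, testing whether one version ends exactly where the next begins requires date arithmetic to compute the day after the first period's end.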
In short, we have changed our minds. We now believe that temporal integrity constraints can be applied to transactions, as they update the database, and that our previous concerns about performance implications were not justified. We have therefore set out to build a framework to automate all temporal integrity checking, one which we will call the "asserted version framework."