Where We're Going: Roadmap

 

The major, but by no means exclusive, focus of future columns in this series will be on what I have been calling the "ultimate versioning pattern" and will, from now on, be calling the "asserted version pattern," or simply "asserted versioning."

 

New roadmap topic one: Non-temporal and uni-temporal tables. None of the versioning patterns we have considered so far have been bi-temporal, so we may as well call them "uni-temporal" patterns. But before we proceed to asserted versioning, which is a true bi-temporal pattern, there is one more thing we can learn from uni-temporal versioning. By comparing maintenance to non-temporal and uni-temporal tables, side by side, we can see how fundamentally alike they really are: there are strong parallels both between the tables themselves and between how they respond to maintenance transactions.

 

Because of the importance of these parallels, I will present, in my columns, a side-by-side series of inserts, updates and deletes against non-temporal tables and against corresponding uni-temporal versioned tables. These parallels are important, first, because they are not merely similarities; they are constraints on the semantics expressed by both uni-temporal and bi-temporal versioned tables. They are important, second, because they constitute a mental model of versioned history, a model that will also apply to bi-temporal versioning. This mental model will prove helpful in understanding any of the work that has been done on bi-temporal data management: the work of Dr. Snodgrass and of Date, Darwen and Lorentzos, as well as my work with colleague Randall Weis.
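
To make the parallel concrete, here is a minimal sketch of the same logical update applied to a non-temporal table and to a uni-temporal versioned counterpart. The table and column names, and the "9999-12-31" convention for an open-ended period, are my illustrative assumptions, not part of the pattern itself:

  -- Non-temporal table: the update simply overwrites the row in place.
  UPDATE Policy
  SET    policy_type = 'PPO'
  WHERE  policy_nbr  = 'P138';

  -- Uni-temporal versioned table: nothing is overwritten. The current
  -- version is closed out, and a new version is inserted.
  UPDATE Policy_Version
  SET    version_end_dt = CURRENT_DATE
  WHERE  policy_nbr     = 'P138'
    AND  version_end_dt = '9999-12-31';   -- the open, current version

  INSERT INTO Policy_Version
         (policy_nbr, version_begin_dt, version_end_dt, policy_type)
  VALUES ('P138', CURRENT_DATE, '9999-12-31', 'PPO');

In both cases, the business content of the transaction is the same; what differs is only how the table retains, or discards, the prior state.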

 

New roadmap topic two: Origins of asserted versioning, schemas and a data dictionary. I will begin the discussion of asserted versioning, the "ultimate versioning pattern" as I have been calling it, by pointing out its origins. One of those origins is the nearly three decades of work on bi-temporality done in the computer science community. Another origin is the equally venerable record of work done by IT professionals to manage historical data by means of versioning. A third is the work of Weis, who developed and put into production a fully bi-temporal versioning pattern at two large insurance companies. This pattern was approved by IBM as part of an implementation of their Insurance Application Architecture (IAA) framework.

 

I will continue these introductory remarks by extending the sample database of policies and policy holders. These two tables are, respectively, a) a kernel table, one which has no referential integrity (RI) or temporal referential integrity (TRI) dependencies on any other table; and b) a dependent table, one which is RI or TRI dependent on another table. What is lacking, however, is an associative (many-to-many) table, and one other kernel table, which the associative table will relate to the policy holder table. With the addition of these two tables to the running example, I will have a sample database with which we can illustrate bi-temporal data management across both one-to-many and many-to-many temporal dependencies.

 

I will then show the schemas for all four of these tables and describe our extended database by developing a complete data dictionary for it (a dictionary fully supported by our glossary and its evolving ontology). I will define the concepts of temporal primary keys and temporal foreign keys as they apply to asserted versions, and the roles of effective and assertion time periods. We will then restate the concepts of temporal entity integrity and temporal referential integrity as they apply specifically to asserted versioning.
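
As a preview, and only as a sketch, an asserted version table might look something like the following. The names, the data types and the use of dates rather than full timestamps are my assumptions for illustration; the actual schemas will be developed in the columns themselves:

  -- One asserted version table (hypothetical names). The temporal primary
  -- key combines the object identifier with the two time periods. The
  -- temporal foreign key carries the parent object's identifier, not a
  -- reference to any specific version of that parent.
  CREATE TABLE Policy_AV (
      policy_oid         INTEGER  NOT NULL,   -- identifies the policy (the object)
      effective_begin_dt DATE     NOT NULL,   -- effective time period
      effective_end_dt   DATE     NOT NULL,
      assertion_begin_dt DATE     NOT NULL,   -- assertion time period
      assertion_end_dt   DATE     NOT NULL,
      policy_holder_oid  INTEGER  NOT NULL,   -- temporal foreign key
      policy_type        CHAR(3)  NOT NULL,
      PRIMARY KEY (policy_oid, effective_begin_dt, assertion_begin_dt)
  );

Note that an ordinary PRIMARY KEY constraint cannot, by itself, enforce temporal entity integrity, since it cannot detect overlapping time periods; that is one of the jobs of the asserted version framework.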

 

The introduction will conclude with a discussion of what we call "the fundamental definition" of asserted versioning, which is shown in Figure 1.

[Figure 1: "the fundamental definition" of asserted versioning]

New roadmap topic three: Asserted versioning and its place in an enterprise data architecture. The discussion of this topic will conform to the description of it given in the previous section. I will focus the discussion on the complementary roles of various temporal data management methods in the context of such an architecture. I will also describe how our asserted version framework fits into an enterprise data architecture.

 

New roadmap topic four: Just doing it. The first part of "just doing it" will be to show how maintenance transactions work with asserted versioning. I will show the original transactions that conform to the transaction encapsulation clause of the full encapsulation discussed earlier, and the temporal transactions that physically realize them. I will show the before and after states of the sample database for each transaction.
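
As a hedged preview of what those before-and-after states will show: a single logical update, written against the object rather than against any version of it, might be physically realized as a pair of temporal transactions like the following sketch. The names and the open-ended-date convention are assumptions carried over from the schema sketch above, and the full pattern also involves re-asserting prior versions with adjusted effective periods, which I omit here:

  -- 1. Withdraw the currently asserted version by closing its assertion period.
  UPDATE Policy_AV
  SET    assertion_end_dt = CURRENT_DATE
  WHERE  policy_oid       = 138
    AND  assertion_end_dt = '9999-12-31';

  -- 2. Assert a superseding version carrying the changed business data.
  INSERT INTO Policy_AV
         (policy_oid, effective_begin_dt, effective_end_dt,
          assertion_begin_dt, assertion_end_dt, policy_holder_oid, policy_type)
  VALUES (138, CURRENT_DATE, '9999-12-31',
          CURRENT_DATE, '9999-12-31', 27, 'PPO');

Transaction encapsulation means that the user writes the one logical update and never sees these physical transactions.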

 

The second part of "just doing it" will be to show how queries work with asserted versioning. This demonstration will also use the sample database and will illustrate the query encapsulation clause.
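
Unencapsulated, a bi-temporal query must constrain both time periods. For example, here is a sketch of "what did we assert, on January 1, 2008, that policy 138 looked like on that date?" (the names and dates are illustrative):

  SELECT policy_type
  FROM   Policy_AV
  WHERE  policy_oid          =  138
    AND  effective_begin_dt <= '2008-01-01'
    AND  effective_end_dt   >  '2008-01-01'
    AND  assertion_begin_dt <= '2008-01-01'
    AND  assertion_end_dt   >  '2008-01-01';

Query encapsulation means hiding predicates like these, for example behind views, so that anyone who can write a non-temporal query can query asserted version tables.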

 

In working on this material, we are finding that it is best to illustrate and talk about these transactions and this sample database in abstraction from their actual realization with a specific DBMS. The illustrations will look something like those in parts 3 through 9 of the Time and Time Again article series, rather than those in parts 16 through 18.

 

But that is obviously not enough to make good my many claims about asserted versioning. I also need to show it at work in a real database managed by a real DBMS. To do this, I will develop a SQL Server implementation of the sample database, apply the same transactions to it that will be used in the columns, and run the same queries against it that will be used in the columns. Beyond that, my intention is to provide a database at my Web site that readers can run their own queries against. This implementation will include a working prototype of the asserted version framework, which I will demonstrate by showing how various transactions that violate temporal integrity are rejected by the framework.
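
Here is one sketch of the kind of violation the framework must catch (again with hypothetical names and data): if the sample database already asserts a version of policy 138 whose effective period runs from 2007-01-01 to 9999-12-31, then the following insert would create two currently asserted versions of the same policy with overlapping effective periods, violating temporal entity integrity, and the framework would reject it:

  INSERT INTO Policy_AV
         (policy_oid, effective_begin_dt, effective_end_dt,
          assertion_begin_dt, assertion_end_dt, policy_holder_oid, policy_type)
  VALUES (138, '2008-06-01', '9999-12-31',
          CURRENT_DATE, '9999-12-31', 27, 'HMO');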

 

Perhaps this is the right point to emphasize that I am trying to do more than explain how to write such transactions and such queries. That much is, in fact, sufficient for those who would eventually use a database that contained asserted version tables, because of the full encapsulation provided. But I believe that readers want to understand asserted versioning from the inside, to see how it works "behind the scenes," and also to understand why it works. I believe, in other words, that readers want to acquire the mental model I have talked about - a mental model that needs to be grasped in order to deploy asserted versioning intelligently in the proper roles assigned to it by an enterprise data architecture.

 

New roadmap topic five: Two additional capabilities of asserted versioning. One way of using asserted versioning that has not been mentioned so far is to use future assertion dates. With future assertion dates, we populate asserted version tables with data that we may want to play around with in order to get it "just right."

 

The use of future assertion dates, in other words, creates a "virtual sandbox" that is not physically distinct from the production database. Data in this sandbox can therefore be queried together with production data in the same tables, perhaps to see the effect that those future-asserted rows would have if allowed to become "real" data.
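
A sketch of how this plays out in SQL (the view name and dates are my assumptions): if production access goes through a view that filters on currently asserted rows, future-asserted sandbox rows are invisible to it, while a "what if" query simply evaluates assertion time at a future date:

  -- Production view: only rows currently asserted are visible.
  CREATE VIEW Policy_Asserted AS
  SELECT *
  FROM   Policy_AV
  WHERE  assertion_begin_dt <= CURRENT_DATE
    AND  assertion_end_dt   >  CURRENT_DATE;

  -- A "what if" query: evaluate assertions as of a future date, pulling
  -- in the sandbox rows alongside production data in the same table.
  SELECT *
  FROM   Policy_AV
  WHERE  assertion_begin_dt <= '2010-01-01'   -- a date in the future
    AND  assertion_end_dt   >  '2010-01-01';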

 

Sandbox support is not simply something that can be "tacked on" to the asserted versioning model. It is something which the model itself, without extensions, can automatically support, although our current implementation of a temporal sandbox does make use of some minor extensions to the asserted version schema.

 

A second way of using asserted versioning is in the context of data marts with their fact-dimension data models. All data marts provide as-is representations of their dimensions. But sometimes, the business would like to have seamless access to various as-was representations of those dimensions, or even to a combination of an as-is representation for one dimension with an as-was representation for another one.

 

For example, consider a data mart of sales transactions and a hierarchical salesperson dimension. Each sale, i.e., each row in the fact table, is associated with exactly one salesperson. But suppose that these salespersons are grouped into sales teams, and the sales teams into higher level groupings. When a business person wants a summary of the last five years of sales along the salesperson and sales team hierarchy, which hierarchy does he mean - the sales teams as they are constituted today, or the sales teams as they were constituted at some point in time in the past, or perhaps the sales teams as they will be constituted after next month's reorganization? Seamless access, in this context, would mean the ability to plug in a date on a screen and get sales summaries rolled up according to the sales team structure current at such past, present or future points in time.
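
Here is a sketch of such a rollup. The star schema names are hypothetical, and I omit the assertion-time predicates for brevity; the same query serves past, present and future simply by changing the plugged-in as-of date:

  SELECT st.sales_team_name,
         SUM(f.sale_amt) AS total_sales
  FROM   Sales_Fact     f
  JOIN   Salesperson_AV sp
    ON   sp.salesperson_oid     =  f.salesperson_oid
    AND  sp.effective_begin_dt <= '2008-06-30'   -- the as-of date
    AND  sp.effective_end_dt   >  '2008-06-30'
  JOIN   Sales_Team_AV  st
    ON   st.sales_team_oid      =  sp.sales_team_oid
    AND  st.effective_begin_dt <= '2008-06-30'
    AND  st.effective_end_dt   >  '2008-06-30'
  WHERE  f.sale_dt >= '2003-07-01'               -- the five-year window
  GROUP BY st.sales_team_name;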

 

New roadmap topic six: Implementation tips and tricks. I will show the database objects, such as tables, columns and keys, used in the physical implementation of asserted versioning and also discuss the use of indexing, clustering and partitioning to optimize the performance of asserted version databases. I will also discuss techniques to encourage the optimizer to choose the best access paths to asserted version tables.
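
On the indexing point, for example, one plausible sketch (hypothetical names): since most queries ask for currently asserted, currently effective rows, an index that leads with the object identifier and the two end dates lets the optimizer find the current version of an object without scanning its history:

  CREATE INDEX ix_Policy_AV_current
      ON Policy_AV (policy_oid, assertion_end_dt, effective_end_dt);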

 

New roadmap topic seven: Four approaches to bi-temporality. Two of the four approaches I will discuss deal with how to provide bi-temporality with today's technology. The other two deal with how to provide bi-temporality by means of standards-approved extensions to SQL and commercial implementations of those new standards. The four approaches are:

 

  • Asserted versioning;
  • Dr. Snodgrass' description of how to implement bi-temporality in today's databases;
  • Dr. Snodgrass' TSQL2 proposal to the SQL Standards Committees; and
  • Date, Darwen and Lorentzos' alternative to TSQL2.

The first two are the approaches dealing with implementations using today's technology, and the last two with proposals for extensions to SQL and to RDBMS capabilities.

 

My perspective is that of asserted versioning. From that perspective, and starting from a shared belief in the theoretical soundness and practical value of bi-temporality, the point of these comparisons will be to highlight the implementation differences between asserted versioning and Dr. Snodgrass' approach to bi-temporality with today's technology, and also to emphasize theoretical differences with the future SQL proposals of the two other parties to the discussion. For example, on the practical side, I am in the process of developing a more encapsulated implementation of bi-temporality than the one Dr. Snodgrass presented in his book. On the theoretical side, I disagree with Date and his associates about the implications of the closed-world assumption (CWA) for bi-temporal data management and about whether or not a Snodgrass-like implementation of bi-temporality violates Codd's Information Principle (properly understood).


New roadmap topic eight: The glossary and its ontology. In developing my implementation of bi-temporality, I have found it necessary to introduce and to carefully define approximately 100 specialized terms. Definitions attempt to say as clearly as possible what the term being defined means. But I have found that, for the most part, these definitions are about as useful as defining "customer" as "someone who buys something from us."

 

In the glossary, I require that all definitions be based on a combination of basic terms and other entries in the glossary. The basic terms constitute what is called a "controlled vocabulary." In a formalized representation of the glossary, called an "ontology," these basic terms become the predicates of a logic-based representation of the glossary definitions.

 

The importance of a glossary whose definitions are formulated this precisely is twofold. As most of us know from experience, definitions from subject matter experts, constrained only by the requirement that the experts be careful, seldom bring a significant degree of clarity to the topic. Almost always, meetings to review a proposed list of definitions are dominated by wordsmithing, in which the only criterion for changing a proposed definition is that, to the majority of people in the room, the suggested change "feels better" than the original.

 

Often, suggested changes result from questions raised about the proposed definition, and in the ensuing discussion it becomes clear that the author of the original definition did not have a clear understanding that he simply failed to express clearly; rather, his own understanding of the term was imprecise at best. The absence of a clear definition often shows itself when different usages of the term are tried out and the assembled group of experts cannot agree on which usages are valid and which are not, or cannot agree, for a set of borderline cases, on whether or not they "fall under" the term being defined.

 

In place of this flawed level of clarity, I have attempted to create a set of definitions on whose meanings all parties who study the glossary can agree, because all such definitions are based on a controlled vocabulary supplemented by an extensive use of terms defined elsewhere in the glossary. The result is a semantically tightly knit set of terms and definitions that experts may disagree with, but which they cannot legitimately construe in different ways.

 

Clarity to human beings interacting with a system that is based on concepts defined in a glossary is one thing. But an even more rigorous degree of clarity is gained when definitions are translated into the axioms of a system of (at least) first-order predicate logic. Once the definitions are in this form, inference engines can reason about the glossary and, via theorem-proving, derive new statements whose truth is guaranteed given the truth of the axioms - statements which, often enough, the human beings dealing with the glossary did not realize were implications of those definitions.
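
As an illustration of the general idea (these are made-up axioms, not actual entries from the glossary), suppose two definitions were formalized, in first-order notation, as:

  \forall x \, (\mathit{Version}(x) \rightarrow \mathit{Assertion}(x))
  \forall x \, (\mathit{Assertion}(x) \rightarrow \exists t \, \mathit{AssertionBeginDate}(x, t))

An inference engine can then derive, by theorem-proving, a statement written nowhere in the glossary itself:

  \forall x \, (\mathit{Version}(x) \rightarrow \exists t \, \mathit{AssertionBeginDate}(x, t))

that is, that every version has an assertion begin date.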

 

Automated inferencing by inference engines using logic-based theorem-proving techniques is what the introduction of ontologies into IT data management is all about. Using the glossary as an extended example, I will use a few installments in this series to explain what ontologies are and how they can improve our ability to manage data in a manner which maximizes its semantic content and minimizes the cost of managing it.

 

Wrap-up of the series. The most important thing about asserted versioning is that it provides seamless, real-time access to descriptions of objects at different points in their lifetimes. It does so with queries which are simple enough that they can be written by anyone who can write non-temporal queries and with updates that meet that same standard.

 

Approximations to this speed and ease of access to temporal data have been around for as long as databases have been around, and over the years, immediacy and ease of access to temporal data have gradually gotten better. But asserted versioning is a quantum leap beyond these incremental advances. Asserted versioning combines decades of IT experience, which culminated in versioning, with decades of computer science experience, which culminated in bi-temporality.

 

Asserted versioning provides the benefits of real-time seamless access. My asserted version framework provides the means to implement asserted versioning quickly and with little impact on existing application systems or database users. A second advantage of the framework is that it will not become obsolete when bi-temporal SQL standards are finally defined and vendors then begin to implement those standards. On the contrary, besides providing temporal data management benefits now, rather than years from now, the asserted version framework also eliminates a large part of the work that would otherwise be required to begin implementing commercial solutions, once they become available.

 

Next time, I will begin with new roadmap topic number one, a comparison of non-temporal and uni-temporal tables that emphasizes the similarities in how they change as transactions are applied.
