Note: A glossary of technical terms used in these columns can be found on the Web sites of the authors. In addition, a listing of earlier articles and/or columns in this series can also be found on DMReview.com, MindfulData.com and InbaseInc.com.

 

Figure 1 is the taxonomy of temporal data management methods, which we began discussing last time. However, the section under the node "Integrated Time" has changed. We will discuss the change when we get to that part of the taxonomy.

 

The principal reason for discussing this taxonomy is to distinguish the various things we may mean when we talk about managing time in databases. Each method of time management has its place in an enterprise data architecture (EDA). But our specific focus, in our "Time and Time Again" series and in this column, has been what Figure 1 shows as the right-hand node of the lowest branch, the one labeled "tri-temporal time." This is the method we have been calling "asserted versioning."

 

In our last column, we explained the distinction between reconstructable time and queryable time, and between the two kinds of queryable time - event time, which is recorded as a series of transactions against an original state of some object; and state time, which is recorded as a series of states, each new one resulting from an insert, update or delete transaction.

 

We continue by explaining the distinction between the two kinds of state time - snapshot time and version time.

 

 

Snapshot Time

 

Within the computer science community, the term "snapshot" refers to a non-temporal state table, one whose rows are physically updated as changes are submitted to it. We have referred to such tables as "non-temporal" and will continue to do so. But as we and most IT professionals use the term "snapshot," it refers to periodic copies of an entire database or at least of a number of logically related tables in a database. We will continue to use the term in this way.

 

The first point about snapshots is that, when they are taken, they copy entire tables or even entire databases. This is a very straightforward way of keeping track of history, at least compared to the complexities of bi- and tri-temporal data management. It is also quite wasteful, because over a typical time span of days, weeks or months, only a small number of the rows in most tables will be updated. For those rows that are not updated, snapshots are simply duplicate copies.

 

A second point about snapshots is that their results are kept online. This distinguishes them from the backups, archives and transaction logs, which implement what our taxonomy calls "reconstructable history."

 

The online results of snapshots may be stored in the same database as the tables they are copies of or in a different database. Typically, the first option is referred to as keeping a set of "snapshot tables," and the second is referred to as updating a data warehouse.

 

A final point about snapshots is that they may also miss changes. A row that is inserted after one snapshot and deleted prior to the next one will not show up in that next snapshot. If there are two or more updates made to the same column of a row between snapshots, only the last one will show up in the snapshot.
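The two failure modes above can be demonstrated in a few lines. The following is a minimal sketch, not from the article itself, that models a table as a Python dictionary and a snapshot as a full copy taken at a point in time; the row values are illustrative.

```python
# A sketch of why periodic snapshots can miss changes. The table is
# modeled as a dict of row-id -> value; "snapshots" are full copies
# taken at two points in time.

table = {1: "Acme", 2: "Baker"}
snapshot_1 = dict(table)            # first snapshot

table[3] = "Carter"                 # row inserted after snapshot 1...
del table[3]                        # ...and deleted before snapshot 2
table[2] = "Baker Corp"             # first update to row 2
table[2] = "Baker Inc"              # second update overwrites the first

snapshot_2 = dict(table)            # second snapshot

print(3 in snapshot_2)              # False: the short-lived row is invisible
print(snapshot_2[2])                # Baker Inc: only the last update survives
```

The short-lived row for Carter never appears in any snapshot, and the intermediate value "Baker Corp" is likewise lost.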

 

There is only one thing that snapshots can reliably tell us. They can tell us exactly what the copied tables or databases contained when the copy was made. This makes snapshots good for two things, and two things only:

 

  1. If the snapshots are taken at points in time important to the business, such as at month-end, then those snapshots are available to rerun reports and produce additional "one off" reports of the state of the business as of that point in time.
  2. A series of period-end snapshots of a database can be used to extract trend data in a process called data mining. For example, monthly snapshots of a customer master table can be distilled into monthly profiles of the "typical customer". From such profiles, a company could track changes in age, income or other parameters of its customer base.

Version Time

 

Version time is the logical complement of snapshot time. Unlike snapshots, versions are not wasteful. Both snapshots and versions make copies of data, but versioning copies only the data that has changed. In addition, unlike snapshots, versions are reliable. They record all changes to the versioned data, not just changes that have lasted long enough to make it to the next snapshot point.

 

A non-temporal table becomes a version table when updates that overwrite data are replaced by copying the row to be updated, applying the update to that copy and then inserting it as a new row. Typically, a date or timestamp is used to distinguish one version of the same object from another and to sequence those versions in time.
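The copy-then-insert mechanics can be sketched with an in-memory SQLite table. The schema below is illustrative, not taken from the article's figures: the primary key is (client_id, version_date), so each "update" becomes a new row rather than an overwrite.

```python
import sqlite3

# A sketch of versioned updates, assuming a simple client table whose
# primary key is (client_id, version_date). Names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE client (
    client_id INTEGER,
    version_date TEXT,
    client_name TEXT,
    PRIMARY KEY (client_id, version_date))""")

# Initial state of client 101.
conn.execute("INSERT INTO client VALUES (101, '2007-01-01', 'Acme')")

# An "update" does not overwrite: it copies the current state, applies
# the change, and inserts the result as a new version row.
conn.execute("INSERT INTO client VALUES (101, '2007-03-15', 'Acme Corp')")

rows = conn.execute(
    "SELECT version_date, client_name FROM client "
    "WHERE client_id = 101 ORDER BY version_date").fetchall()
print(rows)  # both versions survive, sequenced by version_date
```

Both states of client 101 remain in the table, and the version date both distinguishes and sequences them.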

 

We continue by explaining the distinction between the two kinds of version time - separate and integrated.

 

Separate (Version) Time

 

One way to implement versioning is to keep all noncurrent versions in a separate table. The original table then retains its status as a non-temporal table, one whose rows always reflect the current state of the objects they represent. All noncurrent versions are then kept in a separate table, usually in the same database (although often in a separate database partition). This separate table is often referred to as a "history table" or simply as a "version table."

 

One motive for keeping versions separate from current data is to avoid negatively impacting performance for queries that want current data only. Another is to avoid making those queries more complex. If versions were kept in the same table as current data, then every SELECT statement that wanted current data would have to specify that it wanted the version whose versioning date (in the primary key) was the latest date among all versions whose primary keys were identical except for that date.
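The extra query complexity being avoided looks like the following sketch, which assumes an illustrative single-table schema (not from the article's figures) in which current and noncurrent versions are mixed together. Every query for current data needs a correlated subquery to find the latest version date per object.

```python
import sqlite3

# A sketch of the query complexity the authors describe: with all
# versions in one table, "current" means the row with the latest
# version date among rows sharing the same client_id.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE client_version (
    client_id INTEGER, version_date TEXT, client_name TEXT,
    PRIMARY KEY (client_id, version_date))""")
conn.executemany("INSERT INTO client_version VALUES (?, ?, ?)", [
    (101, '2007-01-01', 'Acme'),
    (101, '2007-03-15', 'Acme Corp'),
    (102, '2007-02-01', 'Baker'),
])

# Every SELECT that wants current data must pick the latest version.
current = conn.execute("""
    SELECT cv.client_id, cv.client_name
    FROM client_version cv
    WHERE cv.version_date = (SELECT MAX(v2.version_date)
                             FROM client_version v2
                             WHERE v2.client_id = cv.client_id)
    ORDER BY cv.client_id""").fetchall()
print(current)  # [(101, 'Acme Corp'), (102, 'Baker')]
```

With separate versioning, the same query against the current-only table would be a plain SELECT with no subquery.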

 

Integrated (Version) Time

 

The other way to implement versioning is to keep all versions, including the current version, in the same table. This transforms the original non-temporal table into a temporal table. When this is done by adding a single date to the primary key, as is typically the case, the result is to transform the non-temporal table into a uni-temporal table, one with only one temporal dimension.

 

One motive for using integrated rather than separate version time is simplicity. With integrated versioning, there is only one table to query. So to look for different point-in-time states of the same objects requires altering only the date used in the WHERE clause. To retrieve both current and noncurrent versions of the same object does not require a join across two tables, a join that would be required if separate versioning were used. A single table also provides a single unit of work for IT operations - a single object to back up and archive, a single object to restore when needed.
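The point about altering only the date in the WHERE clause can be made concrete. The sketch below uses an illustrative uni-temporal schema (the column names echo the article's effective-start-date and effective-end-date, but the schema and data are assumptions); one parameterized query answers any point-in-time question.

```python
import sqlite3

# A sketch of point-in-time queries against an integrated version table.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE client (
    client_id INTEGER, effective_start_date TEXT, effective_end_date TEXT,
    client_name TEXT, PRIMARY KEY (client_id, effective_start_date))""")
conn.executemany("INSERT INTO client VALUES (?, ?, ?, ?)", [
    (101, '2007-01-01', '2007-03-15', 'Acme'),
    (101, '2007-03-15', '9999-12-31', 'Acme Corp'),
])

def state_as_of(as_of):
    # A different point-in-time state requires changing only the date
    # bound into the WHERE clause.
    return conn.execute("""
        SELECT client_name FROM client
        WHERE client_id = 101
          AND effective_start_date <= ? AND ? < effective_end_date""",
        (as_of, as_of)).fetchone()[0]

print(state_as_of('2007-02-01'))  # Acme
print(state_as_of('2007-06-01'))  # Acme Corp
```

No join is needed to move between current and noncurrent states; only the date parameter changes.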

 

Another motive for using integrated rather than separate time is an additional level of simplicity when we move to bi-temporal and tri-temporal tables. As we will see, the queries and maintenance transactions whose targets are bi- or tri-temporal tables would be immensely more complicated if their targets were not single tables.

 

Uni-Temporal, Bi-Temporal and Tri-Temporal Tables

 

A non-temporal table is one whose rows reflect the current state of the object they represent. Figure 2 illustrates a non-temporal client table.

 

 

Each row of the table in Figure 2 represents the current state of one client. When that state changes, an update transaction changes the row to reflect the change. This is an update in place. The old values are lost, because they have been overwritten by the new ones. Txn-date is the date the row was inserted, and it is usually included on non-temporal tables. Sometimes, these tables also include a last-update-date.
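A minimal sketch of that update in place, using an illustrative schema with the txn-date column the article mentions (the data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE client (
    client_id INTEGER PRIMARY KEY, client_name TEXT, txn_date TEXT)""")
conn.execute("INSERT INTO client VALUES (101, 'Acme', '2007-01-01')")

# An update in place: the old value is overwritten and lost.
conn.execute("""UPDATE client SET client_name = 'Acme Corp'
    WHERE client_id = 101""")

remaining = conn.execute("SELECT client_name FROM client").fetchall()
print(remaining)  # [('Acme Corp',)] - 'Acme' is gone
```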

 

 

Figure 3 illustrates a uni-temporal client table. Each row of this table represents, for a specific client, the state of that client that became effective on the effective-start-date. When that state is no longer effective, the row is updated by putting the date on which effectivity ceased in the effective-end-date. When a new state of the client becomes effective, a new row is inserted, with a new effective-start-date. In each case, txn-date is the date the row was inserted. (A variation on uni-temporal tables is to leave out the effective-end-date.)
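The close-old-row, insert-new-row sequence can be sketched as follows, with a schema and data that are illustrative rather than copied from Figure 3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE client (
    client_id INTEGER, effective_start_date TEXT,
    effective_end_date TEXT, txn_date TEXT, client_name TEXT,
    PRIMARY KEY (client_id, effective_start_date))""")

# Initial version of client 101, currently in effect (no end date yet).
conn.execute("""INSERT INTO client VALUES
    (101, '2007-01-01', NULL, '2007-01-01', 'Acme')""")

# A new state becomes effective: close the old version...
conn.execute("""UPDATE client SET effective_end_date = '2007-03-15'
    WHERE client_id = 101 AND effective_end_date IS NULL""")
# ...and insert a new row with a new effective-start-date.
conn.execute("""INSERT INTO client VALUES
    (101, '2007-03-15', NULL, '2007-03-15', 'Acme Corp')""")

history = conn.execute("""SELECT effective_start_date, effective_end_date,
    client_name FROM client ORDER BY effective_start_date""").fetchall()
print(history)
```

The closed version keeps its full effectivity period, while the open version's null end date marks it as current.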

 

 

Figure 4 illustrates a bi-temporal client table. Each row of this table represents, for a specific client, the state of that client that became effective on the valid-start-date and that was added to the table on the txn-start-date. When that state is no longer effective, the row is updated by putting that date in the valid-end-date. When a new state of the client becomes effective, a new row is inserted with a new valid-start-date and txn-start-date. Notice that, so far, nothing is done to the txn-end-date. We may assume that it is {null} when the row is inserted and that it remains null when the effectivity period for the row ends. Also, note the change from "effective" to "valid." This is a terminological change only. We have made it on the recommendation of Dr. Richard Snodgrass, who pointed out that the term "valid time" is standard in the computer science community and that "effective time" is not used in its place.

 

Occasionally, erroneous data does make its way into our tables. If a row in a bi-temporal table is in error, we would lose information if we physically deleted it. But if we don't physically delete it, how can we add another row with the same client-id and valid-start-date?

 

The answer is to add the correcting row with the same client-id and valid-start-date, but with a new txn-start-date, the date on which this new row was physically inserted. The row in error is updated by placing the same new date in its txn-end-date. Because every pair of adjacent rows with the same client-id and valid-start-date will be such that the earlier one has a txn-end-date equal to the later one's txn-start-date, the column txn-end-date is not logically required. It is not a denormalization, but it is redundant data.
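The correction mechanics look like this sketch, with an assumed bi-temporal schema echoing the article's column names (the data, including the misspelled name being corrected, is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE client (
    client_id INTEGER, valid_start_date TEXT, txn_start_date TEXT,
    txn_end_date TEXT, client_name TEXT,
    PRIMARY KEY (client_id, valid_start_date, txn_start_date))""")

# A row later discovered to be in error ('Amce' should be 'Acme').
conn.execute("""INSERT INTO client VALUES
    (101, '2007-01-01', '2007-01-01', NULL, 'Amce')""")

# The correction: same client_id and valid_start_date, but a new
# txn_start_date; the erroneous row gets that same date as its
# txn_end_date, so it is no longer currently asserted.
correction_date = '2007-02-10'
conn.execute("""UPDATE client SET txn_end_date = ?
    WHERE client_id = 101 AND valid_start_date = '2007-01-01'
      AND txn_end_date IS NULL""", (correction_date,))
conn.execute("INSERT INTO client VALUES (101, '2007-01-01', ?, NULL, 'Acme')",
             (correction_date,))

# Only the correction has a null txn_end_date; the error is retained
# but superseded, so no information is lost.
asserted = conn.execute("""SELECT client_name FROM client
                           WHERE txn_end_date IS NULL""").fetchall()
print(asserted)  # [('Acme',)]
```

Both rows share the same primary-key columns except for txn_start_date, which is what makes the physical delete unnecessary.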

 

 

Figure 5 illustrates a tri-temporal client table. (Because tri-temporal tables are not discussed by computer scientists, we now revert to our original terminology.) Each row of this table represents the state of one client that became effective on the effective-start-date and that was first asserted to be true on the assertion-start-date. Normally, it is the very action of inserting a row into a table that corresponds to asserting that it is true. But one key feature of tri-temporality that distinguishes it from both the versioning tables created by IT professionals and the bi-temporal tables described by computer scientists is that tri-temporal tables allow us to enter rows with a future assertion-start-date.

 

Erroneous data in tri-temporal tables is handled the same way it is in bi-temporal tables, except that the distinguishing date is now an assertion-start-date rather than a txn-start-date. We could enter a correction into a tri-temporal table that would not take effect immediately by making its assertion-start-date a future date. But while there seems little reason to post-date corrections, there are legitimate and important reasons to post-date assertions.
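A post-dated assertion can be sketched as follows. The schema and dates are assumptions for illustration: a row is physically inserted on May 1 but carries a June 1 assertion-start-date, so queries that respect assertion time do not see it until then.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE client (
    client_id INTEGER, effective_start_date TEXT,
    assertion_start_date TEXT, client_name TEXT,
    PRIMARY KEY (client_id, effective_start_date, assertion_start_date))""")
conn.executemany("INSERT INTO client VALUES (?, ?, ?, ?)", [
    (101, '2007-01-01', '2007-01-01', 'Acme'),
    # Physically inserted now, but not asserted as true until May 1.
    (101, '2007-06-01', '2007-05-01', 'Acme Corp'),
])

def asserted_as_of(as_of):
    # Only rows whose assertion period has begun are visible.
    return conn.execute("""
        SELECT client_name FROM client
        WHERE assertion_start_date <= ?
        ORDER BY effective_start_date""", (as_of,)).fetchall()

print(asserted_as_of('2007-03-01'))  # [('Acme',)]
print(asserted_as_of('2007-05-15'))  # [('Acme',), ('Acme Corp',)]
```

The row exists in the table the whole time; what varies is whether it is yet being asserted as true.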

 

Wrap Up

 

Based on recent conversations with Dr. Snodgrass, who is the leading computer scientist in the field of temporal databases and the one who submitted the TSQL2 extensions to the SQL standards committee, we have decided to change our taxonomy and terminology. Throughout our series of nearly 30 articles and this column as well, we have described our own approach to temporal data management as a bi-temporal one. It is bi-temporal because rows for the same object are uniquely identified by providing two temporal coordinates - one in effective time space and one in assertion time space. The bi-temporality discussed by computer scientists also uniquely identifies rows for the same object by providing two temporal coordinates - one in valid time space and one in transaction time space.

 

With respect to the first dimension, the distinction is only terminological. What we call "effective time" is called "valid time" in the computer science community. But with respect to the second dimension, the distinction is more than terminological. With transaction time, the system determines the value for the start-date, and it is always the date on which the row was inserted. With assertion time, the user determines the value for start-date, and it can be any date subject to the constraint that it cannot be in the past.

 

So our temporal data management approach, which we have called "asserted versioning," has been called a bi-temporal approach up to now. Following Dr. Snodgrass' recommendation, we will begin calling it a tri-temporal approach, in spite of the fact that rows for the same object are still uniquely identified by their position in only two temporal dimensions.

 

This concludes our discussion of our temporal data management taxonomy. Following the roadmap laid out over the last couple of months, we will continue by comparing how inserts, updates and deletes affect non-temporal and uni-temporal tables.
