Note: Beginning with this column, it will be increasingly important for readers to refer to our glossary, which can be found on the author’s Web sites (MindfulData.com and InbaseInc.com) with a listing of previous columns and articles. Our glossary-defined terms form a closely knit conceptual network, and none of these terms can be fully understood apart from their semantic connections with other terms in the Glossary. And a note on the use of italics: we use italics for emphasis, but we also italicize the initial occurrence, from this column forward, of glossary-defined terms.

 

An original transaction is an insert, update or delete, whose target is an asserted version bi-temporal table. It is the transaction as written by its author and as submitted by the database administrator (DBA) to the database management system (DBMS). A temporal transaction is a physical transaction against an asserted version table. These physical transactions are either physical updates (updates in place) or physical inserts; there are no temporal transactions that are physical deletes.

 

Original transactions are never applied directly to the database. Instead, they are mapped into one or (usually) several temporal transactions. Temporal transactions are never written by an author - an end user, programmer or DBA – instead, they are always created by a mapping from an original transaction.

 

Transaction 1: An Open Current Insert

 

Figure 1 shows the mapping for an original insert. We assume, for now, that the original transaction is an open current transaction. An open current transaction is an open transaction, because it does not specify an effective period end date for the version being created. It is a current transaction, because it accepts the current date as the effective period begin date for the version being created.

 

 

An insert into a non-temporal table is valid just in case a row for the object does not exist in the target table at the time of the insert. In the same way, an insert into an asserted version table is valid just in case a current episode for the object does not exist in the target table at the time of the insert. Thus, an open current insert is valid if the target table is empty, and it can be valid even if the target table contains past or future episodes of the same object. In the non-temporal case, this constraint is known as "entity integrity." In the bi-temporal case, it is what we call temporal entity integrity (TEI).

 

 

Before the transaction is applied, the target table is as shown in Figure 2. The box containing the timeline stretching from 1/01/04 through 12/31/09 also contains two horizontal bars above the timeline. The upper shaded bar will contain a graphic representing rows that are no longer currently asserted. It will come into play when we begin to discuss transactions, which correct data already in the database. The lower unshaded bar will contain a graphic representing rows that are currently asserted.

 

The time current when the transaction is applied is shown in the box containing the word "Time." However, throughout our discussions, we will use dates for all bi-temporal points in time. This eliminates unnecessary details in the presentation of asserted versioning, but in no way distorts the semantics.

 

The original transaction itself is shown in the box immediately to its right. Finally, the column headings of the target table are shown, with primary key columns underlined, and the temporal foreign key (TFK) column italicized. This list of column headings is preceded by a row number column. It is not part of the table, but is simply a concise way to label the rows so we can discuss them without getting them confused with one another.

 

 

Figure 3 shows the result of applying transaction 1 to the target table. The table is an asserted version policy table, and each row in the table represents one version of one policy. The policy is the object, and each row in the table is one assertion of one version of the object.

 

Transaction 1 has created one row in this table. This row is the initial version of a new episode of the policy. Because by assumption, the table was empty prior to this transaction, it is also the initial episode of the policy.

 

The unique identifier of the policy is its object identifier (oid), P861. This version is effective beginning on 1/01/04, because it was applied on 1/1/04 and no override effective begin date was specified with the original transaction. It will remain in effect until further notice, because no override effective end date was specified. The effective begin date is the second component of the primary key. (To save space, we use "9999" instead of "12/31/9999" in these figures.)

 

The last component of the primary key is the assertion begin date. Because no override assertion begin date was specified with the original transaction, the current date is used. In other words, the default is for a version to be asserted as soon as it is created. The assertion end date is set to 12/31/9999, meaning that we will continue to assert what this row represents until further notice.

 

A valid original insert transaction creates a new episode. There could be earlier episodes of an object already represented in the asserted version table; but for an insert to be valid the most recent prior episode must have a latest version with an assertion end date at least one clock tick old. By the same token, there could be future episodes of the object already represented in the table; but for the insert to be valid, the earliest of these episodes must have an earliest version with an assertion begin date at least one clock tick later than the assertion end date specified on the transaction. In this case, because our transaction defaults to a 12/31/9999 assertion end date, there can be no future episodes of the object at all because no clock tick, by definition, is later than 12/31/9999.

 

An episode begin date is always set to the assertion begin date of its earliest version. So in this case, because our transaction creates the initial version of a new episode, the episode begin date is 1/01/04.

 

In these diagrams, we will always use the two-letter string "dt" to stand for the transaction date (i.e., the date on which the row was physically created). This will always be the same date as that shown in the "Time" box. We use "dt" as a stand-in for an actual date in order to fit our column headings across the width of a page.

 

The green and red arrows in Figure 3 represent the effective and assertion time periods, respectively, of versions and episodes. More will be said about them next time.

 

The client column is what we call a TFK. In this case, the policy has a TFK to client C882. The DBMS cannot manage this special kind of foreign key, nor can it enforce temporal referential integrity (TRI) on it. TFKs and TRI are especially difficult concepts, so we will postpone their discussion until we reach scenarios specially designed to illustrate them.

 

Deferred Assertions

 

We normally think of the act of adding a row to a table as being the same thing as the act of asserting the state of affairs that the row represents (i.e. of claiming that what the row describes is in fact true). From this point of view, our distinction between the physical act of creating a new row in a table and the semantic act of asserting what the row represents is a distinction without a difference.

 

This is why, we believe, the computer science community called our two assertion dates "transaction dates." It just didn't occur to them that there was a difference. Yet, a transaction is a physical act and an assertion is a semantic one, so there is a difference.

 

With the asserted versioning approach, the default is for original transactions to begin to be asserted on the date they are applied to the database. But unlike the computer science model of bi-temporality, asserted versioning also permits transactions to be submitted prior to the date on which they are to be applied.

 

Normally, when we apply any transaction, bi-temporal or not, to a database, we also claim that the results of that transaction represent what is true. By that very action, we begin to assert that the resulting row makes a true claim. But asserted versioning permits us to populate its bi-temporal tables with rows that represent claims we are not yet willing to assert.

 

We have identified three uses for such deferred assertions, as we will call them. They are:

 

  • To deploy and manage internalized batch transaction files in the target table,
  • To deploy and manage internalized staging areas in the target table and
  • To deploy and manage internalized hypothetical states in the target table.

Because our implementation of asserted versioning relies on current DBMS technology, we have to take a "do it yourself" approach to these three internalizations.

 

Internalized Batch Transaction Files

 

Some transactions are meant to be applied to the database as soon as they are created. We typically call them real-time transactions. Other transactions are ones that we accumulate in an externally managed queue. We typically call this queue a batch transaction file and its content batch transactions.

 

DBAs, machine operators and other IT professionals spend a significant amount of time managing batch transaction files - populating them, applying them, clearing them out, archiving them and so forth. Deferred assertions eliminate this work. Transactions written with a specified future assertion begin date can be immediately submitted to the DBMS, thus internalizing batch transaction files and the work of managing them. This internalization supports a "submit it and forget it" semantic for transaction management. We previously referred to these internalized batch transaction files as virtual batch transaction files, because the internalization leaves us with no physical file to manage.

 

Future releases of commercial DBMSs might support this function with physical batch transaction files that they manage internally. Our asserted versioning support for this function, however, does not take this approach. Instead, we include these future assertions within their target tables. We will show how this works later on in this series.

 

Internalized Staging Areas

 

IT professionals frequently extract and load a selection of data into a staging table, and then manipulate that data in various ways. Once a desirable result is obtained, the data is loaded back into its source table. But if staging areas can be brought into the very tables that are both their sources and destinations, a great deal of manual effort can be eliminated. We will show how this works later in the series.

 

Internalized Hypothetical States

 

Businesses also require frequent access to hypothetical states of selected data, states that support what we typically call "what if" analysis. To support such analysis, programmers and DBAs are usually called on to extract and load a selection of data into a separate table or even a separate database. Often, after transforming that data into a hypothetical derivative set of data, business users then want to examine that data, and frequently would like to use existing queries to do so. But those queries must be rewritten before they can be used, because table names and column names are often different.

 

Thus, the processes of both managing and accessing hypothetical data is time-consuming and error-prone. But if a collection of hypothetical data can be managed and accessed within the tables from which they are derived, a great deal of manual effort can be eliminated. Existing queries can be used in an almost unmodified state. We will show how this works later in the series.

 

Deferred assertions are a way of moving a very significant amount of the tedious data management work done in IT departments into the DBMS. This allows us to declaratively specify to the DBMS what we want done without having to go to the trouble of doing it ourselves.

 

Next time, we will continue Scenario 1 by updating policy P861.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access