This installment should have been the continuation of our presentation of physical implementations of the version patterns discussed thus far. It should have presented a physical implementation of Version Patterns 4 and 5. However, because editors are just plain tough when it comes to deadlines, and because of some personal exigencies with the authors, we are unable to stick to this logical sequence. Instead, in this article, we will consolidate and extend some of the things we have learned about the semantic constraints that apply to versions, and especially to what referential integrity means when applied to versions.

 

In terms of the roadmap presented in Part 13, this means that we are putting Step 2 aside, for the moment, and moving on to Step 3, the semantics and implementation of Version Pattern 6. The first part of those semantics which we wish to discuss is what we will call "temporal referential integrity" (temporal RI).

 

A glossary is included at the end of this article. As we have been writing this series, we have developed a set of terms which have been defined more by means of introducing and using them than in any more formal way. Of course, the later Wittgenstein (someone whose name C. J. Date likes to invoke) did indeed say "Meaning is use," but it is well-known (by philosophers, at least) that the later Wittgenstein is notoriously difficult to interpret and, consequently, correspondingly inappropriate as a source of quotations taken out of context. This glossary, like dictionaries when they are periodically undergoing revisions, takes usage to be prescriptive of meaning, but not identical to it. And in so doing, it and we part company with the later Wittgenstein for whom most philosophical problems need to be dissolved, not solved.

 

RI and Temporal RI

 

The referential integrity (RI) that we all know and love is RI that the database management system (DBMS) can enforce, based on declarations we make to it when we submit our data manipulation language (DML). It is RI against nonversion tables. For clarity's sake, we have been calling non-version tables "object tables," not in the object-oriented sense, but rather in the sense of things that persist through time, things that versions are versions of. Thus, a Customer table is a table of customers, which are instances of a certain kind of object, and which persist through time. But a Customer Version table is not a table of customers; it is a table of chronologically sequenced, nonoverlapping, time slices of customers.

 

As we continue, we will use the term "object RI" when we think the adjective is needed for clarity, and simply "RI" when context makes it clear that we are talking about object RI. Finally, we will always use the term "temporal RI" when talking about RI as it applies to versions.

 

Object RI

 

Here is the definition of object RI:

 

If there is an object RI dependency from table Y to table X (not necessarily distinct), then no state of the database is valid in which any row in table Y is not foreign key-linked to a row in table X. No inserts, updates or deletes are valid that would transform the database into such a state.

 

The RI machinery of the DBMS insures this by carrying out the following checks. We assume throughout this discussion, unless otherwise noted, that all foreign keys are defined as non-nullable.

  1. When inserting a row into object table Y, that row must reference a row in object table X that already exists in the database (or that is being inserted as part of the same atomic transaction).

  2. When updating the foreign key to a row in object table X, from a row in object table Y, that new foreign key must reference a row in X that already exists in the database (or that is being inserted as part of the same atomic transaction).

  3. When deleting a row in object table X, no rows may remain in object table Y, after that transaction is complete, that reference that row.

With insert, update and delete actions, and two tables, we have a combination of six table/action combinations. Notice that three of them are not mentioned as constraints that object RI must enforce. Inserting or updating a parent row cannot affect RI, nor can deleting a child row or updating it in any way that leaves its foreign key to its parent unchanged.

 

Note that for the time being, we are not concerned with the different ways that the DBMS can enforce constraint number 3 above – those ways being a) block, b) set null and c) cascade delete.

 

Temporal RI

 

We can now define and illustrate the enforcement of temporal RI in a way that emphasizes that object and temporal RI are both forms of referential integrity, but with the latter bringing time into consideration. But before we do, here is a depiction of what Version Pattern 6 looks like, using our Client Version table as an example.

 

 

This schema will be refined later on, as we begin our discussion of Version Pattern 6. It is presented here simply to provide a graphic image of the various time-implementing columns that are discussed below.

 

So here is the definition of temporal RI, stated initially as a constraint involving two version tables As we will see later, we must also consider temporal RI constraints, and their enforcement, between two tables one of which is a version table and the other of which is an object table:

 

If there is a temporal RI dependency from version table Y to version table X (not necessarily distinct), then no state of the database is valid in which any row in table Y is not object R- linked to a row in table X, or in which the effectivity time period of a row in table Y is not wholly contained within the effectivity time period of its parent row in table X. No inserts, updates or deletes are valid that would transform a database into such a state.

 

Now for the enforcement of temporal RI. No existing DBMS provides the machinery to carry out this enforcement. Instead, it will be up to us, the developers, to support temporal RI by writing our own code, possibly using triggers, to enforce the following constraints on activity against the database:

 

  1. When inserting a row into version table Y, that row must reference a row in version table X that a) already exists in the database (or that is being inserted as part of the same atomic transaction), and b) whose effectivity time period wholly contains the effectivity time period of the row being inserted into Y.

  2. When updating the foreign key to a row in version table X, from a row in version table Y, that new foreign key must reference a row in X that a) already exists in the database (or that is being inserted as part of the same atomic transaction), and b) whose effectivity time period wholly contains the effectivity time period of the row being updated in Y.

  3. When deleting a row in version table X, no rows may remain in version table Y, after that transaction is complete, that reference that row.

  4. No update to a row in version table X (where a change in primary key is not considered an update) has any effect on RI constraints, other than a change to that row's effectivity time period. In that case, when updating the effectivity time period of an object X (which is done by creating a new current version of X that specifies the new time period), that effectivity time period may not be shortened as long as one or more object-RI linked rows in version table Y have an effectivity end date beyond that of the shortened end date for object X.

Recall, from our discussions thus far, that deletions in version tables are always logical deletions. That is, they are always done by marking the current version of an object deleted as of an indicated date, not by physically removing a row from a table.

 

Informally, what is going on is this:

 

Temporal RI is object RI plus the constraint that the time period of a child row must always be wholly contained with the time period of its parent row.

 

If we keep this in mind as we begin to look more closely at the details of implementing temporal RI enforcement, those details will be much less likely to confuse us. All of those details exist to support this one-sentence definition of temporal RI.

 

Glossary

 

Child table, child row. Y is a child table to X if and only if there is a foreign key dependency from Y to X. A row in Y is a child to a row in X if and only if the row in Y has a foreign key whose value is identical to the primary key value of that related row in X.

 

  1. Parent/child relationships typically have a maximum cardinality of one/many, and a minimum cardinality of optional for the parent and required for the child. But it is a matter of which table contains the foreign key and which table is the reference of that foreign key that differentiates parent from child tables and rows. Cardinality constraints are not what make the difference.

Effectivity time period. The period of time during which a version is regarded as the truth about an object.

 

  1. In our version patterns, an effectivity time period is defined by an effectivity begin date and an effectivity end date, where "date" may be a calendar date or any other "tick of the clock" (as described in Part 2).

  2. Our convention is that the time period begins on the begin date, but ends one clock tick prior to the end date.

Object referential integrity, object RI. Referential integrity for objects, as opposed to object versions.

 

  1. This is what the term "referential integrity" refers to, outside of these articles. It is the integrity constraint, enforced by all current RDBMSs, that every non-null foreign key must contain a value identical to the value in the primary key of a row in the RI-related table.

Object table. A table whose rows represent persistent objects. Sometimes called a "non-versioned" table.

 

  1. Persistent objects are things that exist over time and can change over time, such as vendors, customers, employees, regulatory agencies, products, services, bills of material, invoices, purchase orders, claims, certifications, etc.

  2. From an OLAP, star-schema point of view, dimension tables are tables of persistent objects. For a more in-depth discussion of different types of tables, see the articles "An Ontology of Tables," at MindfulData.com.

  3. From an OLTP database point of view, assuming that there are no version tables in the database, object tables are all the tables which are not transaction tables.

  4. Roughly speaking, object tables are the tables which are the concern of master data management.

Parent table, parent row. X is a parent table to Y if and only if there is a foreign key dependency from Y to X. A row in X is a parent to a row in Y if and only if that row in Y has a foreign key whose value is identical to the primary key value of that row in X.

 

  1. Parent/child relationships typically have a maximum cardinality of one/many, and a minimum cardinality of optional for the parent and required for the child. But it is a matter of which table contains the foreign key and which table is the reference of that foreign key that differentiates parent from child tables and rows. Cardinality constraints are not what make the difference.

Temporal referential integrity, temporal RI. Referential integrity for versions, which consists of object RI plus the constraint that the effectivity time period for child rows are wholly contained with the effectivity time periods of their object RI-related parent rows.

 

Version table. A table whose rows represent versions of persistent objects.

 

  1. A version of a persistent object is a time-slice of that object. A row in a version table represents a version, and describes that object as it was, is, will be and/or might be during its effectivity time period.

  2. A row in a version table is what the custodians of the table believe is the truth about the object it represents, during the indicated effectivity time period.

Wholly contained in (for effectivity time periods). Time period 2 is wholly contained in time period 1 if and only if the effectivity begin date of time period 2 is equal to or later than the effectivity begin date of time period 1, and the effectivity end date of time period 2 is equal to or earlier than the effectivity end date of time period 1.

 

  1. In order to implement this constraint in today's production databases, we rely on SQL data types and operators. Specifically, we rely on dates and/or timestamps, and the DBMS implementation of comparison operators for those data types.

  2. Also for ease of current implementation, we use "12/31/9999" in a way whose intracacies will be discussed in Part 20. For now, we note that when we use "12/31/9999," we let the DBMS interpret that string as a valid date. However, semantically, it almost never would be a valid date, because no business activities that we know of are concerned with an effectivity time period that extends up to but not including or beyond that date nearly eight-thousand years in the future.

Wrap-Up

 

In Parts 10 through 12 of this series, we discussed RI as it applies to version tables. There we concluded that when managing rows in version tables, the only RI we needed to enforce among those rows was object RI, RI expressed by means of object identifiers (OIDs).

 

Clearly, if this is all that were done to check the validity of insert, update and delete transactions against version tables, it would be possible to introduce violations of temporal RI. As we will explain next time, our concern was to avoid the performance penalty of enforcing temporal RI on inserts, updates and deletes to the database. This concern, indeed, is the principal reason that IT professionals tend to avoid versioning as much as possible.

 

In Part 20, we will examine our reasons for taking that position, and will explain why it is still a reasonable position to take. However, we will look more closely at the performance implications of doing temporal RI checking on insert, update and delete transactions. Those implications may not be as forbidding as we had originally thought them to be.

 

We will also begin a discussion of the value "12/31/9999" which may appear in effectivity end dates. In date or timestamp data types, this string is understood by the DBMS to represent an actual date, nearly eight-thousand years in the future. But from a semantic point of view, that string does not represent a date. It functions as a temporal {null}.

 

As such, we shall argue, its most mathematically secure management would require the use of a temporal three-valued logic. But fortunately, our focus is on implementation with today's technology. So just as {null}s in today's SQL fudge a robust three-truth-valued logic, we will show how these "temporal {null}s" can also limp along without the support of a secure foundation in a three-time-valued (modal) logic.

 

Beyond these issues, we must also discuss temporal RI as it applies to future versions, to hypothetical versions, and also to versions which correct errors in other versions. This will take us several installments further into this series, at which point we will resume our presentation of physical implementations of Version Patterns 4 and 5.

 

After that, we will begin our discussion of Version Pattern 6.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access