Note: A glossary of technical terms used in these articles can be found on the Web sites of the authors. The URLs are MindfulData.com and InbaseInc.com.
This glossary is being constructed as a controlled vocabulary. By that we mean that any expression involving temporal concepts which is used in the definition of a glossary entry must itself be a glossary entry.
One reason for being this careful with definitions is that this kind of rigor must be applied to definitions before they can be "ontologized," i.e. expressed in a formal notation like predicate logic. That, in turn, is what is needed to make definitions available for manipulation by software-realized inferencing engines.
Another reason for being this careful with definitions is that this is the kind of rigor that must be applied to any set of technical definitions before we can say with confidence that we know what we mean, and that we mean what we say, no more and no less.
In our previous article, we introduced the distinction between original and temporal transactions, and began by discussing original and temporal deletes. This article completes our discussion of delete transactions. We will then go on to discuss insert, update and upsert transactions.
These discussions describe the transformations required to map original transactions onto temporal transactions. This mapping is the "database update side" of the encapsulation of the complexities of temporal data management which, in Part 1, we said was an essential part of our approach.
These same discussions also provide the insight necessary to decide whether we should attempt to enforce temporal referential integrity (RI) on the updates themselves, or instead defer that integrity checking until data is retrieved. With respect to the deferral option, we will repeat what we said about it in the previous article:
From the Previous Article
For the sake of continuity, we repeat the following material from the previous article.
If there is a temporal RI dependency from table Y to version table X (not necessarily distinct), then no exposed state of the database is valid in which any row in table Y is not object-RI linked to a row in table X, or in which the effectivity time period of a row in table Y is not wholly contained within the effectivity time period of its parent row in table X. No queries are valid which would expose such a state.
An "original" transaction is a transaction created or authorized by the business. A "temporal" transaction is the result of translating an original transaction into a transaction against a versioned object.
To the business, when they are issuing a delete against an object, they don't care if the object is versioned or not, and indeed may not know. They are directing us to remove the object from the database. Thus, an original delete transaction doesn't need to specify a version, just the object being deleted and the date the delete is to take effect. This is illustrated in Figure 1.
Deleting a Versioned Object: The Existence Constraint
The original transaction we are considering is a delete. Just as with a delete to a non-versioned object, the first thing that must be done is to find the target row(s). One target row must be the most recent row of the current episode of the object. The original transaction provides the electronic identifier (EID) for the object. The most recent version of the current episode of the object is the one (nonhypothetical) version of that object with an effectivity begin date in the past and an effectivity end date either in the future, or containing the "temporal null" value "12/31/9999."
First of all, note the assumption that there cannot be more than one current version of an object in a versioned table. We recommend that you attempt to state, clearly and precisely, what guarantees that this assumption is valid. (Hint: Consider the logic of inserting into a versioned table. In the next article, this logic will be described in terms of the mapping from an original insert to a temporal insert, but in both the glossary and in previous articles, the material already exists for you to answer this question.)
If a current version of the indicated object is found, the delete can proceed. Otherwise, it is rejected as invalid. (We consider deletes which are explicitly against future versions in a later article. For now, we assume that all original transactions are against current versions.)
As described last time, an original delete of a versioned object requires us to supercede the most recent version of the current episode of the object with a delete version, and also to do the same for any active future (or hypothetical) episodes of that object. An active future episode is a future episode whose most recent version is not a delete version.
Deleting a Versioned Object: What is a Dependent Row?
If the version is found, the next thing is to look for any rows, in either object tables or version tables, that are dependent on it. This dependency is precisely that of temporal RI.
Recall as well that the database management systems (DBMS) cannot help us here. If we are to enforce temporal RI on delete transactions, we must do the work ourselves. And since this is work to enforce a database constraint, it does not belong anywhere in application-specific code. It must be implemented in the codebase that maps original transactions to temporal transactions. (This codebase, of course, may be expressed in part, or even in its entirety, as DBMS triggers.)
If the child table is an object table, a dependent row is one with the to-be-deleted row's EID as an object foreign key (OFK). Performance considerations would thus suggest that OFK columns, in all non-versioned tables, have a non-unique index defined on them.
If the child table is a versioned table, a dependent row in a current episode of its object is one which has an OFK that contains the value of the to-be-deleted row's EID, and with an effectivity end date either in the future, or else containing the value "12/31/9999". A dependent row in an active future episode of its object is the most recent version of that episode. (Until further notice, we will not refer to hypothetical objects, versions or episodes any longer. The approach to temporal data management that we are describing does accommodate such hypotheticals, but they require special treatment. They will be discussed in later articles.) In short, dependent rows for a temporal delete are all and only those rows which are OFK-related to the to-be-deleted row, and which are the most recent versions in the active episodes for their object.
Here, performance could be improved by using a non-unique unique index on OFK plus effectivity begin date (plus one more date, as we shall see when we discuss Version Pattern 7) in the dependent table. However, at this point we are getting into a detailed discussion of implementation options. We intend to discuss such issues in depth toward the end of this series, and so we will say no more about them here.
Deleting a Versioned Object: Block, Set Null and Cascade Options
The search for dependent rows is conducted under one of three directives. They are the familiar relational delete options of:
- Set null, or
From a performance point of view, which, as we stated last time, means from a physical I/O point of view, the block option is often less costly than the other options. This is because if there are multiple rows dependent on the to-be-deleted current version, we can stop checking as soon as the first one is found.
But if either the set null or the cascade option is chosen, then our search for temporally RI dependent rows in versioned child tables must be exhaustive. It must retrieve and examine all OFK-related rows.
Notice that even if the child table index used for the search is as described above, the rows themselves must be retrieved, because the check must look at the effectivity end date of the child rows. In very large tables, possibly with multiple OFKs, the cost of such indexes can be high. When an important to-be-deleted object has many dependent children, the I/O costs of a) searching both the index and the base table for possible episodes to terminate (i.e., possible most recent versions to supercede with temporal delete versions), and b) doing the appropriate writes to the database to terminate those episodes, may be prohibitively high.
But these are simply the trade-offs that any database administrator (DBA) must consider when physicalizing any data model. It is the cumulative weight of such costs that can lead us to decide to postpone temporal RI checking until retrieval time. In the case of temporal deletes, it would mean that we would just terminate all active episodes of the designated object, and not check for any dependencies.
Deleting a Versioned Object: The Foreign Key Ripple Effect
We know that normal, DBMS-enforced RI doesn't just link child tables to parent tables. When using the set null or cascade options, we must also consider child tables to those child tables, and continue down the RI-dependency chain as far as it extends. Thus, in a situation where we have clients, policies and claims, a cascade or set null delete of a client may require one or more policies to be set null or deleted. And if any are, the DBMS must go on to set null or delete the claims dependent on those policies.
Let's call this the "foreign key ripple effect." There are three things to note about this ripple effect as it applies to versioned objects.
- It does apply to versioned objects.
- The DBMS can't help us. We must write our own code to enforce temporal RI on temporal delete transactions.
- The performance cost will usually be much higher than for normal, DBMS-enforced RI.
So if we are enforcing temporal RI as the database is updated, the index and base-table lookups that we described above must be performed. In addition, as we just pointed out, they must be performed on all tables in the temporal RI chain of dependencies, thus on not just children of the to-be-deleted parent row, but also of grandchildren, great-grandchildren, etc.
What of the claim that performance costs will usually be much higher than for normal RI? A detailed defense of the claim, of course, cannot be provided unless we describe a specific implementation. But this would not prove that some better implementation might keep down the incremental performance costs. However, regardless of the specific implementation, they all share the following characteristic: there are going to be a lot more rows in a versioned table than there would be in the corresponding object table. The reason is that in versioned tables, it is not just inserts that create new rows. Versionable updates do also, and versioned deletes do also.
Deleting a Versioned Object: Temporal Delete Transactions
Let's suppose that we have identified both the most recent version of the current episode of the object specified on the original delete, the most recent version of any active future episodes of that object, and also the most recent version of all active episodes in all tables along the temporal RI-dependency chain. We then add a temporal delete transaction to each of these episodes, superceding the most recent version in each of them with a temporal delete version.
Note that these temporal deletes do not flag and rewrite these versions. That would be to do an update in place logical delete. As we explained in an earlier article, update in place logical deletes lose information, and that information may be considered versionable information by the business.
Temporal deletes insert a temporal delete version for each active episode. The effectivity end date on the delete version is set to the delete date on the transaction. This is because an original delete is a business transaction. Its intent is to remove the object from the database, as of the indicated date. For example, if a client is being deleted, the intent is that after the transaction is complete, that client no longer exists, i.e., that person or organization no longer exists as a client of our company. (We may assume that if a delete date is not supplied on the original transaction, the effectivity end date defaults to the date current when the transaction is physically applied to the database.)
Note also that nowhere in this process have we discussed a physical deletion of a physical row in a versioned table. Physically removing rows from versioned tables is a process we will call "archiving" those rows, and it is outside the scope of what this series of articles is about. See the taxonomy in Part 1 for additional discussion of the difference between archiving, creating snapshots, and versioning.
So just like original inserts and original (versionable) updates, original deletes to versioned objects become physical inserts or one or more rows.
There is one unfinished piece of business. Having argued that sometimes it is too performance costly to enforce temporal RI on original delete transactions against versioned objects, we should now show how that same RI is enforced if it is, instead, enforced at retrieval time. However, with deadlines approaching on our bill-paying projects, we are forced to postpone that discussion for a later time.
Next time, we will provide the same treatment for original insert transactions against versioned objects.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access