At the conclusion of our previous article, we said that there are two options for enforcing temporal integrity constraints on a database. One is to enforce the constraints at the time of update so as to prevent violations from being persisted in the database. The other is to let violations occur, but to filter them out at retrieval time. This would be done by means of views against which all normal queries would be written.
It's easy to disparage the latter option by describing it as letting temporal "garbage" onto the database, and then trying to hide that garbage from queries. But this is indeed the approach we have taken on temporal databases that we have designed, built, put into production and supported, and we had good reason for doing so. Like almost all database professionals, we were concerned that there would be an overwhelming volume of cascade updates if we tried to enforce temporal RI at time of update, especially when the update involves the delete of a versioned object that was the "parent" of possibly many other objects, in many other tables. (See the introduction to Part 10 for a more complete description of this "cascade update" fear.)
We believe it is time to re-examine this fear and see whether the performance hit for enforcing temporal RI on update transactions is as great as we have feared. We assume here that, for as long as data storage continues to involve spinning disks, cylinders and mechanical movement of read/write heads to get to required data, we can safely simplify the performance issue and treat it as a matter of how many physical I/Os (that is, how many block-level transfers between a storage device, or its cache, and main memory) are required to complete a temporal RI validation. With this in mind, let us now examine two things: the logic of transactions against versioned objects, and the I/O that is required for those transactions.
Temporal RI Reformulated
To more explicitly allow for the query-time enforcement of temporal integrity constraints, we will slightly amend the definition of temporal RI given in Part 20, as follows:
If there is a temporal RI dependency from table Y to version table X (not necessarily distinct), then no exposed state of the database is valid in which any row in table Y is not object-RI linked to a row in table X, or in which the effectivity time period of a row in table Y is not wholly contained within the effectivity time period of its parent row in table X. No queries are valid which would expose such a state.
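The whole-containment requirement on effectivity periods can be sketched as follows. This is a minimal illustration, not the series' implementation; the function names and the half-open period convention are my own assumptions:

```python
from datetime import date

# The "forever" end date used throughout this series for open-ended versions.
END_OF_TIME = date(9999, 12, 31)

def contained_within(child_begin: date, child_end: date,
                     parent_begin: date, parent_end: date) -> bool:
    """True if the child row's effectivity period lies wholly within the
    parent row's effectivity period (periods treated as [begin, end))."""
    return parent_begin <= child_begin and child_end <= parent_end

def violates_temporal_ri(child_begin: date, child_end: date,
                         parent_begin: date, parent_end: date) -> bool:
    """A row in table Y violates temporal RI against its parent row in
    table X when its effectivity period is not wholly contained in the
    parent's effectivity period."""
    return not contained_within(child_begin, child_end,
                                parent_begin, parent_end)
```

Whether this check runs at update time or is folded into a query-time view is exactly the design choice under discussion.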
There are three changes to the original definition:
- "Version table Y" changed to "table Y." As explained last time, we allow both versioned and non-versioned tables to be object-RI dependent on a versioned table.
- "State of the database" changed to "exposed state of the database." This allows for the option of enforcing temporal RI on queries rather than on updates.
- "No inserts, updates or deletes are valid that would transform a database into such a state" replaced by "No queries are valid that would expose such a state." The replaced sentence was appropriate for update-enforced temporal RI. The replacing sentence is appropriate for either update- or query-enforced temporal RI.
We will assume, in the following discussion, that we are dealing with update-enforced temporal RI. In later articles, we will discuss query-time enforcement.
Note: before proceeding, please note that the glossary introduced a couple of installments ago is not included in this article, nor will it be included in future articles. The reason is that the glossary is already over 3,000 words long and likely to grow to several times that size.
One reason for the glossary being this extensive is that we are making the definitions of its entries extensive. Most of these concepts are complex, and we don't think it is possible to do them justice with one- or two-sentence definitions.
Another reason for the glossary being this extensive is that we are attempting to construct it as a controlled vocabulary. By that we mean that any expression involving temporal concepts which is used in a definition must itself be a defined glossary entry. This is the kind of rigor that must be applied to definitions before they can be "ontologized," i.e., expressed in a formal notation like predicate logic and thus available for manipulation by software-realized inferencing engines. It is also the kind of rigor that must be applied (to any set of technical definitions) before we can say with confidence that we know what we mean.
Of course, all this just makes the glossary that much more important than if its entries were brief and informal. So we have decided to put the glossary on each of our Web sites, to keep it updated there, and to include a link on our respective homepages to that glossary. The URLs are MindfulData.com and InbaseInc.com.
In this review of updates to versioned tables, our approach will be to first describe the logic of those updates, including the enforcement of temporal constraints. After that, we will go on to consider the physical I/O costs of enforcing those constraints on those updates. The specific types of update that we will consider are:
- (Row-level) deletes; and
- (Row-level) updates.
We begin with some terminology. An "original" transaction is a transaction created or authorized by the business. A "temporal" transaction is the result of translating an original transaction into a transaction against a versioned object. As we will see, while each original transaction appears to its author to be an update against a single row of an object table, the translation may result in a set of updates to multiple rows in possibly multiple tables. (The original/temporal distinction, and other key terms of our temporal data management approach, are defined in the glossary.)
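The original/temporal distinction can be made concrete with two minimal data structures. This is an illustrative sketch; the class and field names are my own, not the series':

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class OriginalDelete:
    """An original transaction as the business author sees it: just the
    object to delete and the date the delete takes effect. No version is
    named, because versions are hidden from the author."""
    object_id: str
    delete_date: date

@dataclass
class TemporalAction:
    """One row-level action produced by translating an original
    transaction. A single OriginalDelete may translate into several of
    these, possibly against several tables."""
    table: str
    action: str
    object_id: str
```

The one-to-many fan-out from OriginalDelete to TemporalAction is the crux of the cascade-update concern discussed above.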
Original Deletes: Their Semantics
To the business, a delete against an object is just that: the user issuing it neither knows nor cares whether the object is versioned. He is directing us to remove the object from the database. Thus, an original delete transaction doesn't need to specify a version, just the object being deleted and the date the delete is to take effect. This is illustrated in Figure 1.
Let's organize this discussion around three questions.
- First, when an object is versioned, and a business user issues or authorizes a delete, what does he think he is deleting, an object or a version?
- Second, when an object is versioned, and a business user issues or authorizes a delete, what is actually deleted?
- And finally, how is the actual delete transaction carried out?
As for the first question, the author of the delete transaction intends to delete an object, and believes he is doing so. For example, he may issue a delete of a specific client, against a database in which clients are versioned, but he does not issue a directive to delete a specific version of that client.
The reason that original update transactions are specified against objects, and not against versions, is that the burden of managing versions should not be placed on the business user, in this case on the author of the delete transaction. Versions are simply an internal mechanism for satisfying the business demand for real-time access to noncurrent data, which is just as fast and easy to access as current data. As far as possible, versions and their complexities should be hidden from both end users and developers.
A delete transaction against a versioned object is valid if and only if there is a current version of that object on the database at the time of the transaction. Necessarily, this version will always be the one current version of the one current episode of that object, and applying the delete will delete the object by terminating its current episode.
But what about past versions, future-dated versions or hypothetical versions? As for past versions, it is reasonable to expose semantics to the user that prohibits them from updating the past. Just as with good accounting practice, if what we recorded about the past was wrong, we do not update that original entry. Instead, we enter adjusting transactions.
As for future or hypothetical versions, it could be said that if we allow users to create them, as we in fact do, then we expose the mechanism of versioning to those users. But this isn't so. Consistent with our principle of hiding the mechanism of versioning as much as possible, we do not expose to the user or developer the concept of deleting something that doesn't yet exist.
Instead, all that we have exposed to the user is functionality, the ability to insert transactions in advance of when they are to become effective, or even transactions that may never become effective. The mechanisms by which we do this are not themselves exposed. From the user's point of view, it appears as if we have simply accepted transactions in advance of when they should be applied and stored them somewhere in a transaction file, with the promise that we will apply them precisely on the date they are to become effective.
Temporal Deletes: Their Targets
So, given that there must be a current version of the object of an original delete, and there may be, in addition, past, future and hypothetical versions as well, what is actually deleted? This is determined by the code that transforms an original delete into a temporal delete. We may think of this code as the "transform" portion of an extract, transform and load (ETL) process against a batch file of original transactions. We may also think of this code as a stored procedure invoked against individual online transactions.
In either case, the important thing is that business users, developers and database administrators (DBAs) all work with original transactions and not with temporal transactions. The code that translates original to temporal transactions encapsulates the complexity of versions and their manipulations, supporting their functional semantics without exposing their implementation complexities.
This question about the targets of temporal transactions is the second of the three questions in the previous section. The answer is: the current version, and any future and hypothetical versions as well. To the user, it will seem that we have done three things:
- Deleted the one row representing the object in question;
- Flushed any remaining transactions for that object from the transaction file just mentioned; and
- If there were hypothetical versions, removed that object from one or more "hypothetical situation" databases.
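The answer to the second question (which versions a temporal delete actually targets) can be sketched as a selection over the object's versions. The `hypothetical` flag here is an illustrative stand-in for however hypothetical-situation versions are marked, not a column name from the series:

```python
from datetime import date

def delete_targets(versions: list[dict], as_of: date) -> list[dict]:
    """Select the versions a temporal delete acts on: the current
    version, any future-dated versions, and any hypothetical versions.
    Past versions are deliberately left alone; we do not update the
    past, we enter adjusting transactions instead."""
    targets = []
    for v in versions:
        is_current = v["eff_begin"] <= as_of < v["eff_end"]
        is_future = v["eff_begin"] > as_of
        if is_current or is_future or v.get("hypothetical", False):
            targets.append(v)
    return targets
```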
But how is this done? What physically happens when the object of an original delete is a versioned object? The answer is that this is done by inserting a new version that supersedes the current version of the object being deleted, as well as a new version for each of the future or hypothetical versions of that object.
The process begins by finding the current version of the object specified on the original transaction. If it is not found, the transaction is invalid. Otherwise, a superseding delete version is created which has the same object identifier and business effectivity begin date as the version it is superseding.
The effectivity end date on the superseding delete version is set to the delete date on the transaction. This is because an original delete is a business transaction. Its intent is to remove the object from the database as of the indicated date. For example, if a client is being deleted, the intent is that after the transaction is complete, that client no longer exists, i.e., that person or organization no longer exists as a client of our company. (We may assume that if a delete date is not supplied on the original transaction, the effectivity end date defaults to the date current when the transaction is physically applied to the database.)
On these versions created by an original delete transaction, a delete flag is set to "yes." For all but these delete-created versions, this flag is set to, and remains, "no." The purpose of this delete flag is to distinguish deletes that happen in response to an original delete transaction from deletes that happen because an effectivity end date other than 12/31/9999 comes to pass.
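Putting the last two paragraphs together, the superseding delete version can be sketched as a simple constructor. Column names are illustrative, not the series' actual schema:

```python
from datetime import date

def superseding_delete_version(superseded: dict, delete_date: date) -> dict:
    """Build the version inserted by an original delete: it carries the
    same object identifier and effectivity begin date as the version it
    supersedes, its effectivity end date is set to the delete date, and
    its delete flag is set to 'yes'."""
    return {
        "object_id": superseded["object_id"],
        "eff_begin": superseded["eff_begin"],
        "eff_end": delete_date,
        "delete_flag": "yes",
    }
```

Note that the superseded row itself is not physically touched; the delete is expressed entirely as an insert, which is why no row is ever physically removed.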
Note that nowhere in this process have we discussed a physical deletion of a physical row in a version table. Physically removing rows from versioned tables is a process we will call "archiving" those rows, and it is outside the scope of what this series of articles is about. See the taxonomy in Part 1 for additional discussion of the difference between archiving, creating snapshots, and versioning.
Next time, we will finish our discussion of original and temporal delete transactions by examining the foreign key implications of deleting a versioned object. We will then be in a position to assess the risk and the extent of potential "cascade" effects of such deletions. Since what we are proposing is an enterprise solution, we must either evaluate performance implications against worst-case scenarios, or else design our original transactions with a flag that can selectively turn on or off transaction-time temporal validation checks.