In this installment, we will begin a discussion of how temporal referential integrity (RI) may be enforced with today's relational database management systems (RDBMSs). In addition, we will expand the glossary which we began last time, in Part 19. Indeed, the glossary work has proven to be the major portion of the work done here in Part 20. This seems to be the right point in our series to begin formalizing the definitions of the terms we have introduced.
Object RI and Temporal RI: A Clarification
In Part 19, we said that by "object RI" we meant the normal DBMS-enforceable referential integrity, implemented by populating a column designated as a foreign key to table X with the value of one of the primary keys in that table. That was wrong, and is indicated as such in the glossary by shading that entry. A new corrected definition is also included.
On that basis, we went on to give the following definition of "temporal RI": If there is a temporal RI dependency from version table Y to version table X (not necessarily distinct), then no state of the database is valid in which any row in table Y is not object-RI linked to a row in table X, or in which the effectivity time period of a row in table Y is not wholly contained within the effectivity time period of its parent row in table X. No inserts, updates or deletes are valid that would transform a database into such a state.
As this discussion should make clear, our corrected definition of the term "object RI" does not require us to amend the above definition of "temporal RI."
Notice that in this definition, the dependent-on table (X) is itself a version table. Consequently, it contains no rows for objects, only rows for versions of objects. To illustrate, consider this schematic of a version table, also copied from Part 19. We also add a schematic of a dependent table which, in our ongoing example, is the Policy table.
In the Policy table shown here, the notation (OFK) stands for object foreign key. The object mentioned in this definition is, in this example, a specific client. The client is represented by the OFK column client-nbr.
If the referenced table were itself an object table, and not a version table, then client-nbr would be a "normal" foreign key, not an OFK. It would contain a value that was unique in the designated primary key column of the referenced table, and its validity could be checked by the standard RI mechanism of the DBMS (see Glossary). But the definition specifically states that it is a definition that involves two version tables. Consequently, client-nbr in the referenced table, by itself, is not a complete primary key. So it is not something the DBMS can enforce.
So what does "object RI" mean when there is no unique object to refer to, when, whatever it is, it is not enforceable by a RDBMS? By the same token, what kind of foreign key is an OFK?
An OFK column contains the value of a unique object even though a table with that value as primary key does not exist. It is as if, instead of Figure 1, versioning were implemented as shown in Figure 2.
In the as-if situation depicted in Figure 2, these are the foreign key dependencies:
Policy Version table to Policy table.
Client Version table to Client table.
Policy table to Client table.
These RI dependencies are all object RI dependencies because in all cases, the table referenced by a foreign key is an object (a non-version) table. In addition, all these object RI dependencies can be enforced by the RDBMS. So to keep this distinction clear, we will use the term "standard RI" (or "RI" by itself) to mean RI among non-versioned tables. Standard RI is RI that is enforceable by the DBMS.
This leaves us free to redefine the term "object RI." We will now use it to mean a constraint that applies between a pair of tables, where the referenced (parent) table is a version table. In that referenced version table, one part of its primary key will be an identifier for the object whose version it is. In the referencing table, there is a column which is constrained to contain a value matching the object identifier part of a version in the referenced table which is current at the time the referencing (child) row was inserted. That column is what we are calling an OFK. In addition, if the referencing row is not in a version table, then an update in place can be applied to it, of course. If that happens, the object RI constraint requires that the new value put in the OFK column must match the object identifier in a version current at the time of the update.
The Glossary below contains the indicated corrections, and also includes new expressions not defined previously.
Child table, child row. Y is a child table to X if and only if there is a foreign key dependency from Y to X. A row in Y is a child to a row in X if and only if the row in Y has a foreign key whose value is identical to the primary key value of that related row in X.
- Parent/child relationships typically have a maximum cardinality of one/many, and a minimum cardinality of optional for the parent and required for the child. But it is a matter of which table contains the foreign key and which table is the reference of that foreign key that differentiates parent from child tables and rows. Cardinality constraints are not what make the difference.
- Also sometimes referred to as dependent tables or dependent rows, or various cognates (e.g., RI-dependent tables, RI-dependent rows).
Current version, most recent version. A version of an object whose effectivity begin date is chronologically the latest across all non-future versions of that object, and whose effectivity end date is either unknown (represented by 12/31/9999) or in the future.
- Until we begin discussing Version Pattern 7, we will assume that no two versions of the same object can have the same effectivity begin date. After all, between that date and the earliest effectivity end date between them, that would mean that we had two different statements of what was true about an object during that period of time. But there can only be one version of the truth.
- With Version Pattern 7, we will discuss cases in which two or more versions, of the same object, might have the same effectivity begin date.
- One source of such cases is when there is a business requirement to correct an error in a version table, but also to retain, as queryable history, the version discovered to be in error.
- Another source of such cases would be where one of a few number of things might be true of an object in a given time period, but we dont know which one.
Effectivity time period. The period of time during which a version is regarded as the truth about an object.
- In our version patterns, an effectivity time period is defined by an effectivity begin date and an effectivity end date, where "date" may be a calendar date or any other "tick of the clock" (as described in Part 2).
- Our convention is that the time period begins on the begin date, but ends one clock tick prior to the end date.
Logical delete. A delete which marks the row as deleted, but leaves it in the table.
- Also known as a business delete, or referred to as a semantic delete or a cognate.
- See physical delete.
Episode. Each version Y of an object which does not supercede a version X current at the time Y is created begins an episode of an object.
- See supercession.
- The first version of an object creates the initial episode of that object. That remains the current episode until a logical delete version for that object is inserted into the table.
- If, when a version Y is inserted into the table, the most recent version of that object already in the table is a logical delete version, then Y creates a noninitial episode of that object. That remains the current episode until a logical delete version for that object is inserted into the table.
- If, when a version Y is inserted into the table, the most recent version of that object already in the table is a current version, then Y supercedes that version but remains within the same episode as the version it supercedes.
- Current episode. An episode of an object whose most recent version is not in the future and is not a logical delete
- Past terminated episode, terminated episode. An episode of an object whose most recent version is not in the future and is a logical delete
- Future episode. An episode of an object whose most recent version is in the future and is not a logical delete
- Future terminated episode. An episode of an object whose most recent version is in the future and is a logical delete.
Object foreign key (OFK). A column in a versioned table which contains the object identifier used by one or more rows in a (not necessarily distinct) version table.
- For example, in Figure 2 of Part 20, client-nbr in the Policy table is an OFK. It is constrained to have the value of a client-nbr in one or more rows in the Client table, such that at least one of those rows was current as of the time the row containing the OFK was created.
Object referential integrity, object RI (revised). The constraint that when a row containing an OFK is inserted, the value in that OFK matches the value of the object identifier portion of a primary key in a (not necessarily distinct) version table.
- Object RI cannot be enforced by today's RDBMSs. This follows from the fact that OFKs reference an object only indirectly, by means of the one or more versions that implement it. In those versions, the referenced object identifier is only part of the primary key of its table, and is thus not necessarily (or even usually) unique.
- Object RI requires that the referenced (parent) table be a versioned table. But the referencing (child) table, the one that contains the OFK, may itself be either a versioned or an object (non-versioned) table.
Object table. A table whose rows represent persistent objects. Sometimes called a "non-versioned" table.
- Persistent objects are things that exist over time and can change over time, such as vendors, customers, employees, regulatory agencies, products, services, bills of material, invoices, purchase orders, claims, certifications, etc.
- From an online analytical processing, star-schema point of view, dimension tables are tables of persistent objects. For more in-depth discussion of different types of tables, see the articles "An Ontology of Tables" at MindfulData.com.
- From an online transaction processing database point of view, assuming that there are no version tables in the database, object tables are all the tables that are not transaction tables.
- Roughly speaking, object tables are the tables which are the concern of master data management.
Parent table, parent row. X is a parent table to Y if and only if there is a foreign key dependency from Y to X. A row in X is a parent to a row in Y if and only if that row in Y has a foreign key whose value is identical to the primary key value of that row in X.
- Parent/child relationships typically have a maximum cardinality of one/many and a minimum cardinality of optional for the parent and required for the child. But it is a matter of which table contains the foreign key and which table is the reference of that foreign key that differentiates parent from child tables and rows. Cardinality constraints are not what make the difference.
- Also sometimes referred to as dependent-on tables, or dependent-on rows various cognates.
Physical delete. A delete which physically removes the row from its table.
- See logical delete.
Queryable history. Data about an object that was valid at some time in the past, which is no longer currently valid, but which is as easily and rapidly accessible as current data.
- As easily and rapidly accessible as current data means what it says. Our way of providing this access is to use version tables. In such tables, production queries against current data (the most common kind of query) can be used to retrieve historical data simply by adding a date to a BETWEEN clause of the SQL statement.
- As we have mentioned before (and will again), providing queryable history, in this manner, can significantly lower the development and operations cost of accessing historical data, and significantly improve the availability of the historical data retrieved.
Standard referential integrity, standard RI. The referential integrity constraint among non-versioned tables that today's RDBMSs can enforce. Because we are calling these non-versioned tables "object tables," we erroneously equated object RI with standard RI in Part 19.
Supercede, supercession. In these articles, we use these terms to refer to the replacement of a current version with a new current version.
- Supercession is a logical function. Physically, supercession is done by inserting a new row in a version table.
- Deletion in a version table is done via supercession.
- Versionable updates in a version table are also done via supercession.
- However, creating the first version of an object does not involve supercession, as there is no current version of that object to supercede.
- A more subtle point: Creating the first version of a noninital episode of an object also does not involve supercession. Even if other versions of an object exist, every episode is concluded with a logical delete version. Thus, when a new episode is started, the most recent prior version of that object is a logical delete version. Thus, there is no current version of the object to supercede when that new episode begins.
Temporal referential integrity, temporal RI. Referential integrity for versions, which consists of object RI plus the constraint that the effectivity time period for child rows are wholly contained with the effectivity time periods of their parent rows.
Version, version table. A table whose rows represent versions of persistent objects.
- A version of a persistent object is a time-slice of that object. A row in a version table represents a version, and describes that object as it was, is, will be and/or might be during its effectivity time period.
- A row in a version table is what the custodians of the table believe is the truth about the object it represents during the indicated effectivity time period.
Wholly contained in (for effectivity time periods). Time period 2 is wholly contained in time period 1 if and only if the effectivity begin date of time period 2 is equal to or later than the effectivity begin date of time period 1, and the effectivity end date of time period 2 is equal to or earlier than the effectivity end date of time period 1.
- In order to implement this constraint in today's production databases, we rely on SQL data types and operators. Specifically, we rely on dates and/or timestamps, and the DBMS implementation of comparison operators for those data types.
- When we use "12/31/9999," we let the DBMS interpret that string as a valid date. However, semantically, it almost never would be a valid date, because no business activities that we know of are concerned with an effectivity time period that extends up to but not including or beyond that date nearly eight-thousand years in the future.
Our ground-clearing that was needed before we discuss options for enforcing temporal RI is hopefully now complete. If it is, Part 21 will begin our discussion of those options.
Beyond these issues, we must also discuss temporal RI as it applies to future versions, to hypothetical versions and also to versions that correct errors in other versions. This will take us several installments further into this series, at which point we will resume our presentation of physical implementations of Version Patterns 4 and 5.
After that, we will begin our discussion of Version Patterns 6 and 7.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access