Note: A glossary of technical terms used in these articles can be found on the Web sites of the authors. The URLs are MindfulData.com and InbaseInc.com.
This glossary is being constructed as a controlled vocabulary. By that we mean that any expression involving temporal concepts which is used in the definition of a glossary entry must itself be a glossary entry.
One reason for being this careful with definitions is that this kind of rigor must be applied to definitions before they can be "ontologized," i.e., expressed in a formal notation like first-order logic. That, in turn, is what is needed to make definitions available for manipulation by software-realized inferencing engines.
Another reason for being this careful with definitions is that this is the kind of rigor that must be applied to any set of technical definitions before we can claim not only that we say what we mean, but also that we know what we mean when we say it.
As this series continues, context becomes increasingly important so that the thread of the discussion is not lost. Figure 1 reviews what we've done so far.
Figure 1: Chart of Installments to Date in the Series (See PDF link at the end of the article.)
Last time, we began looking at original and temporal insert transactions but immediately digressed into a discussion of the problems that can be encountered when matching transactions to target tables. We planned to discuss both temporal entity integrity and temporal referential integrity as they apply to insert transactions in this installment. But we discovered that those discussions require another digression, this time into how we use pairs of dates to represent periods of time.
We also discovered that we have vacillated in this series between two methods of using pairs of dates to represent periods of time, methods which are called the "closed-closed" and "closed-open" approaches. Two other methods are possible: "open-closed" and "open-open." But we won't enter into an extensive discussion of all four methods, because that has already been covered by both Snodgrass and by Darwen and Lorentzos.1,2
There is both a practical and a theoretical side to the question of how to represent time periods with dates. On the theoretical side, dates (or datetimes, or timestamps) are discrete points along a continuous timeline. And as Zeno first discovered, and Leibniz and Newton later formalized, the relationship between the continuous and the discrete is problematic, equally so for time as for space.
Points in Time and Periods of Time: Practical Issues
Consider the following two versions, V1 and V2, of the same object. Let us suppose that there is no gap in time between them, i.e., no gap along the "effectivity timeline." How are we to represent this? Two of the possibilities, the two specific ones we have vacillated between, are shown in Figures 2 and 3.
We also assume, in these examples, that the clock tick granularity is one day.
In both cases, the first effective time period starts on 2/19/06 and ends on 5/22/07, and the second starts on 5/23/07 and ends on 10/14/09. So it might seem that the closed-closed representation is the correct one, and that the closed-open representation is simply wrong. But that is not the case.
We could try to illustrate the wrongness of the closed-open representation by querying for the time period that contains the date 5/23/07. On the closed-open representation, can we tell which version's time period is the desired one? Yes, we can. We just need the following WHERE clause in our SQL query:
WHERE EFFECTIVE-BEGIN-DATE LESS-THAN OR EQUAL TO "5/23/07"
AND "5/23/07" LESS THAN EFFECTIVE-END-DATE
With this clause, the query will correctly pick out V2.
So why might we have thought that the closed-open representation is wrong? Probably because we had the mental image of our desired date being between the begin and end dates of the versions. But "between" as we ordinarily understand it is not "BETWEEN" as SQL understands it.
We would not have gotten confused, of course, if we had used the closed-closed approach. In that case, we could have written:
WHERE "5/23/07" BETWEEN EFFECTIVE-BEGIN-DATE AND EFFECTIVE-END-DATE
But because both clauses return the correct result, provided each is used with its corresponding method of representing periods of time, both methods are equally correct.
So in our own vacillation between these two methods, it was this consideration which led us to (sometimes) describe the closed-closed method as the one we preferred.
What, then, is the advantage of using the closed-open method? Well, look again at Figures 2 and 3. In both cases, V1 and V2 are contiguous. We know this because we have set up the example on the assumption that there is no gap in time between them. With the closed-open representation, however, we can instantly see that there is no such gap. But with the closed-closed representation, we would also need the information that the clock tick granularity being used is a granularity of one day.
The real problem with the closed-closed representation, however, is not what is or is not immediately clear to us human beings. It is the relative complexity of the code which will often be called upon to determine, of two consecutive versions, whether or not they are contiguous, i.e., whether or not there is a clock tick between them. With a closed-closed representation, that code will also have to know what the clock tick granularity is.
So in our own vacillation between these two methods, it was this consideration which led us to (sometimes) describe the closed-open method as the one we preferred.
Once the SQL standards groups can agree on temporal extensions to the standard, part of that agreement will certainly be a way of representing time periods directly, without relying on the confusing circumlocutions of various ways to use pairs of dates to represent time periods. But our concern, in these articles, is with today's SQL, and so we must choose a date-pair method of representation. Therefore, from this point forward, unless we discover other considerations which would dramatically tip the scale the other way, we will use the closed-open representation of time periods.
Points in Time and Periods of Time: Theoretical Issues
During our "closed-closed" phase, some time ago, one of our readers wrote us to say that closed-closed was simply wrong for what we will call "theoretical" reasons. He wrote:
A time period most definitely does not end one clock tick prior to the end date. It ends at precisely the time specified as the end date .. If the period ends at 12:10:15 then when the clock strikes 12:10:15 the period is over THEN and the next period begins precisely THEN. At that instant
If you dont do it that way then you are saying that date/time fields that are used for the beginning times are interpreted as meaning the beginning of the stated time period and date/time fields used for ending times are interpreted a DIFFERENT way, namely as indicating the end of the stated time period. One cant do things that way, i.e., changing the definition of a data type based on the context.
This reader also has an argument that is practical in nature, namely that with the closed-open method, date arithmetic will always involve a "minus 1" clause that would not be required on the closed-closed approach. But as we have seen, there are practical concerns no matter how we choose to use date-pairs to represent time periods, and these concerns are therefore not conclusive.
Our first response to our reader's theoretical argument, briefly, is this. First, the closed-open approach does not "chang(e) the definition of a data type." If we changed the definition of a data type depending on which columns we applied it to, then the DBMS or programming language that accessed those columns would not be able to handle them correctly. So we are not changing the definition of the data type. The issue, instead, is what we take each date to mean. And as long as we write our SQL carefully (see the WHERE clauses, above), we will get the correct results no matter which date-pair interpretation we choose.
Our second response is that decades of computer science research have shown that we can and should represent the flow of time as a succession of atomic clock ticks, ticks which may be of any actual length of time, such as a day, a nanosecond, etc. But no one ever has, and no one ever will, invent a clock so precise that there is no time at all between its ticks. Only if we had such a clock could we use clock ticks to represent time periods in a manner that conforms to the intuitions of this reader, namely that "If the period ends at 12:10:15 then when the clock strikes 12:10:15 the period is over THEN and the next period begins precisely THEN. At that instant."
This interpretation creates an ambiguity in the use of date-pairs to delimit time periods that can never be resolved. For on this interpretation, two adjacent versions overlap at exactly that tick of the clock. On that tick of the clock, this interpretation requires us to say that both V1 and V2 are in effect.
Without going into any detail about the theoretical underpinnings of these arguments, we can make the following points.
First, at the Planck scale, both space and time are, as far as we can measure them, discrete, with a unit of Planck time (5.3906 x 10 to the minus 44 seconds) being the time it takes for a photon of light to travel the Planck distance (1.6160 x 10 to the minus 35 meters). But this is too theoretically remote to be relevant to a discussion of time as far as IT is concerned. However, it does cast considerable doubt, obviously, on the hope that we will ever have a clock whose ticks do not take up a finite period of time.
Second, with quantum physics considerations aside, both space and time are continuous, but neither points (in space) nor instants (points in time) are. The ensuing conceptual difficulties were first noted by Zeno, which he described in his paradox of the hare and the tortoise.
The solution to the correct representation of something continuous by means of something discrete was ultimately solved, for space at least, by Leibniz and Newton. And their concept of a limit applies equally well to time. But this is still too remote for its relevance to managing time in relational databases to be apparent. However, for a good basic discussion of these issues, we recommend Sowa 2000.3
Finally, though, the concepts necessary to correctly manage the relationships among time periods represented as pairs of dates were originally formulated by James F. Allen in the mid-1980s. This reference, and an excellent exposition of this material, is contained in Snodgrass 2000 and in Darwen and Lorentzos 2002. Both contain references to the earliest of three articles in which Allen explained his concepts.
With our digressions completed, we will proceed to a discussion of original and temporal inserts in our next article, and will explain how the temporal correlates of entity integrity and referential integrity apply to them.
- C. J. Date, Hugh Darwen, Nikos Lorentzos. Temporal Data and the Relational Model. Morgan-Kaufmann, 2002, 90-94.
- Snodgrass, R. T. Developing Time-Oriented Database Applications in SQL. Morgan-Kaufmann, 2000.
- Sowa, John F. Knowledge Representation. (Brooks-Cole, Pacific Grove CA, 2000, 103-124.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access