In this article, we will describe the physical implementation of Version Pattern 3. This pattern expands on Patterns 1 and 2 by retaining a history of updates. This pattern was logically described in part 4 and part 5 of this series.
Pattern 2 provided the ability to query objects that were previous deleted. However, it failed to save history created by updates. In Pattern 3, we will start saving this update history by using versioning, and will lay the groundwork for the more robust history patterns that follow.
One way to keep track of updates is to use history tables that are nearly clones of the object table, and are populated by using update triggers. There are pros and cons with using this approach, which we will discuss later. Pattern 3 takes an approach where all of the versions (current and past) are stored in the same table. A Policy was the object in the Policy table in Patterns 1 and 2. Now, the table will contain multiple versions of the same object. Therefore, to distinguish these, the table that contains the versions of a policy will be called the Policy Version table.
While you might assume that having a policy version table would also require a policy (object) table, this is not so. We will only populate and query the Policy Version table for our current and previous versions of an object. Some organizations might demand both tables for various reasons, but it is not required to support the semantics of this pattern. There are benefits to having only one (version) table to update, manage and query, so we will describe this pattern with this design decision.
Version Primary Key
Having versions of the same object in the same table requires a change to the primary key of the table. Policy number (policy_nbr) was the primary key in Patterns 1 and 2. However, policy number alone is no longer a valid primary key now that we are storing versions in the same table. We need something else as part of the primary key to distinguish multiple versions of the same policy. An alternative might be to replace policy number with a surrogate key. But since surrogate keys were not introduced as part of these earlier patterns, we will use a composite key made up of the policy_nbr and something else to define the version.
Figure 1: Correlated Subselect for Max (see PDF)
That something else could be a sequence number (version number), but for our examples we will use the version beginning (system activity) date (ver_beg_dt). Remember, we use dates in our examples to simplify them, but we really would use a date/timestamp to allow for multiple events per day. The primary key will be policy_nbr and ver_beg_dt.
The current version is identified by being the row for the policy that has the highest (most recent) version begin date. So, to find the most current version you would need to find the highest (MAX) date for the policy, then use that date to qualify the ver_beg_dt to get the single current row. This can be accomplished by using a correlated subselect, as shown in Figure 1. Either way, this approach has negative performance implications in applications that read these rows. We have seen this performance problem in implementations of this pattern, and it is often not discovered until there is a larger quantity of history and a load is placed on the application. This issue will be addressed and overcome in future patterns in this series, so stay tuned.
Returning a version of an object as of a certain point in time, while possible, is a bit more complex. You need to find the specific row that began (beg_dt) prior to the input date, but it must less than the next version that is higher. It is not a simple between predicate. This kind of query is shown in Figure 2.
Figure 2: As Of (see PDF)
The SQL in the following scenarios after the product data management (PDM) design will help demonstrate how to retrieve the various versions using an input parameter representing the as-of date.
Remember, this pattern only works with a single dimension of time. It is the time that the event is recorded in the computer system. We assume for this pattern that this date is both the system activity and business effectivity date. The later patterns in this series will show us how to work with both dimensions of time. But, for now, here is Pattern 3 Update History.
Version Pattern 3
Figure 3: Policy Version 3 PDM Pattern (see PDF)
Version 3 Scenario 3.1
Figure 4: Scenario 3.1 (see PDF)
Version 3 Scenario 3.2
Figure 5: Scenario 3.2 (see PDF)
Version 3 Scenario 3.3
Figure 6: Version 3 Scenario 3.3 (see PDF)
Version 3 Follow-up
As explained in prior PDM articles, we will expand upon the original scenarios with a follow-up section that demonstrates additional capabilities of the pattern. This follow-up for pattern 3 shows one of the benefits of storing the current and past versions in the same table by being able to use the same SQL and application logic to retrieve and process both the current and previous versions, just by supplying a different input date.
Figure 7: Version 3 Follow-up 3.4 (see PDF)
Figure 8: Version 3 Follow-up 3.5 (see PDF)
One or Two Tables?
Pattern 3 combines both the current version and the past versions into a single table. There are pros and cons to this, both physically and logically. One of the semantic constraints of Pattern 3 is that date represents both the business effectivity date and the system activity date. So, the date in the computing system on which it appears is the date that the event is effective for the business. This also implies that we cannot accurately allow retroactive or future date processing; that is, an event cannot be dated prior to the current version, nor can it be dated in the future.
Consequently, it would not significantly impact the application if the various versions were stored in one or two tables, as far as Pattern 3 is concerned. However, in future patterns we will show why we combine all versions in a single table to better support bi-temporal patterns and retroactive and future date processing.
Following is a list showing some of the pros and cons combining or separating versions into tables.
Figure 9: Versioned Tables versus Current Plus Historical Tables (see PDF)
If you are trying to retrofit history into an existing database and application that has no history, then you might consider creating a separate history table, populated by triggers on the original base object table. Otherwise, for a new application we lean towards combining the versions into the same table, possibly with partitioning, with the non-current versions.
This article has shown how to implement Version Pattern 3. This pattern allows you to easily query any version, current or past, based on the supplied as-of date input parameter. This reduces the number of database objects, application objects and SQL required for the application. More importantly, we start to better and more accurately retain history that is required by governing bodies and regulations, such as Sarbanes-Oxley (SOX), National Association of Insurance Commissioners (NAIC) and Securities and Exchange Commission (SEC). It also helps support auditing requirements and best practices desired by most major companies today. However, it does not yet support accurately post-dating or pre-dating database events.
In upcoming articles, we will continue to demonstrate how to physically model these more advanced temporal patterns, and will also present the DDL to create these tables and their related database objects, plus the SQL needed to maintain and query the data.
Stay with us as we travel through time!
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access