Starting with this installment in our series, we will begin to describe the physical implementation of these versioning patterns. To this point, patterns have been presented from a logical perspective. Data modelers would recognize these discussions as being about logical data models, rather than physical data models. Almost equivalently, we could say that previous articles in this series have discussed the semantics of versioning rather than its implementation.
We now turn to the corresponding physical data models and to their implementation in a real database. Throughout the rest of this series, we will refer to articles that focus on implementation as "PDM (physical data model) segments" of this series, where each segment is a separate article. We will reference each pattern by version number, and we will use the illustrations and the scenarios of the version pattern within the context of each of these PDM segments.
These PDM segments will show the physical Data Definition Language (DDL) used to create the tables in the schema; the Structured Query Language (SQL) used to populate and update those tables; and the SQL used to query the tables. We will also discuss performance and data integrity in the physical database. In addition, we will describe and discuss known issues, database management system (DBMS) limitations and alternatives to the implementations that are depicted here.
We encourage you to create, populate and query these objects in your own database. By doing so, you will deepen your appreciation of the temporal functionality provided by the early version patterns, and of the more complete functionality provided by the later patterns. In addition, you may learn to develop some of your own variations on these patterns, variations that might better apply in your own DBMS and application environment.
We will be using Microsoft SQL Server as the DBMS. However, with some minor adjustments, the SQL and the designs suggested in these PDM segments could be used in other DBMSs, such as IBMs DB2, Oracle or MySQL. Various database objects will be created, including tables, views, indexes, primary keys, foreign keys, triggers and stored procedures. However, we will assume that the reader will create other needed objects, such as databases, containers, file groups and tablespaces, and will have the appropriate permissions to create and use these objects.
In SQL Server the schema owner is often defined as dbo.We will not prefix the objects with dbo, but that will be the implicit schema name. However, in many DBMSs the default schema name is the same as the current SQL ID or login ID of the person that is connected to the database. This should suffice for most of these examples.
Some DBMSs will implicitly create certain database objects based on the creation of other objects. For example, some DBMSs will automatically create indexes based on the creation of primary keys and foreign keys, but others will not. In many of our examples, we will include the DDL to create objects that might be implicitly created by other DBMSs. Thus, you many encounter some minor syntax or duplicate object errors when declaring these objects to your DBMS. In many cases, you will be able to ignore these errors. We suggest that you confirm that the error does not compromise the model, and that the fundamental purpose of the object, function or statement is fulfilled in some other way.
The SQL we present will often use literals for the variables rather than parameter markers (host variables). This is for illustrative purposes only. Normally, for systems that require high performance, applications should use either statically bound SQL or prepared statements that use parameter markers in dynamic SQL such as JDBC rather than dynamic literals. This strategy saves a significant amount of overhead, avoiding constant reoptimization of the SQLs access path. So, while we will use literals in our examples, this is not the recommended approach and is only used to clarify the examples.
Note also that some companies have standards to not use a select clause. We will use it in our examples simply to save textual space.
The insurance industry has extensive date/time requirements for their application databases. These include:
- The date range an insurance policy is effective,
- The date an event occurred,
- The date a service was performed,
- The date a claim was adjudicated, and
- The date a claim was paid.
Often the dates when a policy or a policy-related event are recorded in a database do not match the effectivity range of the policy or the date on which the related event actually occurred. This distinction is one of the reasons we chose to use an insurance application in our examples. However, many other industries have complex date-driven rules and business processes. In fact, most large companies need to adhere to best practices or to various regulations, such as Sarbanes-Oxley, that require appropriate accounting and tracking of data changes. Therefore, we believe that while your application may not appear to currently need advanced version patterns, it is likely that, in time, they will.
So, as you read these physical model segments and try out the examples, you should be able to easily translate these PDM entities, attributes, objects and examples into your own industrys terms.
In non-versioned data models, and also in some of the earlier patterns in this series, the primary key of the table is also considered the unique identifier of the object. For example, either a business key or a surrogate key of the Policy object might be called policy_nbr, and it would be the primary key in the physical database. However, as we progress through the more advanced versioning patterns, we will discover that the object identifier alone will not uniquely identify a single occurrence (row) in a table because multiple versions of an object may (and usually will) exist in a versioned table. This restriction in current relational databases that the distinction between an object and its versions must be made by the developer and is not built-in to either standard or vendor-augmented DDL forces us to create a new surrogate key that represents the composite of an object id and either a version date/time or a version id, which now becomes the unique primary key.
Therefore, as you review these patterns, consider the fact that the object id will not always be considered the primary key, regardless of whether it is a business key or a surrogate. So to help you distinguish the different columns that would normally represent a single object and the primary key in non-versioned tables, these columns (keys) will be suffixed with OID, meaning the object id, and surrogate columns that distinguish versions of an object will be suffixed with VID, meaning version id.
These PDM segment articles will be structured as follows:
- Reference to the version pattern as defined in the earlier articles.
- A brief description of the pattern.
- A table facsimile with sample rows.
- The DDL to create the base table(s).
- Supporting DDL, such as constraint, index or view creation.
- Special design notes and considerations.
- A copy of the scenario figure used in the original version pattern.
- The Data Manipulation Language (DML) to update the database for each event.
- SQL to query the results at various points in time.
- The results of the queries in tabular format.
Figures 1 and 2: Templates that Will be Used (All figures are oversized and linked at the end of the article) (See pdf).
The initial patterns will concentrate on the Policy table. We will retain pattern versions in separate tables so we can compare results. Therefore, each Policy table will be prefixed with a P# to identify each pattern number. We will also, as needed, reference a Client table. The following DDL and SQL can be executed to create the base Client table.
Figure 3: Client Table Setup (see pdf)
Version Pattern 1
Now lets see how Version Pattern #1 will look in a physical database.
Figure 4: Version Pattern 1 (see pdf)
Version 1 Scenario 1.1
Figure 5: Scenario 1.1 (see pdf)
Version 1 Scenario 1.2
Figure 6: Scenario 1.2 (see pdf)
Version 1 Scenario 1.3
Figure 7: Scenario 1.3 (see pdf)
These scenarios show the physical data model for the first version pattern, which has no history or future dating requirements whatsoever, as well as insert, update, delete and retrieve SQL against that physical model. This is the easiest version pattern to update and query, but by the same token is the least expressive of temporal requirements. The date a row was created, and the date it was last updated, provide the only persisted temporal information about the data.
Relational databases and SQL standards have not yet evolved to easily support various temporal states. So, this series will continue to show you how to implement feature-rich version patterns into your database that could also become part of your enterprise architecture. These physical design segments will take this concept from theory to actual implementation. Its about time!
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access