DEC 1, 2002 1:00am ET

Logical Tables, Physical Files and Flaws in Relational DBMSs: Fabian Pascal vs. the Original Source Material

Editor's note: Tom Johnston writes a monthly column entitled "Modeling Matters" that appears the third week of each month on dataWarehouse.com.

In a recent series of articles by Fabian Pascal, I was surprised to find statements that repeat, in language nearly identical to my own, a critique of relational DBMSs that I published in two series of articles, in 1991 and in 1993, and again this year in a third series. I am glad that Pascal apparently agrees with me, but I would like to correct his failure to attribute this original and important critique to me. Footnote citations are required not only for direct quotations, but also for ideas that are original and important, that were first published by someone else, and that are not yet common knowledge. Without that attribution, an author tacitly claims those ideas as his own.

Well, do these conditions hold? And if they do, how accessible were my original articles? Were they published in obscure journals which a reasonable effort at scholarly research might legitimately have overlooked? Let's see.

Summary of My Critique of Commercial Relational DBMSs

The basic point of my critique is that relational DBMSs force us to denormalize databases because they require a one-to-one link between logical objects and physical objects, in particular between logical entities and physical tables.

If RDBMS vendors, working with the SQL standards committees, removed this flaw, two major benefits would result. First, there would no longer be any reason to denormalize a database; with the motivation gone, no relational database would ever be denormalized. Correcting this flaw would thus keep the logical database stable across all changes to the way data is physically stored. The normalized image of the database presented to programmers and query writers would remain completely unchanged even while extensive physical changes were being made to the database. Across as many such changes as you please, code and queries would remain unaffected.

For example, with today's relational DBMSs, it is not possible to present a single Customer entity to programmers and end-user SQL writers while, under the covers, implementing that entity as multiple physical tables. Yet one might want to partition customer data horizontally into two or more physical tables to improve locality of reference. Or one might want to partition it vertically, clustering frequently referenced attributes in one table and infrequently referenced attributes in another, to improve performance against the frequently referenced data. Alone, or in combination with such partitioning techniques, one might also want to replicate one or more pieces of customer information across multiple physical tables.

It should be possible to do these things without changing what programmers and query writers see as the database and, therefore, without requiring queries or code to be changed. A relational DBMS that made this possible would, in the terms I used for the critique, fully distinguish the ANSI/SPARC Conceptual and Internal data layers, permitting non-one-to-one mappings between objects in the two layers. To fully distinguish these layers, SQL DDL would also have to be changed and augmented, so that each layer could be defined separately and the mappings between the two layers could be defined as well. Recommended SQL DDL changes, as well as extensive illustrations of what databases managed by re-architected relational DBMSs would look like, are provided in all three of my series of articles.
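The mapping described above is meant to live inside the DBMS and to be declared in DDL. As a rough illustration only, the Python sketch below emulates a non-one-to-one Conceptual-to-Internal mapping in application code with the standard `sqlite3` module. Everything physical in it is an assumption made for illustration: the table names, the horizontal split on a `region` value, and the vertical split into frequently referenced ("hot") and infrequently referenced ("cold") columns. It is not the DDL proposed in the article series.

```python
import sqlite3
from typing import Optional

# Hypothetical physical layout (illustrative assumption, not from the article):
# hot columns are horizontally partitioned by region; the cold column lives in
# a separate, vertically partitioned table keyed by the same cust_id.
HOT_TABLES = {"EAST": "cust_hot_east", "WEST": "cust_hot_west"}
COLD_TABLE = "cust_cold"
LOGICAL_COLUMNS = ("cust_id", "name", "region", "notes")


class CustomerMapper:
    """Presents one logical Customer entity over several physical tables."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        for table in HOT_TABLES.values():
            conn.execute(
                f"CREATE TABLE {table} "
                "(cust_id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
            )
        conn.execute(
            f"CREATE TABLE {COLD_TABLE} (cust_id INTEGER PRIMARY KEY, notes TEXT)"
        )

    def insert(self, cust_id: int, name: str, region: str, notes: str) -> None:
        # Route the hot columns to the region's partition and the cold
        # column to the shared table, in one transaction.
        with self.conn:
            self.conn.execute(
                f"INSERT INTO {HOT_TABLES[region]} VALUES (?, ?, ?)",
                (cust_id, name, region),
            )
            self.conn.execute(
                f"INSERT INTO {COLD_TABLE} VALUES (?, ?)", (cust_id, notes)
            )

    def get(self, cust_id: int) -> Optional[dict]:
        # Callers never name a physical table: probe each partition and
        # rejoin the vertically split columns into one logical row.
        for hot in HOT_TABLES.values():
            row = self.conn.execute(
                "SELECT h.cust_id, h.name, h.region, c.notes "
                f"FROM {hot} AS h JOIN {COLD_TABLE} AS c "
                "ON c.cust_id = h.cust_id WHERE h.cust_id = ?",
                (cust_id,),
            ).fetchone()
            if row is not None:
                return dict(zip(LOGICAL_COLUMNS, row))
        return None


conn = sqlite3.connect(":memory:")
customers = CustomerMapper(conn)
customers.insert(1, "Acme", "EAST", "prefers email")
customers.insert(2, "Globex", "WEST", "net-30 terms")
print(customers.get(2))
# {'cust_id': 2, 'name': 'Globex', 'region': 'WEST', 'notes': 'net-30 terms'}
```

Because callers use only `insert` and `get`, the physical layout could change, for example by adding a third regional partition, without any change to calling code. Here that work is done by hand in application code; the critique argues that the DBMS itself should do it.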

The second major benefit of removing this architectural flaw from relational DBMS products is that doing so would make the image of the database presented to those who access it more intelligible, thereby reducing errors in code and queries. This would happen because the only data structures made visible by the DBMS would be those which, through normalization, express the semantics of the data being stored in the database, uncluttered by physical implementation details. Replication of customer name onto several physical tables, for example, would be completely hidden. No matter how extensive the replication, programmers and query writers would write SQL against a database in which customer name occurred only once, as an attribute of the Customer entity.
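Hidden replication can be sketched the same way, again in Python with `sqlite3` and again only as an illustration under stated assumptions. The tables `cust_base` and `order_hdr` and the choice to copy the customer name into order headers are hypothetical. Callers deal with the name only as an attribute of Customer; the mapper keeps every physical copy consistent by updating all of them in one transaction.

```python
import sqlite3
from typing import Optional

# Hypothetical physical layout (illustrative assumption, not from the article):
# the customer name lives in cust_base and is also replicated into order_hdr
# so that order listings can show it without a join.
SCHEMA = """
CREATE TABLE cust_base (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE order_hdr (
    order_id  INTEGER PRIMARY KEY,
    cust_id   INTEGER NOT NULL,
    cust_name TEXT NOT NULL  -- replicated copy of cust_base.name
);
"""


class ReplicatingMapper:
    """Logical view: customer name is a single attribute of Customer."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.executescript(SCHEMA)

    def add_customer(self, cust_id: int, name: str) -> None:
        with self.conn:
            self.conn.execute(
                "INSERT INTO cust_base VALUES (?, ?)", (cust_id, name)
            )

    def add_order(self, order_id: int, cust_id: int) -> None:
        # Callers supply only the customer key; the mapper fills in the
        # replicated name. (No row is inserted if cust_id is unknown.)
        with self.conn:
            self.conn.execute(
                "INSERT INTO order_hdr (order_id, cust_id, cust_name) "
                "SELECT ?, cust_id, name FROM cust_base WHERE cust_id = ?",
                (order_id, cust_id),
            )

    def rename_customer(self, cust_id: int, new_name: str) -> None:
        # One logical update fans out to every physical copy inside a
        # single transaction, so the copies cannot drift apart.
        with self.conn:
            self.conn.execute(
                "UPDATE cust_base SET name = ? WHERE cust_id = ?",
                (new_name, cust_id),
            )
            self.conn.execute(
                "UPDATE order_hdr SET cust_name = ? WHERE cust_id = ?",
                (new_name, cust_id),
            )

    def order_customer_name(self, order_id: int) -> Optional[str]:
        # The read path uses the replicated column, avoiding a join.
        row = self.conn.execute(
            "SELECT cust_name FROM order_hdr WHERE order_id = ?", (order_id,)
        ).fetchone()
        return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
db = ReplicatingMapper(conn)
db.add_customer(1, "Acme")
db.add_order(100, 1)
db.rename_customer(1, "Acme Corp")
print(db.order_customer_name(100))  # Acme Corp
```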

The Parallels Between Pascal's Statements and My Own

Several quotations from Pascal and me will establish that the parallels are indeed substantial. While the quotations all express variations on a single theme, note also the striking similarity in the very phraseology used, not just in the points made. The key phrases are variations on "one-for-one correlation between logical entities and physical tables" and have been underlined in the lists of quotations below. These phrases concisely express the central architectural flaw in current relational DBMS products. This flaw is as present in these products today as it was a decade ago, when I first pointed it out.

Now for the statements themselves. First, Pascal's statements.


Figure 1: Pascal on the Logical/Physical Confusion: 2002

Now for my own material, first from my 1991 series. Note that, in my publications, I have emphasized that the "one-for-one" flaw exists with respect not just to logical tables and rows vs. physical files and records, but also with respect to logical attributes vs. physical fields. The second quotation in Figure 2 expands on that point.

The third quotation in Figure 2, and the second in Figure 3, emphasize that if this architectural flaw were fixed, there would be no reason to ever again denormalize a database, a point made by Pascal in the last of the quotations listed in Figure 1. I made that point, and provided extensive illustrations of it, in my 1991, 1993 and 2002 series. That same third quotation puts the blame for denormalization squarely on the shoulders of the RDBMS vendors, a point Pascal makes, twelve years after I made it, in the second of his quotations in Figure 1.
