
Logical Tables, Physical Files and Flaws in Relational DBMSs: Fabian Pascal vs. the Original Source Material

InfoManagement Direct, December 2002

Tom Johnston

Editor's note: Tom Johnston writes a monthly column entitled "Modeling Matters" that appears the third week of each month on dataWarehouse.com.

In a recent series of articles by Fabian Pascal, I was surprised to find statements repeating, in language nearly identical to my own, a critique of relational DBMSs that I made in two series of articles, in 1991 and in 1993, and again this year in a third series. I am glad that Pascal apparently agrees with me, but I would like to correct his failure to attribute this original and important critique to me. Footnote citations are required not only for direct quotations, but also for ideas whose first publication was someone else's, when those ideas are original and important and are not yet common knowledge. Without such attribution, an author tacitly claims those ideas as his own.


Well, do these conditions hold? And if they do, how accessible were my original articles? Were they published in obscure journals which a reasonable effort at scholarly research might legitimately have overlooked? Let's see.

Summary of My Critique of Commercial Relational DBMSs

The basic point of my critique is that relational DBMSs force us to denormalize databases because they require a one-to-one link between logical objects and physical objects, in particular, between logical entities and physical tables.

If RDBMS vendors, in conjunction with SQL standards committees, were to remove this flaw, two major benefits would result. First, there would no longer be any reason to denormalize a database: with the motivation for denormalization removed, no relational database would ever need to be denormalized. Correcting this flaw would thus provide stability across all changes to the way data is physically stored. The normalized image of the database, presented to programmers and query writers, would remain completely unchanged even while extensive physical changes to the database were taking place. Across as many such changes as you please, code and queries would remain unaffected.

For example, with today's relational DBMSs, it is not possible to present a single Customer entity to the programmer and end-user SQL writer while, under the covers, physically implementing that entity as multiple physical tables. Yet one might want to horizontally partition customer data into two or more physical tables to facilitate locality of reference, or vertically partition customer data to cluster frequently referenced attributes into one table and infrequently referenced attributes into another, in order to improve performance against the frequently referenced data. Alone, or in combination with such partitioning techniques, one might also want to replicate one or more pieces of customer information across multiple physical tables.

It should be possible to do these things without changing what the programmer and query writer sees as the database and, therefore, without requiring queries or code to be changed. A relational DBMS which made this possible would, in the terms I used for the critique, fully distinguish the ANSI/SPARC Conceptual and Internal data layers, making possible non-one-to-one mappings between objects in the two layers. To fully distinguish these layers, SQL DDL would also have to be changed and augmented, permitting each layer to be separately defined and permitting mappings between the two layers to be defined as well. Recommended SQL DDL changes, as well as extensive illustrations of what databases managed by re-architected relational DBMSs would look like, are provided in all three of my series of articles.
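To make the idea of a non-one-to-one mapping between the Conceptual and Internal layers more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the SQL DDL proposed in the articles cited here and not how any DBMS product works: plain dictionaries stand in for physical tables, and the class names (`SingleTableCustomerStore`, `PartitionedCustomerStore`), the query function `big_spenders`, and the attribute names are all hypothetical. The same logical Customer entity is stored once in a single physical table and once split horizontally by region and vertically into frequently and infrequently referenced attributes; the query is written only against the logical entity and runs unchanged against both.

```python
# A toy model of the Conceptual/Internal split, using dicts as "physical tables".
# All names here are hypothetical and chosen only for illustration.


class SingleTableCustomerStore:
    """Internal layer with a one-to-one mapping: one physical table per entity."""

    def __init__(self):
        self.table = {}  # customer id -> full row

    def insert(self, row):
        self.table[row["id"]] = dict(row)

    def select(self, predicate):
        rows = (dict(r) for r in self.table.values() if predicate(r))
        return sorted(rows, key=lambda r: r["id"])


class PartitionedCustomerStore:
    """Internal layer with a non-one-to-one mapping for the same logical entity.

    Horizontal partitioning: one "hot" table per region.
    Vertical partitioning: infrequently used attributes go to a "cold" table.
    """

    HOT_ATTRS = ("name", "region")

    def __init__(self, regions):
        self.hot = {region: {} for region in regions}  # region -> {id: hot attrs}
        self.cold = {}                                  # id -> cold attrs

    def insert(self, row):
        # Split one logical row across the physical tables.
        hot = {k: row[k] for k in self.HOT_ATTRS}
        cold = {k: v for k, v in row.items() if k != "id" and k not in self.HOT_ATTRS}
        self.hot[row["region"]][row["id"]] = hot
        self.cold[row["id"]] = cold

    def _logical_rows(self):
        # Reassemble logical rows; callers never see the physical split.
        for table in self.hot.values():
            for cid, hot in table.items():
                yield {"id": cid, **hot, **self.cold.get(cid, {})}

    def select(self, predicate):
        rows = (r for r in self._logical_rows() if predicate(r))
        return sorted(rows, key=lambda r: r["id"])


def big_spenders(store):
    """A 'query' written only against the logical Customer entity."""
    return [r["name"] for r in store.select(lambda r: r["credit_limit"] > 1000)]


customers = [
    {"id": 1, "name": "Acme", "region": "east", "credit_limit": 5000},
    {"id": 2, "name": "Bolt", "region": "west", "credit_limit": 800},
    {"id": 3, "name": "Crane", "region": "west", "credit_limit": 1200},
]

single = SingleTableCustomerStore()
partitioned = PartitionedCustomerStore(regions=["east", "west"])
for c in customers:
    single.insert(c)
    partitioned.insert(c)

print(big_spenders(single))       # ['Acme', 'Crane']
print(big_spenders(partitioned))  # ['Acme', 'Crane']
```

Moving `credit_limit` into the hot table, or adding a third region, would change only the store class; `big_spenders` would stay exactly as written. A real DBMS would also need the mapping to be declared in DDL and would have to handle updates, constraints and query optimization, none of which this sketch attempts.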

The second major benefit of removing this architectural flaw from relational DBMS products is that doing so would increase the intelligibility of the image of the database presented to those who access it, thereby reducing errors in code and queries. This would happen because the only data structures made visible by the DBMS would be those which, through normalization, express the semantics of the data being stored in the database, clouded by nothing in the way of physical implementation details. Replication of customer name onto several physical tables, for example, would be completely hidden. No matter how extensive the replication, programmers and query writers would write SQL against a database in which customer name occurred only once, as an attribute of the Customer entity.
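Hidden replication can also be illustrated with a minimal Python sketch, again under stated assumptions rather than as a DBMS design. A hypothetical `ReplicatingStore` copies customer name into the physical order table, a typical denormalization that lets an order report avoid a join. The logical operations it exposes (`customers`, `orders`, `update_customer_name`, all illustrative names) still show name only as an attribute of Customer, and a single logical update keeps every physical copy consistent.

```python
# A toy internal layer that replicates Customer.name into the physical order table.
# All names are hypothetical; dicts stand in for physical tables.


class ReplicatingStore:
    def __init__(self):
        self._customer = {}  # physical: customer id -> {"name": ...}
        self._order = {}     # physical: order id -> {"customer_id", "amount", "customer_name"}

    def insert_customer(self, cid, name):
        self._customer[cid] = {"name": name}

    def insert_order(self, oid, cid, amount):
        # The caller supplies only logical Order attributes; the replica is
        # filled in by this layer.
        self._order[oid] = {
            "customer_id": cid,
            "amount": amount,
            "customer_name": self._customer[cid]["name"],
        }

    def update_customer_name(self, cid, name):
        # One logical update; every physical copy is kept in step.
        self._customer[cid]["name"] = name
        for row in self._order.values():
            if row["customer_id"] == cid:
                row["customer_name"] = name

    def customers(self):
        # Logical Customer rows: name appears here, and only here.
        return [{"id": cid, "name": r["name"]} for cid, r in sorted(self._customer.items())]

    def orders(self):
        # Logical Order rows: the replicated column is not exposed.
        return [
            {"order_id": oid, "customer_id": r["customer_id"], "amount": r["amount"]}
            for oid, r in sorted(self._order.items())
        ]

    def order_report(self):
        # Logically a join of Order and Customer; physically served from the
        # replica, so no join is performed.
        return [(oid, r["customer_name"], r["amount"]) for oid, r in sorted(self._order.items())]


store = ReplicatingStore()
store.insert_customer(1, "Acme")
store.insert_order(10, 1, 250)
store.insert_order(11, 1, 75)
store.update_customer_name(1, "Acme Corp")

print(store.orders()[0])     # {'order_id': 10, 'customer_id': 1, 'amount': 250}
print(store.order_report())  # [(10, 'Acme Corp', 250), (11, 'Acme Corp', 75)]
```

Whether `order_report` reads the replica or performs a join is invisible to its callers, which is the kind of insulation described above; a production system would also have to keep replicas consistent under concurrency and failure, which this sketch ignores.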

The Parallels Between Pascal's Statements and My Own

Several quotations from Pascal and me will establish that the parallels are indeed substantial. While the quotations all express variations on a single theme, note also that there is a striking similarity in the very phraseology used, not just in the points made. The key phrases are variations on "one-for-one correlation between logical entities and physical tables" and have been underlined in the lists of quotations below. These phrases concisely express the central architectural flaw in current relational DBMS products. That flaw is as present in these products today as it was a decade ago when I first pointed it out.

Now for the statements themselves. First, Pascal's statements.


Figure 1: Pascal on the Logical/Physical Confusion: 2002

Now for my own material, first from my 1991 series. Note that, in my publications, I have emphasized that the "one-for-one" flaw exists with respect not just to logical tables and rows vs. physical files and records, but also with respect to logical attributes vs. physical fields. The second quotation in Figure 2 expands on that point.

The third quotation in Figure 2, and the second in Figure 3, emphasize that if this architectural flaw were fixed, there would never again be any reason to denormalize a database, a point made by Pascal in the last of the quotations listed in Figure 1. I made that point, and provided extensive illustrations of it, in my 1991, 1993 and 2002 series. That same third quotation puts the blame for denormalization squarely on the shoulders of the RDBMS vendors, a point Pascal makes, eleven years after I made it, in the second of his quotations in Figure 1.


Figure 2: Johnston on the Logical/Physical Confusion: 1991

Next, from my 1993 series. The first two quotations emphasize that the one-for-one correlation between the logical and the physical is not part of relational theory, but is entirely the result of a flawed implementation on the part of commercial RDBMS vendors. Pascal makes this same point, nine years after I did, in the second of his quotations listed in Figure 1.

The third quotation in Figure 3 briefly mentions various non-one-for-one mapping techniques that would provide performance equivalent to that gained by denormalization, but without denormalizing the logical database presented to programmers and query writers. In the articles themselves, those techniques are presented with extensive illustrations and explanation.

This, indeed, seems to be one of the themes expressed in Pascal's recent series in DM Review on "Dangerous Illusions", and especially in a follow-up article in DM Direct. [Pascal, 06-2002, 07-2002, 10-2002.] My own position on such "dangerous illusions," as stated in the three series from which these quotations are taken, and as explained in greater detail in [Johnston, 06-2002], is that denormalization would never be necessary if relational DBMS products properly separated logical and physical data objects, but that it is in fact sometimes necessary in order to achieve desired performance.


Figure 3: Johnston on the Logical/Physical Confusion: 1993

And finally, from my 2002 series. Although the phrase "one-for-one mapping" between logical and physical objects is concise and catchy, I think that the second quotation in Figure 4 is the best expression of the problem that I have yet developed. It not only describes what is structurally wrong, but also points out that the mistake affects SQL statements, exposing them to changes in the physical database from which they should have been fully insulated.


Figure 4: Johnston on the Logical/Physical Confusion: 2002

On the basis of my 1991 and 1993 series of articles, I claim credit for being the first to point out this architectural flaw which is shared by all major relational DBMSs, for describing the implications and cost of that flaw, and for providing a solution to it. I therefore request that all future references to this original idea, by Pascal or anyone else, properly cite those two series of articles.

Is this Critique Important?

If this critique were not important, this discussion would be just a tempest in a teapot. However, I believe the critique is clearly important. If relational DBMSs did fully separate the logical and physical images of a database, the cost savings in the initial development and subsequent revision of databases and their applications and queries would be almost impossible to overestimate. They would amount to a significant percentage of the total IT budgets of all companies that use relational DBMSs, which is effectively all companies with IT departments. The only changes to databases that would affect code and queries would be those that directly reflect changes in business requirements; no changes made for IT-internal reasons, such as performance enhancement, would affect code and queries in any way. I support this contention about cost savings in all three of my series.

Was it Original?

In 1991 and 1993, no one else had even mentioned the logical table/physical file architectural problem in any published material, let alone extensively developed the point as I had, together with a solution to it. This makes those ideas and their development original material. Indeed, except for Pascal's unelaborated statements, I believe the critique is still original, and that further work has not yet been published on it. While honest scholarship - Pascal's or mine - can always be incomplete, my own claim to originality is based on a good knowledge of what had been published in the IT trade press at that time and also on a recent search of academic literature in the ACM's Digital Library. That search turned up no material expressing this critique of relational DBMS products, either before or after my 1991 and 1993 series were published.

Is it Now Common Knowledge?

If this critique of today's relational DBMS products were now common knowledge, expressed or alluded to in numerous publications, then it would not be as necessary to cite the original source of the critique (although it would still be courteous to do so). However, as I indicated above, my research shows that the point is far from being common knowledge.

Indeed, even among academics, one frequently hears unqualified remarks to the effect that relational DBMSs are an advance over their predecessors because they provide a logical view of data rather than a physical one. For example, Bertino, Catania and Zarri say that "Users should be able to model data without considering the physical structure but rather based on their perception of the application." I agree completely. But then they go on to say that, "The relational model ... (provides) a logical representation of data that is independent from the physical structure." [Bertino, Catania, Zarri, 2001, p.15.] Here I disagree. Although there are some respects in which this is true, the failure to break the one-for-one coupling of logical entities to physical tables and logical attributes to physical columns is a major way in which relational products still tightly bind the logical to the physical. None of the publications I have reviewed makes this point. Thus, I conclude that my critique is far from being common knowledge.

Was My Critique Extensively Developed in the Original Material?

My first series of articles on this topic totaled 6,000 words and thirteen illustrations, the second 12,000 words and five illustrations, and the third 12,000 words and thirteen illustrations. There is, however, a considerable amount of overlap across the three series.

Nonetheless, the quotations listed in Figures 2, 3 and 4 support my contention that the critique was extensively developed in all three of my series of articles. In addition, the third series is still available online (in the articles archives under the iKnowledge tab at www.dataWarehouse.com) for anyone who wants to compare the relative extents to which Pascal and I have developed the critique. And while the first two series are not available online, I will be glad to mail a copy to anyone who contacts me requesting one (tjohnston@acm.org).

These series, of course, have intrinsic interest on their own. If anyone has found Pascal's statements on this topic suggestive and interesting, they will find in my articles a full and detailed development of the critique, together with extensive illustrations of specific case studies and also a solution to the problem.

How Accessible were My Original Articles?

The 1991 and 1993 series were each published in a major periodical. One was Codd and Date's Relational Journal; indeed, the inaugural article in that series appeared in the same issue as an article by C. J. Date. The other was Database Programming and Design, and once again Date was publishing in the same issues of that magazine in which my articles appeared. So I find it hard to believe that Date himself was unaware of those series.

Given the close association between Date and Pascal (evident in Date's prominence on Pascal's Web site www.dbdebunk.com), I find it surprising that Pascal could present the same conclusions that I reached in those series, and could do so several times and in much the same language that I used, without any knowledge of those five articles. But regardless of the Pascal/Date connection, those articles were published in major periodicals, and a reasonable effort at scholarly research should have found them.

Why is this Issue About Authorship Important?

Let me briefly point out why this issue of proper attribution to original sources is important. First of all, it is only fair to give credit where it is due. But more than that, for those of us who would like to publish the occasional insights we believe ourselves to have, but who do not have the authorial weight of "big names" like Pascal and Date, it is if anything even more important that we be treated fairly, precisely because of that disparity. I am a published author on a much smaller scale than Pascal or Date. But I, and other less well-published authors, should be able to feel confident that justice will be done, no matter how famous those may be who seek to appropriate our work as their own, or who, to their own advantage, happen to overlook our contributions.

A Word on Personal Comments in Professional Publications

Although, so far as I know, Pascal has not alluded to me in any of his publications, let me attempt to forestall his tendency to mix specific points with personal characterizations such as "just does not get it," "this is astounding," and "I do recall sending that article for review to Chris Date ... and his reaction was identical. He sent it back with the comment 'Life is too short..'" [Pascal, 10-2002.] I believe that many readers will, like me, find the frequent use of such personal remarks at least distracting, clearly inappropriate by any reasonable standard of professional authorship, and sometimes downright distasteful.

I have indeed been subjected to such remarks before, from no less a personage than Pascal's close associate, Chris Date himself. For example, in an exchange of several articles on the topic of relational DBMSs and multi-valued logic, Date said that various statements of mine were "specious," an adjective which disparaged me as an author while adding nothing of substance to his argument. Later, he called what he claimed was an error of mine "a howler," which is a Briticism for a really stupid mistake. [Date, 09-1995, reprinted in Date, 1998.]

David McGoveran added, in that same article, that "Every elementary schoolchild studying simple set theory learns" that two-valued logic can easily lead to errors of misinterpretation, which was one of my principal points contra Date's objections to multi-valued logic. Do we really need to deprecate my point with "every elementary schoolchild"? I felt that such personal, derogatory remarks were out of place in a professional debate on an important topic, and for my part I never reciprocated the name calling.

But the personal remarks in this exchange on the topic of multi-valued logic got even nastier. In 1998, Date published a collection of his writings which contained a chapter entitled "Up to a Point, Lord Copper!" [Date, 1998, Ch. 10]. This is a reference to a minor aristocrat in an obscure novel by Evelyn Waugh [1]. Lord Copper was given to making quite stupid remarks. In reply, his partner in conversation would, with restraint, say only "Up to a point, Lord Copper!" The chapter in Date's book was about my position on the matter of multi-valued logic vs. Date's preferred "default values" position, and so there is little doubt at whom the allusion was aimed. Moreover, since I have passed doctoral-level examinations in formal logic in the course of earning a doctorate in Philosophy, I suggest that whether or not my position was right, it is unlikely that it was stupid.

Such personal allusions, as impressively literary as they may be, are clearly inappropriate in any professional publication. They say more of an unflattering nature about their author than about the person alluded to.

Moreover, given a pattern of such behavior, it is remarkable to find Date saying, in response to an apparently factual statement by Joe Celko, that "This quote looks dangerously close to being an ad hominem attack, but perhaps I'm being oversensitive ..." [Date, 02-2001.] I find it difficult to understand how someone who pointedly compared another author to Waugh's Lord Copper could later adopt an air of restrained regret at the mere hint of an ad hominem remark directed at him. Perhaps, for Date, consistency is the hobgoblin of little minds. Perhaps he believes that great authors are exempt from the standards that apply to the rest of us. Or perhaps there is another explanation, and Date will tell us what it is.

In Conclusion

I have never written a piece like this one before - focused on issues of authorial integrity - and I hope I never need to write another such. But although I am a minor author, my critique of relational DBMS products was original, detailed and printed in major publications more than a decade ago. In addition, I believe the critique is clearly important. (Indeed, according to Pascal, technology has recently been developed which is based on it.) Nonetheless, that critique is now being expressed by Pascal, without attribution to me.

Nor is it out of place to suggest that, as important as they are, authors such as Date, Pascal and others associated with Date try to write in a more professional manner. As for Date himself, his penchant for personal invective goes back many years and has been aimed at a number of different authors. I think that, at least once, someone should take him to task for it.

Let those of us who assume the mantle of authors accept the responsibility to be informed on what we write about and to hold ourselves to high standards of authorial integrity, attributing to others the points which were first thoroughly developed by them, especially if those points are important and not yet commonplace. And let us stick to making our points and refrain from characterizing our protagonists as dimwitted bumblers.

In the meantime, let's be clear about who first developed this critique of relational DBMSs. It was me. It was not Fabian Pascal, Chris Date or anyone else. And once again, if anyone wants proof of my assertion or wants to understand the critique in far more detail than Pascal provides, my online series is available at dataWarehouse.com. In addition, I will be glad to mail copies of my two earlier series to whomever requests them.

Postscript

As is the right of authors whose publications are criticized, Chris Date and Fabian Pascal will reply to this article of mine, and that will be the end of the matter as far as publication in DM Direct is concerned. Let me simply say that after reading an advance copy of their reply, I conclude that all of my points still stand, and I expect that anyone who reads Pascal's articles and the other material he and Date allude to, and then reads my three series of articles, will agree with me that the differences are substantive and the critique important.

Nonetheless, because of what I consider to be the many misleading statements in their replies, I have written a rebuttal to them. Since I cannot publish that rebuttal here, I offer to send it, along with copies of my two original series, to anyone who writes me requesting that material. Once again, readers can contact me at tjohnston@acm.org.

References:

1. Waugh, Evelyn. Scoop. Little, Brown & Co. 1999. (Originally published in 1938.)

Bibliography

Where articles are cited in the text, they are cited by author and date. Those articles are listed with the author/date in brackets, at the front of the entry.

My 1991 Series

[Johnston, 10-1991.] "Architectural Flaws in Current RDBMSs: Cracks in the Foundations". Database Programming and Design, October, 1991 (vol. 4, #10), pp. 50-53.

[Johnston, 11-1991.] "Architectural Flaws in Current RDBMSs: Rebuilding the Foundations". Database Programming and Design, November, 1991 (vol. 4, #11), pp. 53-61.

My 1993 Series

[Johnston, 01-1993.] "Eliminating Denormalization. Part I." The Relational Journal, December/January 1993 (vol. 4, #6), pp. 1, 8-11.

[Johnston, 02-1993.] "Eliminating Denormalization. Part II." The Relational Journal, February/March 1993 (vol. 5, #1), pp. 3-8.

[Johnston, 04-1993.] "Eliminating Denormalization. Part III." The Relational Journal, April/May 1993 (vol. 5, #2), pp. 3-8.

My 2002 Series

"Modeling Matters: How RDBMSs Could Make Denormalization Unnecessary." DataWarehouse.com, 7/12/02.
http://www.datawarehouse.com/iknowledge/articles/article.cfm?ContentID=2803

"Modeling Matters: How RDBMSs Could Make Denormalization Unnecessary: Part 2". DataWarehouse.com, 8/9/02.
http://www.datawarehouse.com/iknowledge/articles/article.cfm?ContentID=3025

[Johnston, 09-2002.] "Modeling Matters: How RDBMSs Could Make Denormalization Unnecessary, Part 3". DataWarehouse.com, 09/06/02.
http://www.datawarehouse.com/iknowledge/articles/article.cfm?ContentID=3109

Publications by Pascal and Date

[Date, 09-1995.] C. J. Date, Hugh Darwen, David McGoveran. "Nothing to Do With the Case". Database Programming and Design, September, 1995 (vol. 8, #9), pp. 45-52.

[Date, 1998.] C. J. Date, Relational Database Writings, 1994-1997, (Addison Wesley Longman, 1998).

[Date, 02-2001.] "THERE'S ONLY ONE RELATIONAL MODEL!" dbdebunk.com, 9/8/2002.

[Pascal, 05-2002.] "The Logical-Physical Confusion". Journal of Conceptual Modeling, www.inconcept.com/jcm, May, 2002 (Issue 25).

[Pascal, 06-2002.] "The Dangerous Illusion: Denormalization, Performance and Integrity, Part 1". DM Review, June 2002.

[Pascal, 07-2002.] "The Dangerous Illusion: Denormalization, Performance and Integrity, Part 2". DM Review, July 2002.

[Pascal, 09-2002.] "On Normalization, Performance and Integrity". dbdebunk.com, 9/8/2002.

[Pascal, 10-2002.] "No Value in Multi-Value". DM Direct, at dmreview.com, October 2002.

Other Cited Publications

[Bertino, Catania, Zarri, 2001.] Elisa Bertino, Barbara Catania, Gian Piero Zarri. Intelligent Database Systems, (Addison-Wesley, 2001).

[Johnston, 06-2002.] "Modeling Matters: Logical Data Models, Denormalization and Conceptual Clarification." DataWarehouse.com, 6/7/02.
http://www.datawarehouse.com/iknowledge/articles/article.cfm?ContentID=2686

Editor's note: Please see Fabian Pascal's rebuttal piece "Much Ado about Very Little" at http://www.dmreview.com/article_sub.cfm?articleId=6125. Chris Date and Hugh Darwen also wrote an article in response to this column: http://www.dmreview.com/article_sub.cfm?articleId=6126.

Tom Johnston is an independent consultant specializing in enterprise data architecture, and in relational, object-oriented and data warehouse modeling in various industries, including telecommunications, health care, banking, retailing and transportation. He can be reached at tjohnston@acm.org, and his Web site is www.MindfulData.org.
