Powell’s Technical Books in Portland, Oregon, is my favorite bookstore to “get lost” in. I can spend hours browsing all of the data modeling and database books.

About five years ago, I was browsing the shelves hoping to find one of my titles and instead found a first edition of “Data and Reality,” by William Kent. I had first learned of this book from the bibliography in Graeme Simsion’s “Data Modeling Theory and Practice,” and it had been on my must-read list for some time, so I quickly made my way to the cashier, paid $40 for this 1978 classic, and started reading later that evening.

Remember 1978? The New York Yankees won the World Series, the Montreal Canadiens took the Stanley Cup, and “You Light Up My Life” won the Oscar for best song. It was also a monumental year for technology. Illinois Bell Company introduced public tests for a cellular mobile phone system. (Do you remember those large “portable” shoulder bags that mobile phones first came in?) That same year, the first computer bulletin board system was created. (I remember buying a used refrigerator off a BBS.) Also that year, Space Invaders made its debut and the video game craze began.

In addition to marking the dawn of the age of cell phones, online commerce, and video games, 1978 was also a banner year for data management. The relational model had a big win over the hierarchical and network models: Oracle Version 1 was announced in 1978, written in assembly language and running on a whopping 128K of memory.

Important seeds were planted in 1978, and these seeds have grown into massive trees in the form of the awesome technology we have today. From 20-pound cell phones to credit card-sized mobile phones, from bulletin board systems to 100,000 virtual stores (including Amazon and eBay), from one relational database option to many options (including columnar, XML, and NoSQL databases), there is no doubt we have made amazing leaps in technology since 1978.

Therefore, you would expect anything technology-related from 1978 to be prehistoric, useless, and perhaps even laughable today. Not so with “Data and Reality.”

What strongly attracted me to the book was the large amount of material that is still directly relevant to us in data management today. It is a special book: not a how-to-do book on data management (such as how to normalize attributes or create a database), but a how-to-think book on data management. “Data and Reality” gracefully weaves the disciplines of psychology and philosophy together with data management. The issues surrounding how we perceived and managed information in 1978 are no different from the ones we face today. This book is technology-independent, and therefore timeless in its messages, whether we are a 1970s data processing expert or a modern-day data analyst, data modeler, database administrator, or data architect.

For example, the section called “The Murderer and the Butler” explores how to reconcile the varying roles a person can play, which is a data modeling and architecture challenge many organizations face today. “At the beginning of a mystery, we need to think of the murderer and the butler as two distinct entities, collecting information about each of them separately. After we discover that ‘the butler did it,’ have we established that they are ‘the same entity’? Shall we require the modeling system to collapse their two representatives into one?”
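One way to picture Kent’s question is the small Python sketch below. It is only an illustration under my own assumptions, not anything taken from the book: the class names (`Representative`, `CaseFile`), the method names, and the sample facts are all hypothetical. The sketch keeps the murderer and the butler as two separate representatives and, once the mystery is solved, records that they are the same person instead of collapsing the two records into one.

```python
from dataclasses import dataclass, field


@dataclass
class Representative:
    """One record the model keeps about someone, filed under a role (hypothetical)."""
    role: str
    facts: dict = field(default_factory=dict)


class CaseFile:
    """Holds representatives plus links saying which ones are the same person."""

    def __init__(self):
        self.representatives = {}
        self._same_as = {}  # role -> another role it has been identified with

    def add(self, role, **facts):
        self.representatives[role] = Representative(role, dict(facts))

    def _root(self, role):
        # Follow identity links until reaching a role with no further link.
        while role in self._same_as:
            role = self._same_as[role]
        return role

    def identify(self, role_a, role_b):
        """Record that two representatives describe the same real-world person."""
        root_a, root_b = self._root(role_a), self._root(role_b)
        if root_a != root_b:
            self._same_as[root_b] = root_a

    def same_person(self, role_a, role_b):
        return self._root(role_a) == self._root(role_b)

    def facts_about(self, role):
        """Combine facts from every representative identified with this role."""
        root = self._root(role)
        merged = {}
        for rep in self.representatives.values():
            if self._root(rep.role) == root:
                merged.update(rep.facts)
        return merged


# Made-up sample data: collect facts about each role separately at first.
case = CaseFile()
case.add("murderer", weapon="candlestick")
case.add("butler", years_of_service=12)
known_before = case.same_person("murderer", "butler")

# "The butler did it": link the two representatives without merging them.
case.identify("murderer", "butler")
known_after = case.same_person("murderer", "butler")
combined = case.facts_about("butler")
```

Keeping both representatives and adding an identity link is just one possible answer to Kent’s question. A design that physically merges the two records would be simpler to query, but it would lose the fact that the information was once gathered separately.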

As a data modeler, I learned some important data management principles from “Data and Reality.” In addition, I am reminded how many of the issues we tackle on our projects relate more to the ambiguity of information and have little or nothing to do with technology.

As a publisher, I see an opportunity to make a classic available once again, this time to a whole new generation of analysts, modelers, database administrators, architects, and developers. I therefore worked through the text closely with David Kent, William’s son (the author passed away in 2005). The new third edition of “Data and Reality” differs in four main ways from the first and second editions:

  1. I have added commentary throughout the chapters. I did this to bring certain terminology up to date to help readers relate to Kent’s very important messages and to expand on Kent’s messages with my own messages and experiences. The overall goal of my commentary is to make sure readers get as much as they can from this book.
  2. I have added “Steve’s Takeaways” to the end of each chapter. These are the most important messages that I learned from each chapter. I found more than 100 important messages in this book. One example: each relationship should be named according to its underlying business reason.
  3. I have removed several sections of the book that are less relevant today than in 1978, updated terms and references, and added footnotes where appropriate.
  4. Several legends in our field have contributed to this third edition. There are endorsements from Joe Celko and John Zachman, a special note from Chris Date, and a foreword from Graeme Simsion.

Here is an excerpt from Graeme Simsion’s foreword: “Kent was writing in 1978, when data modeling was a new discipline. His achievement at the time was to identify the areas in which we needed to develop theory and experience so that critical data modeling decisions could be rooted in something more than ‘individual preference.’

In the years since, hundreds of books and papers have been published on data modeling, and practitioners have accumulated a wealth of experience. So we should be well down the track towards replacing arbitrariness with soundly-based rules and guidelines. Unfortunately, this is not the case, at least not in the areas that “Data and Reality” focuses on. Nor have Kent’s concerns been further elaborated, or, for that matter, refuted.

While such fundamental issues remain unrecognized and unanswered, “Data and Reality,” with its lucid and compelling elucidation of the questions, needs to remain in print. I read the book as a database administrator in 1980, as a researcher in 2002, and just recently as the manuscript for the present edition. On each occasion, I found something more, and on each occasion I considered it the most important book I had read on data modeling. It has been on my recommended reading list forever. The first chapter, in particular, should be mandatory reading for anyone involved in data modeling.

In publishing this new edition, Steve Hoberman has not only ensured that one of the key books in the data modeling canon remains in print, but has added his own comments and up-to-date examples, which are likely to be helpful to those who have come to data modeling more recently. Don’t do any more data modeling work until you’ve read it.”

I’ll end this article with a quote from Chapter 1, where William Kent cites and elaborates on a 1967 white paper on data where the author makes this statement: “We do not, it seems, have a very clear and commonly agreed upon set of notions about data — either what they are, how they should be fed and cared for, or their relation to the design of programming languages and operating systems.”
