When I was asked to review the third edition of Data Modeling Essentials, I jumped at the opportunity. I consistently recommend the second edition to attendees in my data modeling courses, and Graeme Simsion has been a frequent participant in our Design Challenges (please see www.stevehoberman.com/designchalleng.htm). In addition, I have attended wonderful presentations by both Simsion and Witt over the years. I therefore dived into this latest edition with very high expectations, which were not only met but exceeded.
The goal of Data Modeling Essentials is to "... help information systems professionals ... acquire competency in data modeling." The reader might wonder whether the text is introductory or advanced, and the answer is "Yes." The book offers value to novice, intermediate and advanced data modelers alike. Someone just starting his/her first data modeling project could begin at page one and, by the end of the text, have gained a rich understanding of conceptual, logical and physical design, as well as several key themes in modeling: the importance of creativity, the existence of exceptions to most rules, and the fact that there is usually more than one right answer. Intermediate and advanced modelers will still find the introductory chapters and key themes useful, and will also benefit from the sections on modeling approach and from more advanced topics such as business rules and advanced normalization.
The book is firmly grounded in the real world: its focus is on modeling in the practical sense rather than on more theoretical topics such as mathematical set theory or formal methodology. For example, the normalization discussion starts with a common data problem involving redundancy, and the problem is then addressed step by step through the levels of normalization. A recurring theme is that there can be more than one right answer and that creativity is a large part of the modeling process. This is demonstrated with examples drawn from Simsion and Witt's own extensive real-world experience. I personally found the prescriptive/descriptive discussion of analysis and design very enlightening.
The book is extremely well written. It is humorous at times, full of useful anecdotes and follows a very logical (no pun intended!) sequence. The chapter on business rules, for example, clearly and concisely explains which types of rules can be captured on the model and which cannot. I have heard both Simsion and Witt present at conferences, and the book is written in a manner similar to their clear and engaging speaking styles.
The book contains 17 chapters. Part I comprises the first 7 chapters and provides the foundation for data modeling. If you are new to modeling or need a refresher, it is worth reading each chapter in this section. More advanced modelers might pick up tips in this first section on how to explain modeling to others. The use of spreadsheets to show the complexities of modeling, for example, is a great way to introduce folks to the boxes and lines of a data model. Part II contains 5 chapters and is new to this edition. If you own an earlier edition of the text, you might consider upgrading to get the chapters on the different approaches to building models. Part II is not a detailed reference of different modeling methodologies, but instead contains principles and guidelines that can be applied to complete the modeling deliverables. Part III contains 5 chapters of more advanced topics from advanced normalization to business rules to data warehousing.
Chapter 1, "What is Data Modeling?," is an excellent chapter for someone completely new to data modeling. Even if your only exposure to the world of data is through spreadsheets, by the end of the chapter, you will be able to answer the set of questions posed in the introductory paragraph, starting with "What is a data model?" It is essential for every modeler to understand the discussion on how analysis differs from design in this first chapter. I really love the descriptive versus prescriptive analogy used here. In this chapter, I also like the use of tangible words to describe the purpose of modeling. Words such as "leverage" and "stability" really help the reader picture the benefits clearly.
Chapter 2, "Basics of Sound Structure," focuses mainly on normalization, using an easy-to-follow example (again in spreadsheet format). First, the premises underlying normalization, such as one fact per column, are discussed, and then each level of normalization up to and including Third Normal Form (3NF) is explored (levels higher than 3NF are covered in Chapter 13). This chapter, like most of the book, also reinforces the view that data modeling is a fairly creative process.
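The review does not reproduce the book's spreadsheet example, so what follows is only a minimal Python sketch under assumed data, not the authors' method. It checks whether a functional dependency holds in a small, hypothetical order-line table (the column names and rows are invented for illustration), flags the partial and transitive dependencies that normalization to 3NF removes, and then splits the table so each fact is stored once. The helper names fd_holds and project are hypothetical.

```python
# Hypothetical spreadsheet-style rows: one row per order line.
# Key: (order_no, line_no). customer_name is repeated on every line.
rows = [
    {"order_no": 1, "line_no": 1, "customer_no": "C1", "customer_name": "Acme", "product": "P1", "qty": 5},
    {"order_no": 1, "line_no": 2, "customer_no": "C1", "customer_name": "Acme", "product": "P2", "qty": 3},
    {"order_no": 2, "line_no": 1, "customer_no": "C2", "customer_name": "Bolt", "product": "P1", "qty": 1},
]


def fd_holds(rows, determinant, dependent):
    """Return True if, in these rows, each combination of determinant values
    is always paired with the same combination of dependent values.
    A sample can refute a dependency but never prove it is a business rule."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependent)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


def project(rows, columns):
    """Distinct, sorted projection onto the given columns (one table of a split)."""
    return sorted({tuple(row[c] for c in columns) for row in rows})


if __name__ == "__main__":
    # order_no alone determines the customer: a partial dependency on the key (2NF issue)
    print(fd_holds(rows, ["order_no"], ["customer_no"]))       # True
    # customer_no determines customer_name: a transitive dependency (3NF issue)
    print(fd_holds(rows, ["customer_no"], ["customer_name"]))  # True
    # Splitting along those dependencies stores each fact once
    print(project(rows, ["order_no", "customer_no"]))
    print(project(rows, ["customer_no", "customer_name"]))
    print(project(rows, ["order_no", "line_no", "product", "qty"]))
```

Note that the check is run against data, which can only disprove a dependency; whether customer_no really determines customer_name is a question for the business, which fits the book's emphasis on modeling as more than a mechanical exercise.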
Chapter 3, "The Entity-Relationship Approach," focuses on the components of data models and the top-down approach to building a model. The term "entity class" is used for a class of things and "entity" for a single member of that class. Therefore, "customer" would be the entity class and "Bob" could be an entity. Basic naming rules for entity classes are discussed, including some neat examples of names that break the singular-naming rule in ways most of us may not catch, such as Transaction History and Visiting Schedule, which are really plural. Different types of relationships are discussed in detail, as is the concept of transferability, a topic even advanced modelers sometimes gloss over. The chapter concludes with a discussion of attributes.
Chapter 4, "Subtypes and Supertypes," explains this concept and shows exhaustive and overlapping subtypes in different modeling notations. An important tip is that not everything that smells like a supertype is a supertype: for example, customer order and supplier order are unlikely to be grouped under an order supertype, because in the eyes of the business they are very different concepts.
Chapter 5, "Attributes and Columns," talks about attributes and categories for grouping attributes to facilitate naming and domain definitions. There is a very interesting section on generalizing attributes, including a discussion on the tradeoffs of making things too generic.
Chapter 6, "Primary Keys and Identity," covers the approach to, and the issues associated with, selecting a primary key. Surrogate and structured keys are also discussed, along with a thorough set of criteria (unique, minimal and stable) that a good primary key must meet.
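Two of the three criteria can be tested mechanically against sample data, which this minimal Python sketch does; it is an illustration under assumptions, not a technique taken from the book. The employee rows and the function names is_unique and is_minimal_unique are hypothetical. Stability cannot be checked from a single snapshot, so the sketch leaves it out.

```python
from itertools import combinations

# Hypothetical sample of employee rows
employees = [
    {"employee_no": 101, "tax_id": "A1", "surname": "Lee", "dept": "Sales"},
    {"employee_no": 102, "tax_id": "B2", "surname": "Lee", "dept": "IT"},
    {"employee_no": 103, "tax_id": "C3", "surname": "Ng", "dept": "Sales"},
]


def is_unique(rows, columns):
    """True if no two rows share the same values for these columns."""
    values = [tuple(row[c] for c in columns) for row in rows]
    return len(values) == len(set(values))


def is_minimal_unique(rows, columns):
    """True if the columns are unique in the sample and no proper,
    non-empty subset of them is also unique (i.e., the key is minimal)."""
    if not is_unique(rows, columns):
        return False
    for size in range(1, len(columns)):
        for subset in combinations(columns, size):
            if is_unique(rows, subset):
                return False  # a smaller key would do, so this one is not minimal
    return True


if __name__ == "__main__":
    print(is_minimal_unique(employees, ["employee_no"]))             # True
    print(is_minimal_unique(employees, ["employee_no", "surname"]))  # False
    print(is_minimal_unique(employees, ["surname", "dept"]))         # True
```

The last call shows the limit of checking data rather than business rules: surname plus department happens to be unique and minimal in this tiny sample, yet it would obviously fail as a real key. Uniqueness, like stability, ultimately has to be confirmed with the business.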
Chapter 7, "Extensions and Alternatives," takes us beyond the standard set of tools the traditional entity-relationship model offers. It begins with additional ways to capture more information about attributes, followed by a discussion of alternative modeling languages, including Chen, UML and Object Role Modeling. The chapter's purpose is not to make us experts in these other approaches, but rather to make us aware that they exist and to touch on their strengths and weaknesses.
Chapter 8, "Organizing the Data Modeling Task," focuses on issues that might come up during the modeling tasks. My two favorite issues discussed in this chapter are "Why include a formal data modeling task in the project plan anyway?" and the challenge of getting access to the right people, such as users and other business stakeholders, to provide input on the models. Methodology is not discussed at length; instead, the chapter covers ingredients common to any methodology (roles and responsibilities, and partitioning and maintaining the model).
Chapter 9, "The Business Requirements," talks about interviewing businesspeople to extract the application requirements, and focuses on a specific technique: Object Class Hierarchies. This is a really neat technique for top-down analysis where first you start with something extremely high level (e.g., party) and then think of the concepts immediately underneath this level, and so on, until you have a representative set of entity classes.
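The top-down idea behind an Object Class Hierarchy can be pictured as a simple tree. The following Python sketch is only an illustration under assumptions: apart from "party," which the review mentions, every class name is hypothetical, and treating the lowest-level concepts as candidate entity classes is a simplification (higher levels may well become supertypes in the eventual model).

```python
# Hypothetical hierarchy: start very high level and refine one level at a time.
hierarchy = {
    "Party": {
        "Person": {
            "Customer Contact": {},
            "Employee": {},
        },
        "Organization": {
            "Company": {},
            "Government Agency": {},
        },
    },
}


def outline(tree, depth=0):
    """Return the hierarchy as indented lines, two spaces per level."""
    lines = []
    for name, children in tree.items():
        lines.append("  " * depth + name)
        lines.extend(outline(children, depth + 1))
    return lines


def leaves(tree):
    """Return the lowest-level concepts, in order, as candidate entity classes."""
    result = []
    for name, children in tree.items():
        if children:
            result.extend(leaves(children))
        else:
            result.append(name)
    return result


if __name__ == "__main__":
    print("\n".join(outline(hierarchy)))
    print(leaves(hierarchy))
```

A nested dictionary is used here simply because it mirrors the "concepts immediately underneath this level" refinement; in practice the hierarchy is a discussion aid for interviews with businesspeople rather than code.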
Chapter 10, "Conceptual Data Modeling," covers very important topics. It first focuses on the patterns modelers tend to reuse as they tackle new logical data modeling scenarios. This chapter also explains how to use business assertions to validate a model with a user. There are several techniques discussed for how to start modeling, including the use of generic models, and top-down and bottom-up modeling. I really enjoyed reading the section titled "The Right Attitude."
Chapter 11, "Logical Database Design," talks about what must be done to the Conceptual Data Model from Chapter 10 to make it work in a relational database. The chapter discusses techniques for implementing subtypes and for defining primary and foreign keys, which together make it possible to create a database from a fully normalized logical data model.
Chapter 12, "Physical Database Design," focuses on performance, storage space, and backup and recovery considerations that might require modifications to the logical database design. Techniques such as indexing, denormalization and partitioning are discussed. Many modelers reach for denormalization in the belief that it is the only, or the best, technique for improving retrieval performance. This chapter supports a point I make repeatedly during my training courses: be very selective when denormalizing.
Chapter 13, "Advanced Normalization," explains Boyce-Codd Normal Form, Fourth Normal Form and Fifth Normal Form. Simple examples are used to convey these higher levels, and the chapter concludes with several issues related to normalization such as overlapping tables, which touches on redundancy of data.
Chapter 14, "Modeling Business Rules," provides an in-depth discussion of business rules, including the two main categories that rules fall into (data and process). I found the paragraph in Section 14.2.3 so useful that I read it twice. It is the clearest and most succinct summary I have read to date of which rules are relevant to the data modeler.
Chapter 15, "Time-Dependent Data," describes the options available to the modeler for capturing time-based information. A number of techniques including auditing and snapshots are discussed. Also discussed are several tricky situations modelers might find themselves in related to time, such as handling deletions and temporary rules.
Chapter 16, "Modeling for Data Warehouses and Data Marts," talks about the characteristics of warehouses and marts, and how these impact the resulting data model. Many data warehousing terms are defined here, including star schema and slowly changing dimensions.
Chapter 17, "Enterprise Data Models and Data Management," explains what an enterprise data model is and some of the problems organizations have run into when building one. Consistent with the rest of the book, it explains not just the "what" but also the "how" of building one.
In summary, I found Data Modeling Essentials, Third Edition, very useful for data modelers at any level of experience.