I have to get one thing out in the open right away - I am a Steve Hoberman fan. After hearing him speak at conferences and user group meetings, I recommended we invite him to give his Data Modeling Master Class at my company. Each time I've attended one of his presentations, my expectations were met and usually exceeded. He's an interesting and fun speaker who keeps it simple and engages his audience throughout his presentations. I also have a copy of his Data Modeler's Workbench, which I consult on occasion. So it was with great anticipation that I began reading his new work, Data Modeling Made Simple. And again, I wasn't disappointed.
The audience for this book is primarily business and IT professionals with little to no experience with data models. It focuses on the basics, leaving theory, history and more advanced topics to his first book. How simple is this book? I asked my 76-year-old mother, whose computer skills are limited to MS Word, email, AOL and Google, to read a little of it. She was able to understand the first couple of chapters with a little glossary-type help like "What's an application?" For IT-savvy readers, it will be an easy read that makes the concepts simple to understand and provides tips and tricks that can be put to practical use immediately.
Steve writes in an easygoing, first-person style. The text is peppered with personal anecdotes that illustrate his points very successfully. And, in sharing those anecdotes, he makes readers feel confident enough to try, knowing that the experienced person who wrote this book overcame challenges and mistakes similar to those they may encounter. Throughout the text, he provides exercises designed to get the reader thinking, with references to his Web site for insight into his own thoughts about the questions he has posed. In 12 chapters, the book progresses logically from defining what a data model represents to explaining entities, data elements and relationships, to the types of models, normalization and the physical data model. He then goes further to discuss approaches to building a model and validation techniques. All in all, it provides a comprehensive first look at data models for those who are not modelers and a quick reference for less experienced modelers.
Chapter 1: What is a data model? introduces the concept of a model with a story about getting lost driving in France. Because neither the author nor the gas station attendant he approached for directions spoke the same language, the attendant drew a map. This map was a model containing common symbols that guided him to his destination. With this analogy, the book launches into defining the term data model. The author's definition is: "A data model is a diagram that uses text and symbols to represent groupings of data so that the reader can understand the actual data better." Examples follow, including an introduction to the example that he uses throughout the text for illustrative purposes, a business card.
Chapter 2: What is so special about data models? describes how data models can facilitate communication at several different levels. This is an important chapter for anyone who needs to justify a data modeling function because it succinctly and clearly illustrates the value of the process of developing a data model and how the model can be used after the modeling process to convey information about the business. The chapter draws a distinction between formalization, the single, precise interpretation of the symbols used on the model, and the potentially imprecise nature of the data and business rules that are represented on the diagram - particularly if the terms used in the model are not clearly and completely defined. A scope cube is introduced as a means of helping to define the time frame, function and area to be modeled, and the author's three types of models - subject area, logical and physical - are clearly described.
Chapter 3: What are entities? defines an entity as a collection of information about something that the business deems important and worthy of capture and provides examples of the different types of real-world things that are entities. The appropriate level of entity to be used in the three different types of models is presented, but the book is quick to point out that the industry or business being modeled actually dictates the appropriate level of entity to be used. For example, an entity with lots of detail about a phone number might be appropriate for a telecommunications company, but for most of us, a simple phone number and perhaps the role the phone number plays in the context of a customer is appropriate. The different types of entities - independent, dependent, attributive and supertype/subtype - are described, and simple figures are provided to illustrate each type. There are four exercises in the seven pages of this chapter - an indication of the importance of understanding this fundamental concept.
Chapter 4: What are data elements? is another brief but very important chapter. It talks about what a data element is and how it must be deemed important by the business to be documented in a data model. The concepts of domain and keys are introduced here. Domains are used to specify the complete set of all possible values the data element may hold. Keys are described as partly or fully identifying an entity instance, accompanied by a good illustration. The types of keys and when to use them are explained. The term data element is used throughout the text for consistency, but the book notes that there are other names that are more appropriate in different contexts, such as attribute or column.
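The domain and key ideas summarized above can be sketched in a few lines of Python. This is my own illustration, not from the book, and all the entity and element names are hypothetical:

```python
# A domain restricts the complete set of values a data element may hold.
ORDER_STATUS_DOMAIN = {"OPEN", "SHIPPED", "CANCELLED"}

def in_domain(value, domain):
    """Return True if the value is a legal member of the domain."""
    return value in domain

# A key partly or fully identifies an entity instance: here the pair
# (order_id, line_number) uniquely identifies each Order Line.
order_lines = [
    {"order_id": 1, "line_number": 1, "status": "OPEN"},
    {"order_id": 1, "line_number": 2, "status": "SHIPPED"},
]

def key_is_unique(rows, key_elements):
    """Check that no two rows share the same key values."""
    keys = [tuple(row[k] for k in key_elements) for row in rows]
    return len(keys) == len(set(keys))

print(in_domain("OPEN", ORDER_STATUS_DOMAIN))                    # True
print(key_is_unique(order_lines, ("order_id", "line_number")))   # True
```

A database enforces these same ideas declaratively (check constraints for domains, primary keys for identifiers); the sketch just makes the rules explicit.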
Chapter 5: What are relationships? articulates what a relationship is by using very clear, simple examples and diagrams along with explanatory text. Structural and referential integrity rules between two entities are contrasted with action rules, which are usually not represented on a data model. Using IE notation, the chapter presents cardinality concepts and the symbols used to represent cardinality on a diagram, recursive relationships, identifying and non-identifying relationships, and relationship labels. Experienced modelers know the value of relationship labels, but when pressed for time, they're the first things to go. Throughout the book, each relationship is carefully named to subtly illustrate the important role the label plays in specifying business rules on the model.
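The referential integrity rule mentioned above can also be sketched concretely. This is my own illustration, not from the book, using hypothetical Product and Order Line entities: the "many" side carries a foreign key that must resolve to an instance on the "one" side.

```python
# Product is the "one" side; Order Line is the "many" side and carries
# the foreign key (product_id) back to Product.
products = {"P1": "Widget", "P2": "Gadget"}

order_lines = [
    {"order_id": 1, "line_number": 1, "product_id": "P1"},
    {"order_id": 1, "line_number": 2, "product_id": "P2"},
]

def referentially_intact(lines, product_ids):
    """True if every line's product_id resolves to a known Product."""
    return all(line["product_id"] in product_ids for line in lines)

print(referentially_intact(order_lines, products))  # True
```

On the model, the relationship label ("appears on", "is ordered as") states the business rule; the cardinality symbols state how many instances may participate on each side.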
Chapter 6: What makes a definition great? tackles the bane of every data modeler's existence. It's unlikely there's an IT professional alive who doesn't know the importance of documentation, but it is often difficult to find business or system requesters/users who want to take the time to provide it. This very brief chapter uses the simple data model of one Product (entity) appears on (relationship) zero, one or many Order Line (entity) to illustrate the importance of a good definition. For example, does product include raw materials and intermediate goods or only finished items? Is service a product? Are closed or cancelled order lines included in Order Line? Good definitions remove ambiguity from the model and assist both technology and business professionals in making intelligent decisions. This brief chapter refers readers to a more in-depth discussion in The Data Modeler's Workbench (also by Steve Hoberman), but it does describe three critical components of a good definition - clarity, completeness and accuracy. The one thing I would like to see added to the chapter in a future edition is an example of a bad definition contrasted with a revised, great definition.
Chapter 7: What is the subject area model? presents some interesting perspectives on the high-level conceptual or business model. High-level, conceptual and business are somewhat subjective, but a simple example makes it very clear what they mean to the author. He takes it a step further by suggesting that there are three types of subject area models - the business subject area model (BSAM), the application subject area model (ASAM) and the comparison subject area model (CSAM). The scope of a BSAM model is a defined portion of the business as narrow or broad as necessary. The ASAM covers a defined portion of a particular application and is frequently a subset of the BSAM. To take it even further, ASAMs come in two flavors - operational and reporting. The CSAM is primarily used to perform gap analysis, integration analysis and help estimate work effort. It is usually based on two or more ASAMs.
Chapter 8: What is the logical data model? is the heart of the book, and it takes all the time (pages) it needs to explain what a logical model is, first through third normal form, multi-valued data elements and abstraction. The logical model is described as a representation of the rules that govern the way in which something works as well as a means of communicating all the data elements within the scope of the project independent of technology. The chapter is chock-full of examples, exercises and some personal experiences that can very successfully bring the reader to a complete understanding.
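To give a flavor of the normalization the chapter teaches, here is my own minimal sketch (not from the book, with hypothetical names) of moving a record toward third normal form: repeating groups are split into a child entity, and a transitively dependent element is moved to its own entity.

```python
# Unnormalized: a repeating group of phones, and a city value that
# depends on zip_code rather than on the customer key.
unnormalized = {
    "customer_id": 42,
    "phones": ["555-1111", "555-2222"],  # repeating group (violates 1NF)
    "zip_code": "07030",
    "city": "Hoboken",                   # transitive dependency (violates 3NF)
}

# 1NF: give each repeating value its own row in a child entity.
customer_phones = [
    {"customer_id": unnormalized["customer_id"], "phone": p}
    for p in unnormalized["phones"]
]

# 3NF: move city out to an entity keyed by zip_code.
zip_codes = {unnormalized["zip_code"]: unnormalized["city"]}
customer = {"customer_id": 42, "zip_code": "07030"}

print(customer_phones)
print(zip_codes)
```

The payoff is the one the book emphasizes: each fact is stored once, so an update (say, correcting a city name) happens in exactly one place.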
Chapter 9: What is the physical data model? describes five denormalization techniques - standard, repeating groups, repeating data elements, FUBES (fold up but easily separated) and summarization, thus covering both operational and data warehouse performance needs. The chapter describes the advantages and disadvantages of each and notes that data usage and the type of platform on which the database is implemented are major factors in determining which technique(s) should be applied. Again, there are lots of examples. Rounding out the chapter are discussions on surrogate keys, indexes, partitioning, the use of views and dimensionality.
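Two of the trade-offs the chapter covers can be sketched briefly. This is my own illustration (not from the book, hypothetical names): folding a lookup value back onto the main table avoids a join at the cost of repeated values, and summarization pre-aggregates for reporting workloads.

```python
order_lines = [
    {"order_id": 1, "product_id": "P1", "qty": 2},
    {"order_id": 2, "product_id": "P1", "qty": 5},
]
products = {"P1": "Widget"}

# Denormalization: copy product_name onto every order line, trading
# redundancy for join-free reads.
denormalized = [
    {**line, "product_name": products[line["product_id"]]}
    for line in order_lines
]

# Summarization: a pre-aggregated total per product, typical of
# data warehouse designs.
totals = {}
for line in order_lines:
    totals[line["product_id"]] = totals.get(line["product_id"], 0) + line["qty"]

print(denormalized[0]["product_name"])  # Widget
print(totals)                           # {'P1': 7}
```

As the chapter stresses, whether either move is wise depends on how the data is used and on the platform the database runs on.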
Chapter 10: What is the best approach to building the models? helps newer modelers figure out where to begin. The author provides two useful equations that will help any modeler decide on an appropriate approach. The first, "purpose + audience = deliverables," provides focus on the needs of the audience and what should be modeled for them. The second equation, "deliverables + resources + time = approach," incorporates time and resource constraints into consideration of the approach that will produce the deliverables needed. Top-down and bottom-up approaches are also discussed.
Chapter 11: How do I validate a data model? introduces the data model scorecard and 10 categories on which to score a model. By applying the rigor necessary to do well on the scorecard, a newer modeler will be developing habits that will stand them in good stead wherever their modeling career leads. The ten categories are weighted based on what's important in each individual's work environment and provide a consistent way of evaluating the condition of the model. These are good basic questions that have been assembled into an easy-to-use way of ranking the quality of data models.
Chapter 12: Top three most frequently asked modeling questions briefly addresses keeping modeling skills sharp, the best modeling tool and the future role of the data modeler. Of course, there are no hard and fast answers, so the author provides some perspective and food for thought.
To summarize, Data Modeling Made Simple neatly presents the fundamentals of data modeling in a concise, well-illustrated book. Clarity of style, abundant examples and exercises, and unique perspectives demonstrate how data modeling can be kept fairly simple and understandable. This is an excellent text for new and inexperienced modelers. It guides them into good habits and practices and provides excellent tools for evaluating the quality of the work produced. This quick read will be appreciated by any audience - from students to IT management to the business users of the final database design.