
What Exactly IS a Data Model?

Published
  • February 01 2003, 1:00am EST

What exactly is a data model, anyway? While many of us in DAMA are involved with data modeling, the problem is that there are almost as many different views of what that means as there are people in DAMA. The time has come to establish some basic definitions, and this is the first in a series of three articles attempting to do just that.

For the short answer to the question, a data model can:

  • Represent the language of an organization.
  • Represent the fundamental structure of an organization.
  • Represent data structures as manipulated by a particular technology.

These are very different uses for what is essentially the same graphic – a collection of boxes and lines – as we shall see.

The graphics associated with the term "data model" are usually boxes (although some use circles), each representing something, along with lines connecting them, each representing a relationship among two or more "somethings." The question is: What are the somethings? That is, what are the "things" to be represented? There are many specific notations available; but structurally they are all very similar. Each notation does emphasize something different; but even when different people use the same notation, they often use the same symbols to mean different things.

These three articles will address conceptual, logical and physical models, object models and data model views. The goal of this series is to provide a clear vision of how all these elements relate to each other.

First of all, what is it we are trying to represent? This article will present the kinds of data models available and then will describe in more detail the models to be discussed with business owners. Subsequent articles will describe the other kinds of models in detail.

The Three- (or Four-) Schema Architecture

In 1975, a committee in the Computer and Business Equipment Manufacturers Association (commonly known as ANSI/SPARC) identified and defined a complete set of schema types characterizing the structure of data. A draft of the committee's report was finally published in 1978, and the 42 schema types were collapsed to just three.1,2 This then became known as the "Three-Schema Architecture," although, by convention, a fourth schema has been added.

Figure 1 illustrates how each person in a company looks at data according to an external schema. Note that while each is different, they may overlap, often using the same terms of reference (although even these terms may be defined differently). Note that those whose views are reflected in this figure may be either creators or consumers of data – or both.


Figure 1: Three-Schema Architecture

The conceptual schema combines these different external views into a single, coherent definition of the enterprise's data. In this view, each data element is defined only once for the organization, and its relationship to all other data elements is clearly defined as well. Each external schema may consist of a selection of these elements, but the underlying definitions are (or should be) consistent across them all. To the extent that definitions are not consistent and cannot be made to be so, these inconsistencies must be documented.

An internal schema is an organization of data according to the technology being used to record it. This includes the terms for components recognized by each kind of data manipulation technology: relational database management system (DBMS) "tables," hierarchical DBMS "segments," object "classes," etc. It also includes the terms for the internal physical storage of data on the computer (cylinder, track, etc.). In the past, the DBMS terms have comprised the logical schema, and the physical storage terms have comprised the physical schema.
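The relationship among the schemata can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the element names, definitions, view names and the helper external_view are all hypothetical, not part of the ANSI report. It shows each element defined once in the conceptual schema, external views as selections of those elements, and logical schemata renaming the same elements in technology-specific terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """A conceptual-schema entry: defined once for the whole enterprise."""
    name: str
    definition: str


# Conceptual schema: one definition per element (hypothetical content).
CONCEPTUAL = {
    e.name: e
    for e in (
        DataElement("customer", "A party that buys the enterprise's products."),
        DataElement("product", "Something the enterprise offers for sale."),
        DataElement("order", "A customer's request for products."),
    )
}

# External schemata: each view is a selection of conceptual elements.
EXTERNAL = {
    "sales": ("customer", "order"),
    "warehouse": ("product", "order"),
}

# Logical schemata: the same elements in technology-specific terms.
LOGICAL = {
    "relational": {name: f"table {name.upper()}" for name in CONCEPTUAL},
    "hierarchical": {name: f"segment {name.upper()}" for name in CONCEPTUAL},
}


def external_view(view_name):
    """Return the definitions one group of users sees, drawn from the shared schema."""
    return {name: CONCEPTUAL[name].definition for name in EXTERNAL[view_name]}
```

Because both views draw on the same conceptual entries, external_view("sales") and external_view("warehouse") overlap on "order" and necessarily agree on its definition.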

Types of Data Models

The ANSI schemata give us a clue as to what kind of data models we can draw. Among other things, they correspond very nicely to the rows in John Zachman's Framework for Information Systems Architecture, or at least to the variation on Zachman's Framework that I call simply the "Architecture Framework."3,4

The external schema is really a business owner's view – Row Two of the Framework. That is, an external data model portrays what is seen by the people who actually run an enterprise. It is in terms of the jargon of the specialty doing the work, and it is often influenced by existing systems and procedures.

The conceptual schema is really the architect's view (what Zachman calls the information system "designer's view") – Row Three of the Framework. Thus, the conceptual data model is a drawing of the underlying, fundamental structures in the organization.

The logical schema corresponds roughly with the designer's view (the "builder's view" to Zachman) – Row Four of the Framework – and the physical schema corresponds to the builder's view (Zachman's "sub-contractor view") – Row Five. The logical data model, then, describes the data manipulated by a particular data manipulation technology; and the physical data model, if one were drawn, would describe the physical media that are used to store the data.

Note that any number of business owners' external views (external schemata) may be combined into a single architect's conceptual view (conceptual schema). Also, a single architect's view may be the source of more than one designer's logical views (logical schemata), depending on the database technology used.

Unfortunately, common usage today tends to have people interchanging the expressions "logical model" and "conceptual model" when, in fact, they originally meant two very different things. In addition, in the relational world, the actual table and column design is often called the "physical model," even though it is still one step removed from the physical world. What many people are calling the "physical model" should, in fact, be called the "logical model."

For purposes of these articles, the original definitions apply:

  • The conceptual model describes the fundamental nature of the business (the architect's view), without regard for how business information might be stored.
  • The logical model is a representation of that information as organized for a particular data management technology (the designer's view).
  • The physical model describes how data might be stored in a physical medium (the builder's view).

Now, let's discuss the specifics of the data models that can be created for each of our various audiences.

Row Two: Concrete Objects Seen by Business Owners

Business owners live in a very tangible world. They are concerned with the organization's products, suppliers, customers and so forth. They may also be concerned with the slightly more abstract contracts and accounts, but the characteristics of these are very highly prescribed. Any model of the "things of significance" to these people will be of very specific concrete things.

In discovering these models, the most important thing to be learned by the analyst is language. What are the terms used by people, and what do they mean? Unfortunately, it is common for the same word to mean different things to different people and for different people to use different words for the same thing. This means that more than drawing any pictures of the business owners' views, it is important first to capture this language and document inconsistencies in its use.

Indeed, for business owners, probably the most important "data model" will simply be a glossary of business terms.
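As a rough sketch of what capturing the language and documenting its inconsistencies might look like, the Python below records who uses which term with which meaning and then reports both kinds of conflict. The sample usages and the function find_inconsistencies are hypothetical, and comparing definitions as exact strings is a deliberate simplification; in practice an analyst has to judge when two definitions mean the same thing.

```python
from collections import defaultdict

# (term, definition, who uses it): hypothetical interview notes.
USAGES = [
    ("customer", "a party that has placed an order", "sales"),
    ("customer", "a party that holds an account", "billing"),
    ("client", "a party that has placed an order", "support"),
    ("product", "something offered for sale", "sales"),
]


def find_inconsistencies(usages):
    """Return (same word, different meanings) and (different words, same meaning)."""
    definitions_by_term = defaultdict(set)
    terms_by_definition = defaultdict(set)
    for term, definition, _source in usages:
        definitions_by_term[term].add(definition)
        terms_by_definition[definition].add(term)

    same_word_different_meanings = {
        term: sorted(defs)
        for term, defs in definitions_by_term.items()
        if len(defs) > 1
    }
    different_words_same_meaning = {
        definition: sorted(terms)
        for definition, terms in terms_by_definition.items()
        if len(terms) > 1
    }
    return same_word_different_meanings, different_words_same_meaning
```

With the sample notes, "customer" is flagged as one word with two meanings, and "client" and "customer" are flagged as two words for the same thing.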

Ron Ross has written two articles describing the "Fact Model" that "structures basic knowledge about business operations from a business perspective."5 This is a model of the business owner's view. Ross carefully distinguishes the fact model from what he calls a data model, which he says "focuses on delineating the data and its proper format to support system-level requirements development." In other words, to Ross, a fact model is a business owner's view, while a "data model" is a designer's view.

An external business owner's data model, then, includes:

  • Business terms as things to be described and their definitions.
  • Business terms that are attributes of the objects described by other terms and their definitions.
  • Facts, represented as relationships between terms, with names in both directions.
  • To the degree reasonable, the cardinality of relationships (how many occurrences of one thing can be related to an occurrence of something else).
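The four components listed above could be recorded in a structure like the following. It is a minimal sketch, not a prescribed format: the class names Term and Fact, their fields, and the "one"/"many" cardinality labels are my own assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A business term, its definition, and the terms that act as its attributes."""
    name: str
    definition: str
    attributes: dict = field(default_factory=dict)  # attribute term -> definition


@dataclass
class Fact:
    """A relationship between two terms, named in both directions."""
    subject_term: str
    forward_name: str
    object_term: str
    reverse_name: str
    forward_cardinality: str = "many"  # how many objects per subject
    reverse_cardinality: str = "many"  # how many subjects per object

    def read_both_ways(self):
        """Render the fact once from each end, with its cardinality."""
        return (
            f"{self.subject_term} {self.forward_name} "
            f"{self.object_term} [{self.forward_cardinality}]",
            f"{self.object_term} {self.reverse_name} "
            f"{self.subject_term} [{self.reverse_cardinality}]",
        )


customer = Term(
    "customer",
    "A party that has placed an order.",
    attributes={"customer name": "The name by which the customer is known."},
)
places = Fact("customer", "places", "order", "is placed by", "many", "one")
```

Calling places.read_both_ways() yields one reading from each end of the relationship, which is a convenient form for reading the facts back to the people who supplied them.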

Note that to the extent that you do produce a graphic model, it is critical that it look attractive and be easy for those not well-versed in data modeling to read.

It is arguable whether it is necessary to create a physical drawing of the business owner's view. In my experience, it is sometimes helpful to sketch the data model of a conversation to confirm what is being said, but to convert that to a formal model does not have as much value as converting it (and the other conversations) directly into a conceptual data model. To present a model that mimics what the person told you will prompt the person to indicate whether it is correct or not, but it won't really engage the person in the subject. A model that represents not only what was said, but also the implications of what was said will challenge the person to think about it.

Regardless of whether or not there is a graphic model, documentation of this perspective must include definitions of all terms and specification of significant business constraints. These constraints are a bit of Column Six (Motivation) of the Architecture Framework that always accompanies the data modeling effort. In addition, as many business facts as are available should be captured (at least in text) as well.

To the extent that you do choose to describe an interview graphically, recognize that this view may well include multi-way relationships (among more than two terms), as well as many-to-many relationships. It will not be in "third normal form" with all the inconsistencies and anomalies removed.
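One reason to record a multi-way relationship as stated, rather than splitting it into pairs, can be shown with a small Python sketch. The supplier, part and project names are invented for illustration. The three-way facts are broken into three two-way relationships and then recombined, and the recombination produces a fact that nobody actually stated.

```python
# Three-way facts as a business owner might state them (hypothetical data):
# "supplier S supplies part P to project J".
facts = {
    ("Acme", "bolt", "bridge"),
    ("Acme", "nut", "tower"),
    ("Best", "bolt", "tower"),
}

# Split the three-way relationship into three two-way relationships.
supplier_part = {(s, p) for s, p, _ in facts}
part_project = {(p, j) for _, p, j in facts}
supplier_project = {(s, j) for s, _, j in facts}

# Recombine the pairs: keep every combination consistent with all three.
rebuilt = {
    (s, p, j)
    for (s, p) in supplier_part
    for (p2, j) in part_project
    if p == p2 and (s, j) in supplier_project
}

# Facts that appear only because the pairs were recombined.
spurious = rebuilt - facts
```

Here spurious contains ("Acme", "bolt", "tower"): Acme supplies bolts, bolts go to the tower, and Acme supplies the tower, yet no one said that Acme supplies bolts to the tower.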

Peter Chen's original entity/relationship notation is suitable for the external model, as is object role modeling. Each can describe multi-way relationships.6

Figure 2 shows the characteristics of business owners' data models, and this table will be expanded in the next two articles of this series.


Figure 2: Comparing the Different Views

References:
1. This section is based on David C. Hay's Requirements Analysis: From Business Views to Architecture. (Prentice Hall, 2003). pp. 57-60.
2. Tsichritzis, D. and A.C. Klug. "The ANSI/X3/SPARC DBMS Framework Report of the Study Group on Database Management Systems." Information Systems. 3(3). 1978. pp. 176-191.
3. Zachman, John. "A Framework for Information Systems Architecture," IBM Systems Journal, Vol. 26, No. 3. 1987. (IBM Publication G321-5298). See also www.essentialstrategies.com/publications/methodology/zachman.htm.
4. Zachman and Hay mean basically the same thing but use different terms to identify some of the rows. For a discussion of these differences, see David C. Hay. Requirements Analysis. Prentice Hall, 2003. pp. 5-6.
5. Ross, Ron. "What are Fact Models and Why Do You Need Them?" http://www.brcommunity.com/a2000/b008a.html and http://www.brcommunity.com/a2000/b008b.html. (Editor's Note: You must register in order to view these articles. Registration is free.)
6. For a detailed comparison of various data modeling notations, see David C. Hay, Requirements Analysis. Prentice Hall, 2003. pp. 343-387.
