
The Language of Data Modeling

  • Richard Fisher, Bob Schmidt
  • April 01 1998, 1:00am EST

(Part 2 of a Continuing Series)

An English text can be translated into Greek in such a way that the Greek reader and the English reader can both understand the author of the text in the same way. We can infer from this fact that there is a meaning that can be attained apart from the language used to communicate it.

School children can tell you that a sentence can be broken down into parts such as subjects and verbs. Do we know as much about the structure of meaning? One argument has it that the structure of meaning parallels the structure of language, but which language? Also, is the structure of language as we suppose it to be?

Languages are tools whereby people express their meanings. As with any tool, some are better suited to a particular task than others. (And some practitioners are better than others.) By way of analogy, a crescent wrench is a good all-around wrench, but a monkey wrench is better for pipes. Natural languages such as Greek and English are suited for a broad range of expression, from technical exposition to artistic expression. Even as many natural languages are being extinguished, new designer languages are being invented. As with the monkey wrench, these designer languages are well-suited to a particular task.

I am speaking of languages such as those that accountants and engineers use. These professions have jargon, syntactic rules and symbols that constitute a language--one that simplifies exchanging their meanings with their colleagues. Data modeling is a language designed for expressing ideas about the things for which people want to record data.

Like all languages, the language of data modeling needs to conform to the inherent structures of meaning. Language can be invented when there is not an awareness of the underlying structure of meaning--but how much more easily is language invented when one does understand the fundamentals! Among those who call themselves data analysts, there is surprising consensus as to what that structure might be. That structure is called the meta-model. Author Rick Fisher will take a thousand words or so to provide an overview of the meta-model.

The power tools of application development have turned ordinary programmers into Power Rangers. They can blast through GUIs and uncover state-of-the-art interfaces with super speed. The one thing those superpowers cannot penetrate is the gathering of requirements for major applications. Diverse groups of users with disparate needs, conflicting terminology and complex processes entangle developers. They need help.

Data modeling can help. It gives developers a technique for engaging business people in a constructive dialogue to reveal correct business requirements.

The foundation of data modeling is the Conceptual Data Model (CDM), a representation of the business objects of interest along with the facts that must be maintained about those objects and the business-based relationships between those objects.

The CDM is what data modeling is all about. The CDM describes what a business is interested in. If the CDM is wrong, it doesn't matter which notation is the best or which terrific tool gets budget approval. Fail to describe the actual business, and clients will be unhappy with the data warehouse, data mart or application delivered.

Data modeling means analyzing the essence of a business. A car rental company, for example, would have entity types such as customer, vehicle, station, agent--the common terms of the business. Each entity type, such as vehicle, would have its own set of descriptors, such as vehicle identification number, make, model and color. Each vehicle would be shown as "owned" by a station independent of the station at which it is "located." These statements constitute a data model expressed in natural language.

These facts and rules of the business also embody key requirements any information system and database must support. This dual nature is at the heart of the value of data modeling to systems developers. The CDM "talks business" while at the same time drawing out requirements for systems. (This is true of all CDMs regardless of variations in notation style.)

The CDM consists of the following parts: entity types; the relationships among those entity types; attributes, the descriptors of those entity types; subtypes, specializations of those entity types; and supertypes, generalizations of those entity types.

FIGURE 1: Vehicle

Let's formally define these elements of the CDM. Since the viewpoint of the CDM is business, the definitions of modeling elements are business-oriented (non-technical).

An entity type is a kind of thing about which the business keeps information. Such things might be tangible objects or people. They may be concepts, events, agreements or states of being. Each entity type is named with a singular noun that fully distinguishes it from all other entity types. Most often a business already has a name for these things, sometimes two or three. Notice that the definition says "a kind of thing." That's because an entity type is an abstraction of the data kept about groups of things, each of which is an occurrence. "Vehicle" is an abstraction that might include the 1997 gold sport coupe (an occurrence) I rented from the Raleigh Station last week. Each entity type represents the entire group of all such occurrences in the business.
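The distinction between an entity type and its occurrences can be sketched in code. The following is only an illustration, assuming a Python class stands in for the entity type and each instance stands in for one occurrence; the attribute values are invented placeholders.

```python
from dataclasses import dataclass


# Illustrative assumption: the class plays the role of the entity type
# "Vehicle"; every instance of it is one occurrence.
@dataclass(frozen=True)
class Vehicle:
    vin: str    # vehicle identification number
    make: str
    model: str
    color: str


# One occurrence (all values here are made up for the example).
coupe = Vehicle(vin="VIN-0001", make="ExampleMake", model="Sport Coupe", color="gold")

# The entity type represents the entire group of such occurrences.
fleet = [coupe, Vehicle(vin="VIN-0002", make="ExampleMake", model="Sedan", color="blue")]
```

Here `Vehicle` is the abstraction, while `coupe` is a single rented car that the abstraction describes.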

All data modeling notations have the concept of an entity type. The names given to this concept vary with the author or tool. For example, some authors shorten the term to "entity," while others prefer the term "class." An entity occurrence may be shortened to entity (causing confusion) and may also be known as an entity instance or as an object.

Some practitioners distinguish between different entity types based upon their ability to exist independently. A completely independent entity type is called fundamental. An entity type that requires the existence of one other entity type is called attributive (or weak). For example, a station telephone (there may be many) cannot exist without the station, but a station does not require any other entity types to exist. The station would be called fundamental, and the station telephone called attributive.
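One way to picture the fundamental/attributive distinction is as a required reference from the dependent entity type to its parent. This is a hedged sketch, not a prescribed implementation; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StationTelephone:
    """Attributive (weak): cannot be created without a station."""
    number: str
    # No default value, so a station must always be supplied.
    station: "Station" = field(compare=False, repr=False)


@dataclass
class Station:
    """Fundamental: needs no other entity type in order to exist."""
    name: str
    telephones: list = field(default_factory=list)

    def add_telephone(self, number: str) -> StationTelephone:
        # A station may have many telephones.
        phone = StationTelephone(number=number, station=self)
        self.telephones.append(phone)
        return phone


raleigh = Station(name="Raleigh")            # exists on its own
desk = raleigh.add_telephone("555-0100")      # exists only with its station
```

Attempting `StationTelephone(number="555-0101")` without a station raises a `TypeError`, which mirrors the rule that the attributive entity type depends on the fundamental one.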

FIGURE 2: Occurrence and Entity Type

A relationship tells us how one entity type is associated with another entity type. Relationships are named as independent clauses describing the linkage: "Station is origin of rental, vehicle is rented by rental, customer makes rental." Relationships may be embellished with additional notations to denote business rule constraints. A business may specify that a relationship is mandatory (e.g., each customer must have a rental) or optional (e.g., each vehicle may have a rental or not). A business may limit a relationship to one occurrence of an entity type, or it may allow many. In our example, each rental rents one vehicle from one station to one customer. However, each customer, station and vehicle may have many rentals.

Optionality (is it required?) and cardinality (one or many possibilities?) are important to model because they reflect the business rules about how entity types relate to one another. Whether a rental can cover more than one vehicle is a key business rule (although it may not be plainly documented).
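Optionality and cardinality can be recorded as two flags on each direction of a relationship. The sketch below is one possible encoding; the `RelationshipEnd` class and its `bounds` method are hypothetical names, and treating the rental-to-vehicle link as mandatory is an assumption, since the text only says each rental rents one vehicle.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RelationshipEnd:
    source: str       # entity type the rule is stated about
    phrase: str       # verb phrase naming the relationship
    target: str       # related entity type
    mandatory: bool   # optionality: is it required?
    many: bool        # cardinality: one or many?

    def bounds(self) -> Tuple[int, Optional[int]]:
        """Return (minimum, maximum); a maximum of None means 'many'."""
        minimum = 1 if self.mandatory else 0
        maximum = None if self.many else 1
        return (minimum, maximum)


# Each customer must have a rental, and may have many.
customer_rental = RelationshipEnd("customer", "makes", "rental", mandatory=True, many=True)
# Each vehicle may have a rental or not, and may have many.
vehicle_rental = RelationshipEnd("vehicle", "is rented by", "rental", mandatory=False, many=True)
# Each rental rents one vehicle (mandatory is assumed here).
rental_vehicle = RelationshipEnd("rental", "rents", "vehicle", mandatory=True, many=False)
```

Changing `many` to `True` on `rental_vehicle` would express the alternative business rule that a rental can cover more than one vehicle.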

FIGURE 3: Relationship Notations

Dr. Peter Chen in his groundbreaking work in data modeling noted that relationships, just like entity types, can themselves be described by attributes. For example, the relationship between a customer, a station and a vehicle might be described by the date, time and duration of the rental. Another approach is to create a new entity type--rental agreement. Some authors would refer to such an entity type as an associative entity.
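The second approach, promoting the relationship to its own entity type, might look like the following sketch. The field names and sample values are assumptions made for illustration; only the date, time and duration attributes come from the example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RentalAgreement:
    """Associative entity: links customer, station and vehicle,
    and carries the attributes that describe the link itself."""
    customer_id: str      # hypothetical identifier for the customer
    station_id: str       # hypothetical identifier for the station
    vin: str              # identifies the vehicle
    start: datetime       # date and time of the rental
    duration: timedelta   # duration of the rental


agreement = RentalAgreement(
    customer_id="CUST-1",
    station_id="RALEIGH",
    vin="VIN-0001",
    start=datetime(1998, 3, 25, 9, 0),
    duration=timedelta(days=3),
)
```

The date, time and duration belong to none of the three linked entity types on their own, which is why they sit on the agreement.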

FIGURE 4: Relationships with Attributes

Attributes are the data that describe entity types. Each attribute has a name that describes the information. Each attribute can describe only one entity type; however, similar attributes may describe multiple entity types. For example, station and customer each may have a name attribute, but they are separate and distinct. The attributes define the types of data that are kept about entity occurrences. If the business needs to keep the information, an attribute must exist to store it. Business rules specify whether a value is required for each occurrence and the domain of permitted values.

The handling of attributes is another area in which data modeling authors diverge. Most notations model attributes within the entity type to which they belong. In such notations, an attribute cannot exist on its own. There are some notations in which an attribute is a modeling element connected to the entity type by a relationship (typically "describes" or "identifies"). The ERA model is composed of entities, relationships and attributes, showing all three graphically. Object Role Modeling (ORM), defined by Dr. Terry Halpin, also models attributes separately.
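The two business rules mentioned for attributes, whether a value is required and which values are permitted, can be sketched as a small check. The `AttributeRule` class and the sample color domain are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttributeRule:
    name: str
    required: bool
    # Domain of permitted values; None means any value is allowed.
    domain: Optional[frozenset] = None

    def accepts(self, value) -> bool:
        """Check one occurrence's value against the business rules."""
        if value is None:
            return not self.required
        return self.domain is None or value in self.domain


# Assumed rules: the VIN is required; color is optional but restricted.
vin_rule = AttributeRule(name="vin", required=True)
color_rule = AttributeRule(name="color", required=False,
                           domain=frozenset({"gold", "blue", "white"}))
```

With these rules, `color_rule.accepts("gold")` is true, `color_rule.accepts("plaid")` is false, and `vin_rule.accepts(None)` is false because the value is required.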

Where attributes are modeled separately, relationships between attributes are possible, allowing more of the business rules to be incorporated into the model. Since the typical model has many more attributes than entity types, these models tend to be quite large. Modeling attributes separately also moves the focus of the model from the things about which information is kept to the information itself. Some consider this an advantage; others disagree.

FIGURE 5: ORM Example

Incidentally, a few authors use the term "predicate" to mean attribute or relationship. They see that both attributes and relationships describe entity types. This simplifies data modeling into entity types and predicates.

FIGURE 6: Subtype Notations

The CDM also captures the business complexity represented through subtypes. Every business categorizes things in various ways. Some of these are serious enough to warrant being modeled because they affect what data must be kept. Determining which business distinctions require subtypes is a challenging task. An entity type and its subtypes can be modeled as a divided entity type or as a set of related entity types. The treatment varies between notation styles. Keep in mind that many models become overrun with subtypes unnecessarily.

Another perspective on subtypes can be found in Sybase's Data Modeling with PowerDesigner DataArchitect. From this perspective, a supertype is the set of all occurrences that have the same attributes and relationships. For example, a rental agreement and a leasing agreement are both contracts. Subtypes hold attributes that are distinct (e.g., lease buy-out amount).
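Class inheritance offers a rough analogy for this view of supertypes and subtypes. The sketch below assumes two shared attributes (`contract_number` and `customer_name`) that are not named in the text; only the lease buy-out amount comes from the example above.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    """Supertype: attributes common to every contract (assumed names)."""
    contract_number: str
    customer_name: str


@dataclass
class RentalAgreement(Contract):
    """Subtype with no extra attributes in this sketch."""


@dataclass
class LeasingAgreement(Contract):
    """Subtype holding an attribute that only leases have."""
    lease_buyout_amount: float


rental = RentalAgreement(contract_number="C-1", customer_name="Pat")
lease = LeasingAgreement(contract_number="C-2", customer_name="Pat",
                         lease_buyout_amount=5000.0)
```

Both objects are contracts, but only `lease` carries a buy-out amount, which is the kind of data difference that can justify modeling a subtype.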

The above terms (entity type, relationship, attribute, and so forth) and the ways they interact are called the meta-model (literally, a model of a model). The meta-model for the CDM is not exceptionally complex. The real complexity comes in developing a model that correctly reflects the business of the enterprise. Entering into a dialogue with business people about the business is the basis of requirements definition. Structuring that dialogue to produce a correct, clear and meaningful data model takes a deep understanding of all the modeling elements combined with communication skills.
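To make "a model of a model" concrete, the modeling elements themselves can be written down as data structures whose instances describe a particular business. This is a minimal sketch under that assumption; the class names and the `sentences` helper are hypothetical and do not correspond to any specific notation or tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EntityType:
    name: str
    attributes: List[str] = field(default_factory=list)
    supertype: Optional["EntityType"] = None   # set for subtypes only


@dataclass
class Relationship:
    source: EntityType
    phrase: str
    target: EntityType


@dataclass
class ConceptualDataModel:
    entity_types: List[EntityType]
    relationships: List[Relationship]

    def sentences(self) -> List[str]:
        """Read each relationship back as a natural-language clause."""
        return [f"{r.source.name} {r.phrase} {r.target.name}"
                for r in self.relationships]


station = EntityType("station", ["name"])
vehicle = EntityType("vehicle", ["vehicle identification number", "make", "model", "color"])
customer = EntityType("customer", ["name"])
rental = EntityType("rental")

cdm = ConceptualDataModel(
    entity_types=[station, vehicle, customer, rental],
    relationships=[
        Relationship(station, "is origin of", rental),
        Relationship(vehicle, "is rented by", rental),
        Relationship(customer, "makes", rental),
    ],
)
```

Calling `cdm.sentences()` reproduces the relationship clauses used earlier, such as "station is origin of rental". The classes are the meta-model; the `cdm` object is one model built from it.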

You can learn more about data modeling from reading articles and books or by attending seminars. Then you can increase your power, Power Ranger.
