Definitions of data items really matter, and there are two ways to see their value. One is to focus on an individual definition within the context of a project. From such a siloed perspective, there can be a temptation to simply accept whatever is “good enough,” especially if the project team has a good idea of what the data means and is not held up by a lack of data definitions. The more important perspective is, of course, the enterprise level, where we will usually encounter a sea of data items requiring definitions. To the extent that whatever definitions exist have been harvested from projects, many of them are inadequate, and some are downright misleading. If such definitions are relied on, they have a high probability of causing problems. Even if 85 percent of the data definitions across an enterprise could be made perfect, I for one am prepared to state that the remaining 15 percent could be responsible for serious damage. I am also prepared to submit that data modelers typically prefer dealing with boxes, lines and pigeonholing to forming definitions. Good modeling often seems to apply to the visual rather than the textual components of data models.

Generalization

One of the most difficult things about definitions is making them precise. They cannot be so precise that they exclude instances they obviously should include, but the chief sin is to make them too vague. When a definition is too vague, it becomes impossible to identify accurately which instances are truly covered: you cannot decide what is “in” or “out.” However, general definitions are much easier to construct than precise ones. Generalization is achieved by identifying the characteristics that are shared by the collection of instances under review and throwing away all the characteristics that are not shared. A generalized definition is also difficult to argue with, because it will be correct. Being correct is not the point, however. Definitions have to be fit for their purpose.
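
The mechanics of generalization can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the instance names and their characteristics are hypothetical and do not come from any real model. Keeping only the shared characteristics yields a definition that is correct for every instance, yet it also admits things that were never meant to be covered.

```python
# Hypothetical instances and characteristics, invented for illustration only.
instances = {
    "Checking Account": {"has identifier", "has owner", "allows withdrawals", "has overdraft limit"},
    "Savings Account": {"has identifier", "has owner", "allows withdrawals", "earns interest"},
    "Loan": {"has identifier", "has owner", "has repayment schedule"},
}


def generalize(characteristics_by_instance):
    """Keep only the characteristics shared by every instance under review."""
    return set.intersection(*characteristics_by_instance.values())


def covers(definition, candidate):
    """A candidate is 'in' if it has every characteristic the definition requires."""
    return definition <= candidate


shared = generalize(instances)

# A hypothetical outsider that the generalized definition was never meant to include.
safe_deposit_box = {"has identifier", "has owner", "holds physical items"}
outsider_is_covered = covers(shared, safe_deposit_box)

print(sorted(shared))
print(outsider_is_covered)
```

Under these assumptions the generalized definition reduces to "has an identifier and an owner," which is true of all three instances but cannot separate them from the safe deposit box.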

Abstraction

If generalization is one pathway to vague definitions that are correct but useless, then abstraction can be another. Abstraction in the strict sense used by the metaphysicians means extracting a set of concepts from something so you can manipulate these concepts instead of the multitude of underlying real-world things that implement these concepts (sometimes imperfectly). Unfortunately, the term “abstraction” often seems to be confused with generalization in data modeling. A good example of abstraction is taking the Customer, Vendor, Employee and so on and putting them in a Party entity in a logical data model. The definition of Party then becomes something like “An individual or legal entity of interest to the enterprise.”

Abstraction is often employed as a device by data modelers for data modelers. The claim is made that these abstractions make it less probable that the database structure will need to change over time. This is technically true. However, the complexity of the business domain does not go away. It is simply pushed to the physical layer where, in the example of Party, somebody must define codes for each Party Role that a Party can play. In practice, this is rarely done, and the definitions are just carried in people’s heads. Thus, abstraction tends to lead to a net loss of business-level definitions not only from a data model but from the entire complex of application, database and support structures.
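
To make the point concrete, here is a minimal sketch of the Party pattern, assuming a hypothetical set of Party Role codes and a hypothetical lookup of their definitions. None of these names come from a real system. The structure never has to change when a new role appears, but the business meaning now survives only if somebody writes it into the lookup.

```python
from dataclasses import dataclass, field

# Hypothetical role codes and definitions; in this pattern the business meaning
# lives only in reference data such as this, not in the model structure.
PARTY_ROLE_DEFINITIONS = {
    "CUST": "A party that has purchased, or agreed to purchase, goods or services.",
    "VEND": "A party that supplies goods or services to the enterprise.",
}


@dataclass
class Party:
    """An individual or legal entity of interest to the enterprise."""
    name: str
    role_codes: set = field(default_factory=set)


def undefined_roles(parties, definitions):
    """Return the role codes in use that have no recorded business definition."""
    used = set()
    for party in parties:
        used |= party.role_codes
    return sorted(used - set(definitions))


parties = [
    Party("Acme Supplies", {"CUST", "VEND"}),
    Party("J. Smith", {"EMP"}),  # role added without anyone defining it
]

missing = undefined_roles(parties, PARTY_ROLE_DEFINITIONS)
print(missing)
```

In this sketch the "EMP" code is stored and used without complaint, which is exactly how a definition ends up being carried in people's heads.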

Going Meta

A third issue in definitions is going meta. This means defining things with reference to components and concepts that exist only at the level of logical data modeling or at other layers of abstraction even further removed from the business (i.e., physical data models, actual databases and data transport mechanisms). Hence the title of this column, “This is a Title.” Yes, it is, but it refers to a component of a column, not to the subject of the column. The degree to which this is done in data models can vary, but it is sometimes overwhelming. At a low level it is simply irritating, as with definitions that begin, “This entity contains...” The entity represents a business-level concept, and that concept is what is of interest, not a component of a data model. Worse yet are total departures from the business. For instance, association entities all too often have definitions such as “Entity AB resolves the many-to-many relationship between Entity A and Entity B.” True, easy and totally meta, such definitions are worse than useless. They imply that a data model has been built by data modelers for data modelers. No definition in a data model should ever contain a data modeling term, except in the very rare instance that it is modeling data model metadata.
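
A rule like this lends itself to a simple automated check. The sketch below is an assumption-laden illustration, not an established tool: the list of modeling terms is hypothetical and deliberately short, and the matching is a naive whole-word search.

```python
import re

# Hypothetical, deliberately short list of data modeling terms to look for.
MODELING_TERMS = [
    "entity", "attribute", "relationship", "many-to-many",
    "table", "column", "foreign key", "primary key",
]


def find_modeling_terms(definition, terms=MODELING_TERMS):
    """Return the modeling terms that appear as whole words in a definition."""
    lowered = definition.lower()
    return [
        term for term in terms
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    ]


meta_definition = (
    "Entity AB resolves the many-to-many relationship "
    "between Entity A and Entity B."
)
business_definition = "A person or organization that has purchased goods or services."

meta_hits = find_modeling_terms(meta_definition)
business_hits = find_modeling_terms(business_definition)
print(meta_hits)
print(business_hits)
```

A check this naive should prompt a review rather than deliver a verdict. It would also flag a phrase such as "legal entity," where "entity" is a business term and not a modeling one.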

Yet the temptation to go meta is very strong because data modelers are aligned to the tools and methodologies of their trade. A useful question for logical data modelers to keep in mind is what value they are adding in building a logical data model that exceeds what could be gained from a conceptual data model plus a glossary of business terms. The answer must point to something that is of value not just within the scope of the project the modeler is currently working on, but also in facilitating reuse of the data outside the scope of that project.

The potential future reuse of data is a key reason for having sound definitions. Equally important and often overlapping is the need to provide understanding of the data to individuals who are not familiar with it. Definitions must be aligned with these requirements rather than with the concepts of data modeling.
