FEB 20, 2008 7:14pm ET

This is a Title

Definitions of data items really matter, and there are two ways to see their value. One is to focus on an individual definition within the context of a project. From such a siloed perspective, it is tempting to accept whatever is “good enough,” especially if the project team has a good idea of what the data means and is not being held up by a lack of data definitions. The more important perspective is, of course, the enterprise level, where we will usually encounter a sea of data items requiring definitions. To the extent that whatever definitions exist have been harvested from projects, many of them are inadequate, and some are downright misleading. If such definitions are relied on, they have a high probability of causing problems. Even if 85 percent of the data definitions across an enterprise could be made perfect, I for one am prepared to state that the remaining 15 percent could be responsible for serious damage. I am also prepared to submit that data modelers typically prefer dealing with boxes, lines and pigeonholing to forming definitions. Good modeling often seems to apply to the visual rather than the textual components of data models.

Generalization

One of the most difficult things with definitions is to make them precise. They cannot be so narrow that they exclude instances they obviously should include, but the chief sin is to make them too vague. When a definition is too vague, it becomes impossible to accurately identify which instances are truly covered - you cannot decide what is “in” or “out.” However, general definitions are much easier to construct than precise ones. Generalization is achieved by identifying the characteristics that are shared by the collection of instances under review and throwing away all the characteristics that are not shared. A generalized definition is difficult to argue with, because it will be correct for every instance. Being correct is not the point, however. Definitions have to be fit for their purpose.
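
To make the mechanics of generalization concrete, here is a minimal sketch that treats each instance as a set of characteristics and keeps only those shared by all of them. The account types, the characteristics and the function names are all hypothetical, invented for illustration only.

```python
# Hypothetical instances and characteristics, invented purely for illustration.
instances = {
    "Savings Account": {"has an account number", "is held by a customer", "pays interest"},
    "Checking Account": {"has an account number", "is held by a customer", "allows checks"},
    "Loan Account": {"has an account number", "is held by a customer", "has a repayment schedule"},
}


def generalize(characteristics_by_instance):
    """Keep only the characteristics shared by every instance under review."""
    return set.intersection(*characteristics_by_instance.values())


def is_covered(definition, candidate_characteristics):
    """A candidate is 'in' if it has every characteristic the definition requires."""
    return definition <= candidate_characteristics


account_definition = generalize(instances)
print(sorted(account_definition))

# A hypothetical candidate that the business would probably not call an account.
safe_deposit_box = {"has an account number", "is held by a customer", "has a physical key"}
print(is_covered(account_definition, safe_deposit_box))
```

The generalized definition is true of all three account types, yet it also admits the safe deposit box. It is correct and still cannot settle what is “in” or “out.”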

Abstraction

If generalization is one pathway to definitions that are correct but vague and useless, then abstraction can be another. Abstraction in the strict sense used by the metaphysicians means extracting a set of concepts from something so you can manipulate these concepts instead of the multitude of underlying real-world things that implement these concepts (sometimes imperfectly). Unfortunately, the term “abstraction” often seems to be confused with generalization in data modeling. A good example of abstraction is taking the Customer, Vendor, Employee and so on and putting them in a Party entity in a logical data model. The definition of Party then becomes something like “An individual or legal entity of interest to the enterprise.”

Abstraction is often employed as a device by data modelers for data modelers. The claim is made that these abstractions make it less probable that the database structure will need to change over time. This is technically true. However, the complexity of the business domain does not go away. It is simply pushed to the physical layer where, in the example of Party, somebody must define codes for each Party Role that a Party can play. In practice, this is rarely done, and the definitions are just carried in people’s heads. Thus, abstraction tends to lead to a net loss of business-level definitions not only from a data model but from the entire complex of application, database and support structures.
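
The way complexity moves from the model into codes can be sketched in a few lines. The role codes, their definitions and the helper below are assumptions made for illustration; they are not taken from any real system or standard.

```python
from dataclasses import dataclass

# Hypothetical Party Role codes with business-level definitions (illustrative only).
PARTY_ROLE_DEFINITIONS = {
    "CUST": "An individual or legal entity that has purchased goods or services from the enterprise.",
    "VEND": "An individual or legal entity that supplies goods or services to the enterprise.",
}


@dataclass(frozen=True)
class Party:
    """The abstract Party: the structure never changes, only the role code does."""
    name: str
    role_code: str


def undefined_role_codes(parties, definitions):
    """Return the role codes in use that have no recorded business definition."""
    return sorted({p.role_code for p in parties} - set(definitions))


parties = [Party("Acme Ltd", "CUST"), Party("J. Smith", "EMPL")]
print(undefined_role_codes(parties, PARTY_ROLE_DEFINITIONS))
```

In this sketch the Party structure accepts an "EMPL" role without any change, but nothing records what that role means to the business. That gap is exactly the definition that ends up carried in people’s heads.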

Going Meta

A third issue in definitions is going meta. This means defining things with reference to components and concepts that only exist at the level of logical data modeling or other layers of abstraction even further removed from the business (i.e., physical data models, actual databases and data transport mechanisms). Hence the title of this column, “This is a Title.” Yes, it is, but it is referring to a component of a column, not to the subject of the column. The degree to which this is done in data models can vary, but it is sometimes overwhelming. At a low level it is simply irritating, such as definitions that begin, “This entity contains...” The entity represents a business-level concept, and that is what is of interest, not a component of a data model. Worse yet are total departures from the business. For instance, association entities all too often have definitions such as “Entity AB resolves the many-to-many relationship between Entity A and Entity B.” True, easy and totally meta, such definitions are worse than useless. They imply that a data model has been built by data modelers for data modelers. No definition in a data model should ever contain a data modeling term, except in the very rare instance that it is modeling data model metadata.
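
A rough screen for meta definitions could look like the sketch below. The term list is an assumed starting point rather than an established vocabulary, and the function name is hypothetical.

```python
import re

# Assumed, illustrative list of data modeling terms; a real list would need tuning.
MODELING_TERMS = [
    "entity", "many-to-many", "relationship",
    "foreign key", "primary key", "attribute",
]

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in MODELING_TERMS) + r")\b",
    re.IGNORECASE,
)


def meta_terms(definition):
    """Return the distinct modeling terms found in a definition, lowercased and sorted."""
    return sorted({match.group(1).lower() for match in _PATTERN.finditer(definition)})


association = "Entity AB resolves the many-to-many relationship between Entity A and Entity B."
party = "An individual or legal entity of interest to the enterprise."

print(meta_terms(association))
print(meta_terms(party))
```

The association-entity definition is flagged for "entity", "many-to-many" and "relationship". The Party definition is flagged too, because "legal entity" is a genuine business phrase, so a check like this can only nominate definitions for human review.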

Yet the temptation to go meta is very strong because data modelers are aligned to the tools and methodologies of their trade. A useful question for logical data modelers to keep in mind is what value they are adding in building a logical data model that exceeds what could be gained from a conceptual data model plus a glossary of business terms. The answer must point to something that is of value, not just within the scope of the project the modeler is currently working on, but which will facilitate reuse of the data outside of the scope of that project.

The potential future reuse of data is a key reason for having sound definitions. Equally important and often overlapping is the need to provide understanding of the data to individuals who are not familiar with it. Definitions must be aligned with these requirements rather than with the concepts of data modeling.
