All this seemed very logical at the time. It was an approach intended to bring clarity and uniformity to the data resource of any enterprise. Sloppy and inconsistent definitions would be eliminated, and everyone would know exactly what to expect from a particular piece of data. Sharing and exchange of data would be facilitated and made more reliable. Also, although this remained largely unspoken, the lives of data administrators would be made much easier. After all, how would we ever cope if one data item had more than one definition? It seemed illogical and completely against the prevailing spirit of data management.
I should have known better.
One of my first exposures to this issue of definitions came in the area of master data management (MDM). MDM requires the central administration of certain entities used throughout the enterprise. Product and Customer are the two most commonly cited master data entities. If there were to be one centrally administered source for MDM entities, then it needed to be clear what this source contained; precise definitions were needed. It quickly became apparent to me that common definitions are not easy to arrive at. What I found was that data administration in the context of a restricted subject area, such as accounts receivable, is much easier because you are typically dealing with a group of like-minded users in one or a few related organizational units. In such situations, there is usually only one stovepipe system. Even if there are more systems, they are built across the same subject area where there is a common understanding of Product or Customer.
Trying to do this at the enterprise level is a different challenge. Many companies have spent inordinate amounts of time and money trying to achieve a standard enterprise-wide definition of Customer, for example. The results are not encouraging. Marketing will probably always want to include prospects as customers, while accounts receivable will only recognize customers as individuals or organizations that have been sent a bill for goods and services. This is not an academic problem. How do we calculate gross annual sales per customer without an appropriate definition of Customer? Marketing's definition may dilute this number, whereas accounts receivable's may overstate it.
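To make the arithmetic concrete, here is a minimal sketch of the same sales figure computed under the two definitions. The party names, sales amounts and function names are all hypothetical, invented purely for illustration; the only thing taken from the discussion above is that marketing counts prospects while accounts receivable counts only parties that have been billed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Party:
    name: str            # hypothetical party name
    billed: bool         # True if the party has been sent a bill
    annual_sales: float  # gross sales to this party for the year


# Hypothetical data: two billed parties and two prospects.
PARTIES = [
    Party("Acme", billed=True, annual_sales=600.0),
    Party("Birch", billed=True, annual_sales=400.0),
    Party("Cedar", billed=False, annual_sales=0.0),
    Party("Dune", billed=False, annual_sales=0.0),
]


def is_customer_marketing(party: Party) -> bool:
    # Assumed marketing view: prospects count, so every known party qualifies.
    return True


def is_customer_receivables(party: Party) -> bool:
    # Assumed accounts receivable view: only billed parties qualify.
    return party.billed


def sales_per_customer(parties: List[Party],
                       is_customer: Callable[[Party], bool]) -> float:
    # Average gross annual sales over whoever the definition admits.
    customers = [p for p in parties if is_customer(p)]
    if not customers:
        raise ValueError("no party satisfies this definition of Customer")
    return sum(p.annual_sales for p in customers) / len(customers)


print(sales_per_customer(PARTIES, is_customer_marketing))    # 250.0
print(sales_per_customer(PARTIES, is_customer_receivables))  # 500.0
```

The data and the formula are identical in both calls; only the definition of Customer changes, and the reported figure doubles.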
Data administration's efforts to obtain common, standardized, enterprise-level definitions may seem logical and have initial organizational acceptance. However, once the major business actors realize what is really at stake, there can be considerable discord; and data administration, far from creating harmony, finds that it has stirred up problems that have no easy resolution. Even if a resolution could be found, data administration typically has no mechanism to enforce it.
The Path to Generalization
One way in which data administration can save face in this kind of situation is to come up with a definition that everyone can agree on. This means, in reality, that the definition is so general that nobody can disagree with it. For instance, Customer can be defined as "an individual or organization we potentially do business with." Perhaps a better example is the familiar definition of metadata as "data about data." Such overgeneralized definitions are difficult to characterize as incorrect, but it is hard to see what is excluded from them when you need to think about specific instances in real-world situations. In the meantime, large constituencies within the enterprise continue working with much more specific definitions of Customer that are incompatible. By accepting overgeneralized definitions, data administration is proving its irrelevance. Such definitions cannot be used for anything practical, and when data crosses business subject areas, the problems are eventually going to show up, usually after the expenditure of huge sums of money.
Generalized definitions are not just a problem at the entity level. They also exist for attributes. One would think that attributes are so specific that detailed definitions would be easier to arrive at. This is not the case. One of the problems stems from the fact that it is very easy to use the English language in clever ways to hide ambiguity. Another issue is that there is a conflict between precision and intuitive understandability. Legal contracts often contain very precise definitions of the terms used in them. These definitions are written in tortured English that can be quite difficult to follow. You have to be a lawyer to understand them. Data administrators, who often end up formulating data definitions, usually do not have the depth of business understanding to arrive at very precise definitions, and if they did, the number of people who would be able to truly understand them would be quite limited. The alternative is to have more generalized definitions that nobody will disagree with, but which are not particularly useful.
One way in which the inadequacies of generalized attribute definitions are revealed is when business rules approaches are implemented. Business rules, like data elements, need to be atomic, and business rules are especially useful for defining derived or calculated attributes. For instance, the attribute Current Account Balance may have an English language definition of "account balance at close of previous business day," but expressing it in business rules may reveal that the way it is calculated is quite different in the case of an individual, a corporation and a not-for-profit organization. In reality, we have Individual Current Account Balance, Corporation Current Account Balance and Not-for-Profit Current Account Balance. The atomicity becomes apparent because of the need for calculations in the business rules and can no longer be hidden in a generalized, albeit "true" definition. Initially, I was astonished when a business rules project I worked on led to the addition of large numbers of attributes to what were thought to be complete, signed-off data models. Now I am no longer surprised.
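A rough sketch of how this surfaces once the rules have to be written down as calculations. The three formulas below are invented placeholders, not real banking rules; the fields uncleared_items and restricted_funds are assumptions made only so that the three calculations differ. The point is simply that one attribute name turns out to need three separate rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    holder_type: str               # "individual", "corporation" or "not_for_profit"
    ledger_balance: float          # balance at close of previous business day
    uncleared_items: float = 0.0   # hypothetical field
    restricted_funds: float = 0.0  # hypothetical field


def individual_current_account_balance(account: Account) -> float:
    # Placeholder rule: the ledger balance as it stands.
    return account.ledger_balance


def corporation_current_account_balance(account: Account) -> float:
    # Placeholder rule: exclude items that have not yet cleared.
    return account.ledger_balance - account.uncleared_items


def not_for_profit_current_account_balance(account: Account) -> float:
    # Placeholder rule: exclude funds restricted to a specific purpose.
    return account.ledger_balance - account.restricted_funds


# One generalized attribute name, three atomic rules behind it.
RULES = {
    "individual": individual_current_account_balance,
    "corporation": corporation_current_account_balance,
    "not_for_profit": not_for_profit_current_account_balance,
}


def current_account_balance(account: Account) -> float:
    # The "generalized" attribute can only be computed by dispatching
    # to the rule for the specific kind of account holder.
    return RULES[account.holder_type](account)
```

For example, under these assumed rules an account with a ledger balance of 1000.0 and 250.0 of uncleared items yields 1000.0 for an individual but 750.0 for a corporation, although both values would be stored under the same attribute name in the generalized model.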