When I started my career in data administration, it was explained that one of my tasks would be to obtain definitions of entities and attributes from business users. These definitions, I was told, had to be sufficiently specific and complete so that anyone in our enterprise could understand them, no matter where they appeared in reports, screens or specifications. In addition, I was reminded that the law of atomicity meant that there could only be one definition for each entity and attribute.
All this seemed very logical at the time. It was an approach intended to bring clarity and uniformity to the data resource of any enterprise. Sloppy and inconsistent definitions would be eliminated, and everyone would know exactly what to expect from a particular piece of data. Sharing and exchange of data would be facilitated and made more reliable. Also, although this remained largely unspoken, the lives of data administrators would be made much easier. After all, how would we ever cope if one data item had more than one definition? It seemed illogical and completely against the prevailing spirit of data management.
I should have known better.
One of my first exposures to this issue of definitions came in the area of master data management (MDM). MDM requires the central administration of certain entities used throughout the enterprise. Product and Customer are the two most commonly cited master data entities. If there was to be one centrally administered source for MDM entities, then it needed to be clear what this source contained; precise definitions were needed. It quickly became apparent to me that common definitions are not easy to arrive at. What I found was that data administration in the context of a restricted subject area, such as accounts receivable, is much easier because you are typically dealing with a group of like-minded users in one or a few related organizational units. In such situations, there is usually only one stovepipe system. Even if there are more systems, they are built across the same subject area where there is a common understanding of Product or Customer.
Trying to do this at the enterprise level is a different challenge. Many companies have spent inordinate amounts of time and money trying to achieve a standard enterprise-wide definition of Customer, for example. The results are not encouraging. Marketing will probably always want to include prospects as customers, while accounts receivable will only recognize customers as individuals or organizations that have been sent a bill for goods and services. This is not an academic problem. How do we calculate gross annual sales per customer without an appropriate definition of Customer? Marketing's definition may dilute this number, whereas accounts receivable's may overstate it.
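A small worked example makes the stakes concrete. The sketch below uses invented figures and hypothetical names (`Party`, `is_customer_marketing`, `is_customer_ar`) to compute the same metric under the two definitions. None of the numbers come from a real enterprise.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Party:
    """Hypothetical record for an individual or organization."""
    name: str
    billed: bool         # True if the party has been sent a bill
    annual_sales: float  # invented figure, for illustration only


# Invented sample data: two billed customers and two prospects.
PARTIES = [
    Party("Acme", True, 60_000.0),
    Party("Birch", True, 40_000.0),
    Party("Cedar", False, 0.0),
    Party("Dune", False, 0.0),
]


def is_customer_marketing(party: Party) -> bool:
    # Marketing's definition: prospects count as customers.
    return True


def is_customer_ar(party: Party) -> bool:
    # Accounts receivable's definition: only parties that were billed.
    return party.billed


def sales_per_customer(parties: List[Party],
                       is_customer: Callable[[Party], bool]) -> float:
    """Gross annual sales divided by the number of customers,
    where 'customer' is whatever the supplied definition says."""
    customer_count = sum(1 for p in parties if is_customer(p))
    if customer_count == 0:
        raise ValueError("no parties satisfy this definition of Customer")
    total_sales = sum(p.annual_sales for p in parties)
    return total_sales / customer_count


marketing_figure = sales_per_customer(PARTIES, is_customer_marketing)
ar_figure = sales_per_customer(PARTIES, is_customer_ar)
```

With this sample data, the same 100,000 in sales yields 25,000 per customer under marketing's definition and 50,000 under accounts receivable's. Neither calculation is wrong; they answer different questions.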
Data administration's efforts to obtain common, standardized, enterprise-level definitions may seem logical and have initial organizational acceptance. However, once the major business actors realize what is really at stake, there can be considerable discord; and data administration, far from creating harmony, finds that it has stirred up problems that have no easy resolution. Even if a resolution could be found, data administration typically has no mechanism to enforce it.
The Path to Generalization
One way in which data administration can save face in this kind of situation is to come up with a definition that everyone can agree on. This means, in reality, that the definition is so general that nobody can disagree with it. For instance, Customer can be defined as "an individual or organization we potentially do business with." Perhaps a better example is the familiar definition of metadata as "data about data." Such overgeneralized definitions are difficult to characterize as incorrect, but it is hard to see what is excluded from them when you need to think about specific instances in real-world situations. In the meantime, large constituencies within the enterprise continue working with much more specific definitions of Customer that are incompatible. By accepting overgeneralized definitions, data administration is proving its irrelevance. Such definitions cannot be used for anything practical, and when data crosses business subject areas, the problems are eventually going to show up, usually after the expenditure of huge sums of money.
Generalized definitions are not just a problem at the entity level. They also exist for attributes. One would think that attributes are so specific that detailed definitions would be easier to arrive at. This is not the case. One of the problems stems from the fact that it is very easy to use the English language in clever ways to hide ambiguity. Another issue is that there is a conflict between precision and intuitive understandability. Legal contracts often contain very precise definitions of the terms used in them. These definitions are written in tortured English that can be quite difficult to follow. You have to be a lawyer to understand them. Data administrators, who often end up formulating data definitions, usually do not have the depth of business understanding to arrive at very precise definitions, and if they did, the number of people who would be able to truly understand them would be quite limited. The alternative is to have more generalized definitions that nobody will disagree with, but which are not particularly useful.
One way in which the inadequacies of generalized attribute definitions are revealed is when business rules approaches are implemented. Business rules, like data elements, need to be atomic, and business rules are especially useful for defining derived or calculated attributes. For instance, the attribute Current Account Balance may have an English language definition of "account balance at close of previous business day," but expressing it in business rules may reveal that the way it is calculated is quite different in the case of an individual, a corporation and a not-for-profit organization. In reality, we have Individual Current Account Balance, Corporation Current Account Balance and Not-for-Profit Current Account Balance. The atomicity becomes apparent because of the need for calculations in the business rules and can no longer be hidden in a generalized, albeit "true" definition. Initially, I was astonished when a business rules project I worked on led to the addition of large numbers of additional attributes in what were thought to be complete, signed-off data models. Now I am no longer surprised.
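The effect can be sketched in code. The three calculations below are invented purely for illustration, since the actual rules depend on the business. The field names (`pending_holds`, `restricted_funds`) and the rule functions are likewise hypothetical. The point is only that one English definition can conceal several distinct derivations.

```python
from dataclasses import dataclass
from enum import Enum


class HolderType(Enum):
    INDIVIDUAL = "individual"
    CORPORATION = "corporation"
    NOT_FOR_PROFIT = "not_for_profit"


@dataclass(frozen=True)
class Account:
    """Hypothetical account snapshot at close of previous business day."""
    holder_type: HolderType
    ledger_balance: float
    pending_holds: float = 0.0     # assumed field, for illustration
    restricted_funds: float = 0.0  # assumed field, for illustration


# Each rule is atomic: one calculation for one kind of account holder.
# The formulas are assumptions, not rules from any real institution.
def individual_current_account_balance(account: Account) -> float:
    return account.ledger_balance


def corporation_current_account_balance(account: Account) -> float:
    return account.ledger_balance - account.pending_holds


def not_for_profit_current_account_balance(account: Account) -> float:
    return account.ledger_balance - account.restricted_funds


RULES = {
    HolderType.INDIVIDUAL: individual_current_account_balance,
    HolderType.CORPORATION: corporation_current_account_balance,
    HolderType.NOT_FOR_PROFIT: not_for_profit_current_account_balance,
}


def current_account_balance(account: Account) -> float:
    """The 'generalized' attribute: it only works by dispatching
    to one of the three specific attributes underneath it."""
    return RULES[account.holder_type](account)


# Identical raw figures, three different "current account balances".
balances = {
    holder_type: current_account_balance(
        Account(holder_type, ledger_balance=1000.0,
                pending_holds=200.0, restricted_funds=300.0)
    )
    for holder_type in HolderType
}
```

Once the calculations have to be written down, the single attribute visibly splits into three, which is how a signed-off data model ends up gaining attributes.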
Generalized definitions hide the fact that different business subject areas have different definitions for the same attributes. As noted previously, this is not a problem until data crosses the boundaries of the subject areas that contain these unique definitions. Unfortunately, technology has now enabled enterprises to do just this. It happens in data warehouses and in architectures where transactions are exchanged between operational systems. Eventually, the problems show up in all kinds of unexpected ways. Some of these may have major significance, such as the misstatement of financial data. Others may be more insidious but just as consequential, such as making incorrect strategic business decisions. When this is recognized, so is the need to get the definitions right. There are two general approaches to resolving the issue.
The first approach is to insist on enterprise-wide common definitions for the entities and attributes. This is again likely to encounter the obstacle that different business subject areas have reasons why they define Customer or Gross Annual Revenue in the way that they do.
The second approach is to attempt to change the names of entities and attributes in business subject areas to make them more specific and to distinguish them from identically named items (homonyms) in other business subject areas. However, this too is problematic. Entity and attribute names will be used widely in screens, reports, written documentation, verbal communications and the whole subculture of a business subject area. It is quite unrealistic to expect that all this will change.
If the enterprise insists on pursuing either approach, it is quite likely that the response will be to pass one version of an entity or attribute to a particular destination data warehouse or transaction system, while maintaining another within the source business subject area. This "give them what they want" attitude may lead to even more confusion in the long run as data items are maintained exclusively for poorly understood downstream uses. The original problem is not really resolved. Furthermore, can data administration really claim that this is a successful resolution of the original state of affairs, as the central repository becomes populated with an increasing number of such "artificial" data items?
The quest for single, standard definitions of entities and attributes across an enterprise may be an illusion. This is not to say that it cannot be attained for many data items, but in a moderately sized enterprise, there will almost certainly be a set of entities and attributes for which there is no single version of the truth.
A different way of expressing this is that the enterprise itself does not represent a single context within which we should expect to find homogeneous definitions of data items. Rather, it typically represents a set of contexts. Only within each of these contexts can we hope to find definitions of data items that are specific enough to have practical applicability.
Thinking about the enterprise as a single context for data definitions is not simply a conceptual or methodological problem. Many data modeling tools and repository solutions are built on the assumption that one data item has one definition. Synonyms may be captured, but the idea that a definition exists within a context is not implemented. No functionality exists for the tracking and administration of contexts.
This lack of attention to context is not entirely universal. ISO/IEC 11179, a standard for metadata registries, does tackle the issue of context, although its definition of context is extremely wide (see sidebar).
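To make the idea of a definition existing within a context more tangible, here is a minimal sketch of a registry keyed by data item and context. It is an illustration only: the class name `DefinitionRegistry` and its methods are hypothetical, and it is not modeled on ISO/IEC 11179 or on any particular repository product.

```python
from typing import Dict, List


class DefinitionRegistry:
    """Hypothetical registry in which a definition belongs to an
    (item, context) pair rather than to the item alone."""

    def __init__(self) -> None:
        # item name -> {context name -> definition text}
        self._definitions: Dict[str, Dict[str, str]] = {}

    def define(self, item: str, context: str, definition: str) -> None:
        self._definitions.setdefault(item, {})[context] = definition

    def lookup(self, item: str, context: str) -> str:
        by_context = self._definitions.get(item, {})
        if context not in by_context:
            raise KeyError(f"no definition of {item!r} in context {context!r}")
        return by_context[context]

    def contexts(self, item: str) -> List[str]:
        """All contexts in which the item has a definition."""
        return sorted(self._definitions.get(item, {}))


registry = DefinitionRegistry()
registry.define(
    "Customer", "marketing",
    "An individual or organization we do or may do business with, "
    "including prospects.",
)
registry.define(
    "Customer", "accounts receivable",
    "An individual or organization that has been sent a bill for "
    "goods and services.",
)
```

Asking for `registry.contexts("Customer")` then makes the disagreement explicit instead of hiding it behind one generalized definition, and a lookup without a known context fails rather than silently returning the wrong meaning.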
"When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean, neither more nor less."
"The question is," said Alice, "whether you can make words mean so many different things."
This exchange comes from Lewis Carroll's Through the Looking-Glass, and on some days it can seem too close for comfort to the reality of data administration. Data definitions are not word games. They are necessary for the effective utilization of information in enterprises. Context needs to be included as part of the conceptual framework within which data is administered. It is time to move away from simplistic notions of a single version of the truth that provide a false sense of accomplishment and will never be adequate to the needs of an enterprise.