We frequently speak of definitions in data management, but they are often taken for granted. In particular, it seems that everyone knows what a definition is and assumes that producing definitions is easy. Nobody stops to ask what exactly a definition is and whether there are any particular considerations in formulating one. Perhaps this is because the educational system in the U.S. focuses on learning definitions by rote for the SATs. More likely it is because we are heavily conditioned by our exposure to definitions as they appear in dictionaries, which may be why we find data dictionaries in many enterprises. It also seems to be why definitions in data models resemble dictionary definitions - one-sentence tautologies composed of synonyms. Indeed, the dictionary model of definition rules supreme in data management.
Surprisingly, two major kinds of definitions have been recognized for the past two-and-a-half millennia, and a long and slow war has been fought between them. A real definition fully explains the nature of a concept. It goes beyond providing awareness that something exists, to tell us what it is. A nominal definition explains the meaning of a word or term. For example, the word "thunder" could be defined as "a noise in the clouds." This gives enough information to know what the word "thunder" is referring to, but it does not tell us much about what thunder really is.
What's the Difference?
Words and terms are symbols that represent something else. A word or term has a meaning that tells us what it represents, but it usually does not tell us anything about the nature of what is represented. A concept is an understanding of a type of thing - and all individual things fall into one or more types. When we produce data models, we declare concepts as entity types in our models. Thus concepts, rather than words, are important to data modelers.
We can put this into a small metamodel as shown in the figure on page 32.
A nominal definition is, therefore, an association entity type between a term and a concept. There is not a one-to-one correspondence between a term and a concept because there are many more concepts than there are words or phrases, which forces us to use homonyms.
However, a concept has a real definition as an attribute. Every concept must have a real definition - otherwise it would not be a concept.
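The metamodel described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the field names, the "draft" homonym and the wording of its real definitions are hypothetical choices made for the sketch and are not taken from the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A word or phrase: a symbol that stands for something else."""
    text: str


@dataclass(frozen=True)
class Concept:
    """An understanding of a type of thing. The real definition is an attribute."""
    name: str
    real_definition: str

    def __post_init__(self):
        # Every concept must have a real definition, so reject blank ones.
        if not self.real_definition.strip():
            raise ValueError("a concept must have a real definition")


@dataclass(frozen=True)
class NominalDefinition:
    """Association entity between a term and a concept."""
    term: Term
    concept: Concept
    meaning: str  # just enough to know what the term refers to


def concepts_for(term, definitions):
    """Return the names of all concepts a term is associated with."""
    return [d.concept.name for d in definitions if d.term == term]


# The "thunder" example from the text, plus a hypothetical homonym.
thunder_term = Term("thunder")
thunder = Concept(
    "Thunder",
    "The sound produced by the rapid expansion of air heated by a lightning discharge.",
)

draft_term = Term("draft")
bank_draft = Concept(
    "Bank draft",
    "A written order directing a bank to pay a stated sum to a named payee.",
)
document_draft = Concept(
    "Document draft",
    "A preliminary version of a document that is still subject to revision.",
)

definitions = [
    NominalDefinition(thunder_term, thunder, "a noise in the clouds"),
    NominalDefinition(draft_term, bank_draft, "an order for payment"),
    NominalDefinition(draft_term, document_draft, "an early version of a text"),
]

# One term, two concepts: the term is a homonym.
print(concepts_for(draft_term, definitions))
```

In this sketch one term can appear in several nominal definitions, each pointing at a different concept, while a concept cannot be created at all without a real definition.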
When the ancient Greeks first conceived of definitions, they were primarily concerned with real definitions. In modern times, many philosophers have denied that real definitions exist at all and will only accept nominal ones. For instance, in the fourth century B.C., Aristotle considered a definition to be "the account of the essence of the thing," whereas in the early 20th century Ludwig Wittgenstein wrote, "Definitions are rules for translating from one language into another. Any correct sign-language must be translatable into any other in accordance with such rules: it is this that they all have in common."
If Wittgenstein is right, then we waste a lot of time defining concepts in data models. We do not define them in order to translate them into other languages (and please let us not pretend data models are some kind of language). However, his view seems to be the one that prevails today. Consider the now popular area of semantics. We are told that if computers could understand the meaning of data, then they could do all kinds of wonderful things - and this is the vision of the semantic Web. Unfortunately, it seems that "meaning of data" always boils down to "the identification of the concepts that the words involved signify." Now, this is extremely important, and natural language processing is an area where it has significant impact. But it is not the "meaning of data."
Data is Special
Data is a representation of something else. We can manipulate data to produce new data that then gets mapped back to the real world and has an impact on it. For example, my bank calculates how much I owe at the end of the month on the overdraft I did not realize I had. The data in my bank's applications has to model the structure and behavior of an overdraft. If this is not done correctly, real errors will be made - like the bank paying me interest on my overdraft. We have a concept of an overdraft and a real definition of what it is. This is more useful than WordNet's definition of the word overdraft: "a draft in excess of the credit balance."
Data is also special because it has a dual nature. It not only represents concepts, but it is a thing in its own right. Because of this, it has constraints on how it is able to represent a concept. Perhaps a field can only accommodate 80 characters, forcing users to abbreviate longer texts when they enter the data. This is just the tip of the iceberg, and many other considerations need to be understood. The point is that a good deal about how data represents a concept also has to be put into a real definition.
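One way to picture this dual nature is to keep the constraints of the representation alongside the real definition of the concept. The sketch below is a minimal illustration, assuming a hypothetical attribute called "Complaint description" stored in an 80-character field. All names, and the wording of the definition and notes, are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRepresentation:
    """Data as a thing in its own right: how a value is physically held."""
    field_name: str   # hypothetical field name
    max_length: int   # e.g. the 80-character limit mentioned in the text

    def fits(self, value):
        """True if the value can be stored without abbreviation."""
        return len(value) <= self.max_length


@dataclass(frozen=True)
class DefinedAttribute:
    """A real definition that also records how the data represents the concept."""
    name: str
    real_definition: str
    representation: FieldRepresentation
    representation_notes: str


complaint_description = DefinedAttribute(
    name="Complaint description",
    real_definition="The customer's own account of what went wrong.",
    representation=FieldRepresentation("COMPLAINT_DESC", 80),
    representation_notes=(
        "Users abbreviate accounts longer than 80 characters, "
        "so stored values may not be the customer's exact words."
    ),
)

long_account = "x" * 100
print(complaint_description.representation.fits(long_account))
```

The design point is only that the note about abbreviation belongs with the definition itself, because it changes what a stored value can be taken to mean.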
One final point. Dictionaries are forced to produce highly abbreviated definitions to reduce printing costs. Somehow, this style found in dictionaries has been accepted as the way to write definitions. Yet, brevity inevitably means loss of detail. Once again we see that nominal definitions are of far less service to data management than real definitions are.