Code Tables, Indicators and Nulls
One of the areas that has troubled me about data modeling involves the use of code tables. These pesky little tables, consisting of just a few columns and a few rows, are incredibly persistent in all database designs. They generally comprise anywhere from 20 to 50 percent of the tables in any data model. Their names are quite familiar to us: County Code, Product Category, Customer Type and so on. What is strange about these tables is that they usually have to be populated before a system or database goes live. Could they possibly have something to do with design?
Similarly, indicators - otherwise known as flags or switches - are pervasive types of attributes in data models. We argue over whether they should be physically implemented as "true" or "false" versus "yes" or "no," or some other representational scheme, but we rarely stop to think what they are doing in a database in the first place. And yet, like code tables, they are quite common in database designs, even if it seems a little odd that a characteristic of an entity should have a value like yes or no. Again, perhaps we are looking at something that partly embodies design rather than purely representing data.
Attributes that are indicators, or which are foreign keys from code tables, are often found in entities recognized as master data. Such entities include perennial favorites like Customer and Product, but in more specialized databases they are still found in entities that are readily classified as master data. This is not a universal rule, and both indicator and code attributes can be found in transactional data. Nevertheless, these two kinds of attributes are found most frequently in master data, and it is instructive to ask why this is. The answer seems to be, as we shall see below, that they are used to describe structures within database designs that cannot be represented by current data modeling techniques.
A separate issue, which at first appears quite unrelated, involves specifying if columns are null or not null in data models. Anybody with any experience of data modeling is likely to have come across attributes that must be null in certain circumstances, but must be not null in other circumstances. Data modeling tools do not seem to be built to handle this requirement: they ask if an attribute is potentially nullable or always not null. They cannot record if a column must be null under certain conditions but must be not null under other conditions. Once again, this is a hint that there are limits to how precisely data modeling can specify database design.
Logical versus Physical
Before we look at an example of the limits of data modeling, it is important to recognize a cultural issue in data modeling, and indeed IT as a whole. This is the great divide between the logical and physical perspectives of data management. On the one hand we have data modelers and data administrators who deal in design and related metadata, and are perceived as the builders, if not the owners, of data models. On the other hand we have database administrators and programmers who have to get databases up and running, and who look to those who traffic in more logical constructs for as much practical help as possible. Nevertheless, in my experience data modelers and data administrators nearly always see themselves as responsible solely for designs, and think that the way in which business users choose to put data into a database is entirely up to the business users. This protective attitude simplifies the life of a great number of IT professionals, but it may set limits on the level to which databases can be designed.
One problem is that if physical values that populate database columns somehow describe the structure of a database, then this structure is never going to be captured in a data model since a model never contains physical values - only the implemented database does. The logical/physical divide between models and databases will always stand in the way of such design structure being represented.
The World of Hidden Subtypes
Returning to the issues presented by code tables, indicators, and nulls, it is possible to see that database design consists of more than can be represented in a data model.
Suppose we have a manufacturing company that makes kitchen appliances and plumbing fixtures. The Product entity for this company can be represented by the model fragment shown in Figure 1.









