I recently worked on an executive report on data modeling issues for customer relationship management (CRM) systems. One of the main issues I discussed in that report was that data models designed for operational systems that have been in production for a number of years were probably not suited for the kinds of analytical processing associated with CRM systems today. For example, I looked at a table layout from an insurance application that had been in use for approximately eight years. That table, which was meant to manage locations, had an interesting set of attributes: the traditional "addr_line_1," "addr_line_2," city, state, and ZIP code, along with telephone number (all in a single field), fax number area code, fax number exchange, fax number line (that is, the fax number was broken into three parts), as well as "contact_name" and "attention" fields.

My objections to this table layout were numerous and included:

  • The use of the two address lines allowed for inconsistent assignment of associative address information such as floor or suite numbers.
  • The inconsistency of layout between the telephone number and the fax number.
  • The inclusion of party data (the contact name and the attention fields) within a table that is meant to track addresses.

Basically, use of that table's data in a CRM system would have required some careful analysis to properly extract the appropriate bits of information and place that data in a well-defined target model. I provided some insight into a possible design approach for the target model based on Len Silverston's excellent book, The Data Model Resource Book, with the suggestion that each entity object being modeled should reside in its own table, along with its static attributes, while any dynamic attributes or non-scalar attributes should be managed in associated tables.

For example, the table representation for a person would include a unique identifier, the person's name, birth date and sex, but not that person's telephone number because telephone numbers change. Additionally, a person may have more than one telephone number, and each telephone number may have a different purpose. Instead, we would manage telephone number in a separate table where each number would be associated with a telephone number type (such as home number, office number, mobile phone, pager and fax), a schedule (i.e., when that person is reachable at that number), a time period (when that number is active) and a purpose (such as "business contact telephone number"). The same approach would be taken for any other attribute, such as address (a person may have a home address, a business address, a temporary address, a mailing address, etc.) or electronic contact mechanism (home e-mail, work e-mail, instant messenger ID).

However, since the time that I finished said executive report, I have continued to think about this modeling approach because of a conversation I had with a childhood friend who mentioned that his sister had undergone a sex-change operation. My assumption had been that the person table would manage the static information such as name, birth date and sex. Yet, I already knew that people change their names (I even know men who assumed their wives' last names after marriage, giving them a oxymoronic "maiden name") and people change their sex –­ the only static attribute is the birth date! Do we treat the name and sex fields as dynamic attributes to be managed in their own tables just like telephone numbers?

When you look at modeling even more closely, suddenly this kind of question becomes a little more confusing. Many object entities have what we might think are static attributes, yet they turn out to be dynamic (e.g., the white pair of shoes that was dyed pink, the U.S. Postal Service breaking a single ZIP code area into two areas and assigning a new ZIP code to half the addresses, an automobile part being assigned a different part number when it is incorporated into a manufacturer's inventory). The conclusion is that even an attribute that we think is tightly bound to an object has the potential to take on dynamic characteristics.

Let's follow this thinking to the next step. For most objects, the only truly non-changing attribute is its assigned unique identifier! In this case, the modeled entity lives in a thin table, assigned its unique ID, and related by that ID to all of its attributes, each of which is living in its own table. Therefore, a person table would just have a single field – personID –­ and all person attributes would be managed in separate tables: last name, first name, birth date and sex. Because a large number of these attributes could change, each instance could be treated in the same way as I described earlier in this column for telephone numbers.

The (possibly absurd?) conclusion is that if all we are storing for each entity is a unique ID, why do we need separate tables for each entity? Instead, let's change the table to have two fields: a unique ID and an object classifier that tells us what kind of object it is. Of course, the object classifier would also be a unique ID that is mapped to a reference table with descriptions of each classifier type. In fact, in this reductionist approach, all values that exist within the system could be represented by unique identifiers that, in turn, are mapped to reference tables where the "human-readable" value actually is managed.

Is this really absurd? While trying to write a SQL query to extract data from this kind of model might require some calisthenics, the representational model may seem familiar because it is similar to a descriptive class hierarchy in an object-oriented environment. This approach is also nicely compatible with the materialization of descriptive summarizations of entities and their associated attributes in XML. The downside, however, is a potentially confusing representational environment, which might need masking via end-client visualization and interaction tools. Is this appealing to you? I am interested in hearing your opinions at loshin@knowledge-integrity.com.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access