A great way to sharpen our analysis and modeling skills is to continuously address real-world scenarios. A modeling scenario along with suggested solutions appears each month in this Design Challenge column. The scenario is emailed to more than 1,500 modelers up to the challenge. Many of the responses, including my own, are then consolidated into this column. If you would like to become a Design Challenger and submit modeling solutions, please add your email address at www.stevehoberman.com/designchallenge.htm. If you have a challenge you would like our group to tackle, please email me a description of the scenario at me@stevehoberman.com.

The Challenge

You are responsible for modeling best practices in your company and have noticed three different ways for modeling codes. Consider Gender for example. Gender Code, which can have the values F or M (or possibly 01 or 02), has been modeled without a code lookup, with a code lookup and with a generic code lookup.

Without a code lookup, there is no description available for the code:

Figure 1: Without a Code Lookup

With a code lookup, there is a description available for each code. A Gender Code value of F (or 01) has a Gender Description of Female, and a Gender Code value of M (or 02) has a Gender Description of Male:

Figure 2: With a Code Lookup

With a generic code lookup, all of the codes are stored in a single structure. In this example, the Type Code can contain the value Gender, Actual Code contains F (or 01) or M (or 02) and Code Description contains Female or Male:

Figure 3: With a Generic Code Lookup

When should you use each of these options?

The Response

Our challengers believe strongly that having a nongeneric code lookup should be the default standard. "I strongly recommend this practice unless I have solid justification to do otherwise," says Eric Wilson, data architect. Strengths of this structure include:

  • Referential integrity (RI). Only codes that appear in the lookup table can be entered in the referencing table. You cannot, for example, enter the value U for a gender code unless U appears in the Gender lookup.
  • Completeness. The code table can be expanded to contain dates, both short and long descriptions, statuses and language translations. With the proper structure, the code table can also support description value changes over time.
  • Clarity. The code description can resolve ambiguity in the codes. Abel Mendez, business intelligence specialist, gives examples where the description was a necessity for understanding. "I once worked in a Mexican agency where the usual M, F gender coding was replaced by H, M, meaning hombre, mujer. However, that particular choice was particularly atrocious, since there was another possible mapping (hembra, macho), which actually reverses the meaning of the codes."
  • Display and documentation. Rodger Nixon, data architect, emphasizes that the code table provides a good placeholder for documentation. These descriptions can be used for online display or reporting.

A generic code table offers all of the benefits that the nongeneric code lookup provides, with the exception of RI. Although there is a relationship shown between Generic Lookup and Person, it only enforces that any Type and Actual Code combination entered for a Person must appear in the Generic Lookup table. It could be a gender value, but it does not have to be a gender. Also, in most implementations of a generic code table, there are no constraints enforced to any referencing tables, such as Person. Domain integrity can be compromised on a generic code table. For example, ensuring that the value set for Gender is limited to a one-character alphanumeric can no longer be enforced in the structure.
In addition to losing RI, generic code lookups are complex. Deb Kesse, IT data modeler analyst senior, says, "We have done both the generic domain and the specific domain concepts. The most obvious concern we hear from developers is not being able to easily see the values that apply to their specific code without knowing the type." With added complexity could come performance implications. Both Frank Palmeri, database administrator, and Norman Daoust, principal consultant, caution modelers to consider the time cost of getting data in and out of such a generic structure.

Generic code lookups should be used when flexibility is more important than RI and structure complexity. When new types of codes are required, a generic code table minimizes the effort. For example, if Gender is a new requirement, the gender codes can be added to the generic structure as new rows and avoid creating a new code table. The code reference in the Person table will still need to be added as a column, but the overall effort would be less than if a new table needed to be created.

If there are performance implications in using a code table and the code values are intuitive and stable, you should consider not using a code lookup. Ruth Coleman, information architect, says, "When there are a few well-defined and universally understood values, a code lookup may not be required." Examples are the days of the week and the months of the year, or if there are very small domain sets such as a yes/no or true/false indicator.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access