The Challenge
You are responsible for modeling best practices in your company and have noticed three different ways for modeling codes. Consider Gender for example. Gender Code, which can have the values F or M (or possibly 01 or 02), has been modeled without a code lookup, with a code lookup and with a generic code lookup.
Without a code lookup, there is no description available for the code:

Figure 1: Without a Code Lookup
With a code lookup, there is a description available for each code. A Gender Code value of F (or 01) has a Gender Description of Female, and a Gender Code value of M (or 02) has a Gender Description of Male:

Figure 2: With a Code Lookup
With a generic code lookup, all of the codes are stored in a single structure. In this example, the Type Code can contain the value Gender, Actual Code contains F (or 01) or M (or 02) and Code Description contains Female or Male:

Figure 3: With a Generic Code Lookup
When should you use each of these options?
The Response
Our challengers believe strongly that having a nongeneric code lookup should be the default standard. "I strongly recommend this practice unless I have solid justification to do otherwise," says Eric Wilson, data architect. Strengths of this structure include:
- Referential integrity (RI). Only codes that appear in the lookup table can be entered in the referencing table. You cannot, for example, enter the value U for a gender code unless U appears in the Gender lookup.
- Completeness. The code table can be expanded to contain dates, both short and long descriptions, statuses and language translations. With the proper structure, the code table can also support description value changes over time.
- Clarity. The code description can resolve ambiguity in the codes. Abel Mendez, business intelligence specialist, gives examples where the description was a necessity for understanding. "I once worked in a Mexican agency where the usual M, F gender coding was replaced by H, M, meaning hombre, mujer. However, that particular choice was particularly atrocious, since there was another possible mapping (hembra, macho), which actually reverses the meaning of the codes."
- Display and documentation. Rodger Nixon, data architect, emphasizes that the code table provides a good placeholder for documentation. These descriptions can be used for online display or reporting.
A generic code table offers all of the benefits that the nongeneric code lookup provides, with the exception of RI. Although there is a relationship shown between Generic Lookup and Person, it only enforces that any Type and Actual Code combination entered for a Person must appear in the Generic Lookup table. It could be a gender value, but it does not have to be a gender. Also, in most implementations of a generic code table, there are no constraints enforced to any referencing tables, such as Person. Domain integrity can be compromised on a generic code table. For example, ensuring that the value set for Gender is limited to a one-character alphanumeric can no longer be enforced in the structure.
In addition to losing RI, generic code lookups are complex. Deb Kesse, IT data modeler analyst senior, says, "We have done both the generic domain and the specific domain concepts. The most obvious concern we hear from developers is not being able to easily see the values that apply to their specific code without knowing the type." With added complexity could come performance implications. Both Frank Palmeri, database administrator, and Norman Daoust, principal consultant, caution modelers to consider the time cost of getting data in and out of such a generic structure.
Generic code lookups should be used when flexibility is more important than RI and structure complexity. When new types of codes are required, a generic code table minimizes the effort. For example, if Gender is a new requirement, the gender codes can be added to the generic structure as new rows and avoid creating a new code table. The code reference in the Person table will still need to be added as a column, but the overall effort would be less than if a new table needed to be created.
If there are performance implications in using a code table and the code values are intuitive and stable, you should consider not using a code lookup. Ruth Coleman, information architect, says, "When there are a few well-defined and universally understood values, a code lookup may not be required." Examples are the days of the week and the months of the year, or if there are very small domain sets such as a yes/no or true/false indicator.
Steve Hoberman is one of the world's most well-known data modeling gurus. He taught his first data modeling class in 1992 and has educated more than 10,000 people about data modeling and business intelligence techniques since then. Steve is known for his entertaining, interactive teaching and lecture style (watch out for flying candy!), and organizations around the globe have brought Steve in to teach his Data Modeling Master Class, which is recognized as the most comprehensive data modeling course in the industry. Steve is the author of "Data Modeling Made Simple," "Data Modelers Workbench" and "Data Modeling for the Business (Technics Publications). He is the founder of the Design Challenges group and inventor of the Data Model Scorecard.










Be the first to comment on this post using the section below.