Reference data is one class of information that must be zero-defect. The term "reference data" undersells the importance of the information it represents because it reflects a data-centric view rather than a business-centric one: reference data is data in one file that is referenced by records in other files. The business significance is that reference data provides important classifications of the objects and events of interest to the enterprise.

Classifications such as gender make a difference in what medical diagnoses and procedures are valid for a health provider and whether reimbursement is appropriate by an insurance company. Order type classifications in retail and wholesale make a difference as to what pricing rules apply. Get the classification wrong, and you invoice for the wrong price. Country code classifications dictate which address format is appropriate for international mailing. Millions of pieces of mail go astray because of incorrect address formats.

Quality Problems

Several quality problems confront classification information:

  • New classification options become required, such as a drop-ship order type, but the code is not added to the order-type table. Drop-ship order records then cannot be classified as such.
  • Obsolete classification code values can be inadvertently selected for a record. For example, frequent shopper classifications may change from four (diamond, platinum, gold and silver) to three (dropping silver). This creates two potential quality problems:
    1) If referential integrity (the requirement that a frequent shopper customer must have a valid classification code) is not enforced, customer records may be left with a silver frequent shopper classification that has no program associated with it.
    2) If the frequent shopper classification is selectable by information producers, customers may be classified into the no-longer-valid silver classification.
  • If there is no business subject matter steward in charge of controlling updates to classification values, inappropriate or conflicting code values may be created.
  • Classification code values that are created outside of your enterprise, such as ISO country codes, may not be available quickly enough for your need. If not, you must either use an existing, incorrect code or create a temporary code value, and each choice has its own problem:
    1) If you use an existing value for a different classification, such as the country classification code for the former Yugoslavia (YUG) for addresses in the emerging Slovenia and Croatia, you have an incorrect address. When the codes for Slovenia (SVN) and Croatia (HRV) are finally assigned, you will need to sort out which addresses belong with which new country.
    2) If you create a temporary code, you will need to update all records containing it when the newly assigned code becomes available. If you don't, processes using the invalid codes will fail.
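
The obsolete-code and missing-code problems above can be detected with a simple audit of records against the reference table. The following is a minimal sketch, not a definitive implementation: the names CodeValue and find_invalid, the dates, and the sample customer records are all hypothetical, and it assumes each code value carries an effective date and an optional end date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CodeValue:
    """One row of a reference table (hypothetical structure)."""
    code: str
    effective: date
    end: Optional[date] = None  # None means the code is still valid

    def valid_on(self, day: date) -> bool:
        return self.effective <= day and (self.end is None or day <= self.end)


def find_invalid(records, reference, as_of):
    """Return (record_id, code, reason) for records with unknown or obsolete codes."""
    by_code = {cv.code: cv for cv in reference}
    problems = []
    for record_id, code in records:
        cv = by_code.get(code)
        if cv is None:
            problems.append((record_id, code, "unknown"))
        elif not cv.valid_on(as_of):
            problems.append((record_id, code, "obsolete"))
    return problems


# Hypothetical frequent shopper table after the silver tier was dropped.
TIERS = [
    CodeValue("DIAMOND", date(2000, 1, 1)),
    CodeValue("PLATINUM", date(2000, 1, 1)),
    CodeValue("GOLD", date(2000, 1, 1)),
    CodeValue("SILVER", date(2000, 1, 1), end=date(2004, 12, 31)),
]
CUSTOMERS = [(1, "GOLD"), (2, "SILVER"), (3, "BRONZE")]

issues = find_invalid(CUSTOMERS, TIERS, as_of=date(2005, 6, 1))
for issue in issues:
    print(issue)
```

Keeping the retired silver row in the table with an end date, rather than deleting it, lets the audit distinguish a code that was once valid from one that never existed.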

Quality Management Techniques

Outline a controlled process for defining information and creating valid codes. This should include:

  • Assign a business information steward and an information resource management specialist to validate business definition and information design principle integrity.
  • Create classification attributes that represent a single kind of categorization of objects or events. For example, the attribute Product Line Code should not classify both product line and sub-line as one code.
  • Clearly define the meaning of the classification.
  • Clearly define the meaning of each code value as a single, nonoverlapping classification. For example, Frequent Flyer classifications should not have mileage ranges that overlap, as they do in this flawed definition:
      Silver = from 25,000 to 49,999 miles
      Gold = from 50,000 to 74,999 miles
      Platinum = from 70,000 to 99,999 miles
      Diamond = 100,000 miles and over
      (Note that the mileages for gold and platinum overlap.)
  • Establish effective end dates for the classification definition and for the business rules of each classification definition. Definitions and rules can change, and the dates are required to analyze any apparent anomalies when comparing data across time.
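
The nonoverlapping rule for code definitions can be checked mechanically before a set of values is published. The sketch below is illustrative only: the function name find_overlaps is hypothetical, ranges are treated as inclusive, and None stands for an open-ended upper bound. It sorts the ranges by lower bound and compares each range with the next one, so it reports overlaps between neighboring ranges.

```python
def find_overlaps(ranges):
    """Return pairs of neighboring classifications whose inclusive ranges overlap.

    ranges maps a name to (low, high); high=None means "and over".
    """
    ordered = sorted(ranges.items(), key=lambda item: item[1][0])
    overlaps = []
    for (name_a, (_, high_a)), (name_b, (low_b, _)) in zip(ordered, ordered[1:]):
        # An open-ended range overlaps anything that starts after it.
        if high_a is None or low_b <= high_a:
            overlaps.append((name_a, name_b))
    return overlaps


# The flawed Frequent Flyer definition from the example above.
FLAWED = {
    "Silver": (25_000, 49_999),
    "Gold": (50_000, 74_999),
    "Platinum": (70_000, 99_999),
    "Diamond": (100_000, None),
}

# One possible correction (an assumption): Platinum starts at 75,000.
CORRECTED = dict(FLAWED, Platinum=(75_000, 99_999))

print(find_overlaps(FLAWED))     # [('Gold', 'Platinum')]
print(find_overlaps(CORRECTED))  # []
```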

Business information stewards and information producers creating classification codes must understand how they are used across the enterprise and must assure accuracy and completeness in the set of valid values. They must keep them current when changes occur.

Train the information producers who create records with classifications, so they know how to properly classify objects or events.

Design edit and validation tests to prevent inadvertent errors, or to classify objects or events automatically when all required data is known, such as deriving a frequent flyer classification from aggregated mileage.
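
As an illustration of automatic classification, a frequent flyer tier can be derived from aggregated mileage instead of being keyed in by an information producer. This is a sketch under stated assumptions: the function name classify is hypothetical, and the thresholds use a corrected version of the example ranges in which Platinum starts at 75,000 miles.

```python
from bisect import bisect_right

# Lower bounds of each tier, ascending (assumed corrected thresholds).
THRESHOLDS = [25_000, 50_000, 75_000, 100_000]
# Index 0 covers totals below the first threshold: no tier.
TIER_NAMES = [None, "Silver", "Gold", "Platinum", "Diamond"]


def classify(flight_miles):
    """Aggregate the mileage of individual flights and return the tier name."""
    if any(miles < 0 for miles in flight_miles):
        # Edit test: reject bad input rather than misclassify the customer.
        raise ValueError("flight mileage cannot be negative")
    total = sum(flight_miles)
    return TIER_NAMES[bisect_right(THRESHOLDS, total)]


print(classify([12_000, 13_000]))   # Silver
print(classify([40_000, 35_000]))   # Platinum
print(classify([5_000]))            # None
```

Because the tier is computed from data the enterprise already holds, there is no selectable code for a producer to get wrong.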

When millions of customers require multiple classifications, from personal title to gender to frequent shopper, flyer or guest type, to state or province and country codes, classification information (reference data) requires zero defects.
