We were recently asked to evaluate a large data set for data value anomalies as part of an overall data quality assessment that we hoped would establish the business case for senior management's investment in a data quality program. The particular data set we were examining contained a table with address information, and the model reflected a typical instantiation of an address table, with fields named AddrLn1, AddrLn2, City, State, ZipCode, ZipFour and Country. The client's data model had been purchased within the past 10 years.

This table was used to generate addresses for mail correspondence between my client and their customers. As might be expected, one of the more relevant anomalies concerned the difference between a mailing address composed from the corresponding data elements and the standard form for composing addresses specified by the U.S. Postal Service (USPS). While I am usually adamant that "data quality is more than just names and addresses," I do have to point out that there is a relevant data quality problem embedded within this form of table, and it is not address standardization. Instead, it is that using two fields (AddrLn1 and AddrLn2) to capture a mailing address allows data entry personnel to input information that can subsequently get "lost" between those two fields.
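To make the two-field problem concrete, here is a minimal profiling sketch in Python. It is not the assessment method used for the client. The field names AddrLn1 and AddrLn2 come from the table described above; the function names, the regular expressions and the three flagged conditions are illustrative assumptions about what "lost" information might look like, and they do not implement the USPS standard.

```python
import re

# Assumption: a delivery line usually starts with a house number followed
# by a word, or is a PO Box. These patterns are rough heuristics only.
STREET_LINE = re.compile(r"^\d+\s+\w+")
PO_BOX = re.compile(r"^p\.?\s*o\.?\s*box\s+\d+", re.IGNORECASE)


def looks_like_delivery_line(value):
    """Return True if the text resembles a street or PO Box line."""
    value = value.strip()
    return bool(STREET_LINE.match(value) or PO_BOX.match(value))


def flag_address_lines(record):
    """Return a list of hypothetical anomaly flags for one address record.

    `record` is a dict keyed by the column names AddrLn1 and AddrLn2;
    missing or None values are treated as empty strings.
    """
    line1 = (record.get("AddrLn1") or "").strip()
    line2 = (record.get("AddrLn2") or "").strip()
    flags = []

    # The first line was skipped and the address went into the second.
    if not line1 and line2:
        flags.append("AddrLn1 empty but AddrLn2 populated")

    # Both lines are filled, but the street line sits in the second field.
    if line1 and line2:
        first = looks_like_delivery_line(line1)
        second = looks_like_delivery_line(line2)
        if second and not first:
            flags.append("delivery line appears in AddrLn2")
        elif second and first:
            flags.append("both lines look like delivery lines")

    return flags
```

Run over every row, a check like this yields counts per flag, which is the kind of figure a data quality assessment can report. Any flagged record would still need manual review, since the heuristics above will miss some cases and misfire on others.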
