When you first load data into your new data warehouse or data mart, expect it to be of dubious quality. In fact, it may not be usable without a toxic data warning label. Examples of data quality problems, found after warehouses are loaded and knowledge workers query them, could provide enough laughs for a stand-up comedy routine. Consider, for example, the insurance company that found an unexpected number of hemorrhoid claims in a particular region. When asked why, clerks said they had no place to record "pain in the a--" customers, so they chose an appropriate claim code. The claims were paid as long as there was a valid claim code. "We had no idea anyone else was using that information," they said.

The solution most often prescribed for less-than-hoped-for data quality is a data cleanup effort, sometimes called data scrubbing. This approach involves a massive and ongoing effort to validate and correct data after you extract it from source systems and before you load it into your warehouse.
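To make the idea concrete, here is a minimal sketch of a scrubbing step that sits between extract and load. It is an illustration under stated assumptions, not a description of any particular tool: the `Claim` record, the `VALID_CLAIM_CODES` reference set, and the `scrub` function are all hypothetical names invented for this example.

```python
from dataclasses import dataclass

# Hypothetical reference data; a real list would come from the source system.
VALID_CLAIM_CODES = {"A10", "B20", "C30"}


@dataclass
class Claim:
    claim_id: str
    claim_code: str
    region: str


def scrub(claims):
    """Normalize extracted records and split them into clean rows and rejects."""
    clean, rejected = [], []
    for c in claims:
        # Correct what can be corrected mechanically: whitespace and casing.
        code = c.claim_code.strip().upper()
        region = c.region.strip().title()
        # Validate: the code must be known and the region must not be empty.
        if code in VALID_CLAIM_CODES and region:
            clean.append(Claim(c.claim_id, code, region))
        else:
            rejected.append(c)
    return clean, rejected


# Records as they might arrive from a source system (made-up values).
extracted = [
    Claim("1", " a10 ", "north"),
    Claim("2", "ZZ9", "south"),  # unknown claim code
    Claim("3", "B20", "  "),     # missing region
]
clean, rejected = scrub(extracted)
```

Only `clean` would go on to the warehouse load; `rejected` would be routed back for correction. Note that a check of this kind would not have caught the claims in the insurance example above, because the clerks entered a valid claim code.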
