As a child, I was fascinated by the concept of an atomic clock. It was precise and invariant - time could be absolutely measured against it. As an adult, I have often encountered business leaders who would like to measure their customer data against some absolute standard comparable to an atomic clock in its unvarying precision. These business leaders know that establishing clear data quality metrics and regular scorecard reporting will drive improvements in any customer data initiative. But there is no absolute standard for data quality, so how can you create appropriate metrics? What, in short, are you measuring relative to?

In the absence of the data quality equivalent of an atomic clock, there are three categories of relative-to metrics:

  1. Relative to a prior period. These are the easiest measures to create and to understand. They are extremely valuable for monitoring processes during data aggregation and transformation, so data management and IT organizations use relative-to-prior-period metrics for assessing operational quality. Sample metrics might include: The time to update address changes has gone from x last quarter to y this quarter. The percentage of inaccurate addresses in my monthly test sample has declined from x to y in the last six months. The number of populated legal name fields in my database has increased from x to y over the past year.
  2. Relative to an alternative source. Most customer data elements are collected in several places within a company. Many customer data elements are also available from external data sources. The second type of relative-to metric evaluates the relative quality of these alternative sources. Once the best source for names, addresses, phone numbers, Standard Industrial Classification (SIC) codes, etc. has been determined, customer data systems can populate fields from, or direct applications to, the best source for each data element. Sample metrics might include: The percentage of accurate addresses in test samples is higher in database A than in databases B through N. An external data source has x percent more SIC codes and y percent more legal names than the best internal database with those two fields. (A rough sketch of both kinds of comparison follows this list.)
  3. Relative to business user needs. This relative-to metric reflects the classic definition of quality: meeting the requirements of the intended use. Relative-to-business-user-needs metrics are the hardest to define and measure, but they are the most important for the long-term success of the business. The metrics may be a function of the customer information needs of different users within the company. For example, sales administration might need accurate names, city- and state-level addresses and fully populated SIC fields for territory allocation and quota setting. They need this data to be as accurate and complete as possible once each year, and they do not want it to change until next year. Marketing, on the other hand, might care significantly less about name accuracy but much more about having timely information that reflects market potential as it exists today. Relative-to-business-user-needs metrics support core business objectives and should be derived from an analysis of core business processes. There should be only two or three such metrics, and they should reflect the fact that perceived data quality depends more on the intended use than on the raw data itself.
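
To make the first two categories concrete, here is a minimal sketch of how a team might track an address-inaccuracy rate against the prior period and compare SIC-code fill rates across sources. The source names, field names and figures are all hypothetical, and a real audit process would of course be more involved.

```python
# Hypothetical illustration only: sources, field names and figures are invented.
from dataclasses import dataclass


@dataclass
class QualitySample:
    """Results of auditing a test sample from one source in one period."""
    source: str
    period: str
    records_checked: int
    inaccurate_addresses: int
    sic_codes_populated: int


def inaccuracy_rate(sample: QualitySample) -> float:
    """Share of audited records with an inaccurate address."""
    return sample.inaccurate_addresses / sample.records_checked


def change_vs_prior_period(current: QualitySample, prior: QualitySample) -> float:
    """Relative-to-prior-period view: change in the address-inaccuracy rate
    (a negative number means the rate improved)."""
    return inaccuracy_rate(current) - inaccuracy_rate(prior)


def best_source_for_sic(samples: list[QualitySample]) -> str:
    """Relative-to-alternative-source view: which source has the highest
    SIC-code fill rate in its audited sample?"""
    return max(samples, key=lambda s: s.sic_codes_populated / s.records_checked).source


q1 = QualitySample("database_A", "2024-Q1", 1_000, inaccurate_addresses=120, sic_codes_populated=610)
q2 = QualitySample("database_A", "2024-Q2", 1_000, inaccurate_addresses=90, sic_codes_populated=640)
ext = QualitySample("external_source", "2024-Q2", 1_000, inaccurate_addresses=70, sic_codes_populated=820)

print(f"Change in inaccuracy rate vs. prior period: {change_vs_prior_period(q2, q1):+.1%}")
print(f"Best source for SIC codes this period: {best_source_for_sic([q2, ext])}")
```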

Although there is no absolute data quality standard - and your choice of relative-to metrics will depend on where you are in the transformation from raw data into information for business purposes - certain dimensions are common to all three categories of relative-to metrics. The dimensions I have found most useful are timeliness, accuracy, completeness and consistency.

  1. Timeliness is the first on my list because the toughest data quality problems will probably be caused by the rapid rate of change for names, addresses, phone numbers, etc. So-called accuracy problems are often timeliness problems in disguise; e.g., a business moved and the database hasn’t been updated yet.
  2. Accuracy failures cover any type of incorrect information, from data entry spelling errors to linking the wrong information to a customer. These are the bloopers that business users love to discover and discuss at length.
  3. Completeness has two dimensions: breadth (how many customers and prospects do you have information about?) and depth (how much do you know about each of them?).
  4. Consistency is having the same field definitions, naming conventions, country codes, capitalization and abbreviation rules, etc. across all instances of customer data. (A rough sketch of how these dimensions might be scored follows this list.)
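
As a rough illustration, the sketch below scores a toy customer table on breadth, timeliness, completeness depth and consistency. The records, field names and rules (a 365-day freshness window, a short list of valid country codes) are assumptions made for the example; accuracy is omitted because it requires comparison against an independently verified test sample.

```python
# Hypothetical illustration only: the records, fields and rules below are invented.
from datetime import date

VALID_COUNTRY_CODES = {"US", "CA", "GB"}                       # assumed coding standard
CORE_FIELDS = ("legal_name", "address", "phone", "sic_code")   # assumed "depth" fields

customers = [
    {"legal_name": "Acme Corp", "address": "12 Main St", "phone": "555-0100",
     "sic_code": "3571", "country": "US", "last_verified": date(2024, 5, 1)},
    {"legal_name": "Globex", "address": None, "phone": None,
     "sic_code": None, "country": "usa", "last_verified": date(2022, 11, 3)},
]


def is_timely(record, as_of=date(2024, 6, 30), max_age_days=365) -> bool:
    """Timeliness: was the record verified within the allowed window?"""
    return (as_of - record["last_verified"]).days <= max_age_days


def depth(record) -> float:
    """Completeness (depth): share of core fields that are populated."""
    return sum(record[f] is not None for f in CORE_FIELDS) / len(CORE_FIELDS)


def is_consistent(record) -> bool:
    """Consistency: does the country field follow the agreed coding standard?"""
    return record["country"] in VALID_COUNTRY_CODES


print("Breadth (records held):", len(customers))
print("Timely records:", sum(is_timely(r) for r in customers), "of", len(customers))
print("Average depth:", sum(depth(r) for r in customers) / len(customers))
print("Consistent records:", sum(is_consistent(r) for r in customers), "of", len(customers))
```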

The key to establishing the most valuable metrics is to design them so that they drive toward the quality levels your business end users require. For example, if you are measuring timeliness relative to a prior period, you will focus on driving down the time it takes to add new data to your databases. When you are measuring relative to an alternative source, you will assess which data sources have the most current information. But, as the sales administration scenario under relative-to-business-user-needs illustrates, not all business users will value timeliness enough to justify its associated expense. Before designing your relative-to-prior-period and relative-to-alternative-source metrics, you need to understand how much value the ultimate users place on timeliness, accuracy, completeness and consistency at the data element level. It’s a Goldilocks proposition: you want your quality levels to fit your company’s needs “just right” - more will cost too much, and less will inhibit profitable revenue growth.
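
One way to make “value at the data element level” tangible is to record each user group’s minimum acceptable score per dimension per data element and report only the gaps. The sketch below is purely illustrative; the user groups, elements and thresholds are invented, but they mirror the sales administration and marketing scenario above.

```python
# Hypothetical illustration only: user groups, data elements and thresholds are invented.
# Each target is the minimum acceptable score (0 to 1) on a dimension for a data element.
TARGETS = {
    ("sales_admin", "legal_name"): {"accuracy": 0.98, "timeliness": 0.50},  # annual refresh is enough
    ("marketing",   "legal_name"): {"accuracy": 0.90, "timeliness": 0.95},  # must reflect today's market
    ("marketing",   "sic_code"):   {"completeness": 0.85},
}


def gaps(measured: dict, user_group: str, element: str) -> dict:
    """Return the dimensions where measured quality falls short of this
    user group's target for this data element, as (measured, target) pairs."""
    targets = TARGETS.get((user_group, element), {})
    return {dim: (measured.get(dim, 0.0), goal)
            for dim, goal in targets.items()
            if measured.get(dim, 0.0) < goal}


measured_legal_name = {"accuracy": 0.93, "timeliness": 0.97, "completeness": 0.88}
print(gaps(measured_legal_name, "sales_admin", "legal_name"))  # accuracy falls short of 0.98
print(gaps(measured_legal_name, "marketing", "legal_name"))    # empty: meets marketing's targets
```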
