
Developing Information Quality Metrics

  • May 01 2005, 1:00am EDT

Currently in vogue is the ability to summarize an organization's "business productivity" for senior managers using pithy representations that are expected to carry deep meaning and, at the same time, reduce the attention required to absorb that meaning. Business productivity management systems rely on key performance indicators whose values are posted to executive dashboards for the CEO's periodic (be it daily or hourly) review. These applications are intended to present the current state of the environment in the context of reasonable expectations. In other words, a business manager wants an overview of the "value creation" of the entire system, in much the same way that a nuclear engineer watches the gauges that report a reactor's safety status.

In most areas of a business, the metrics that back up the key performance indicators may be relatively straightforward. For example, in a shoe factory, one might gauge the number of shoes coming off the production line, the rate at which shoes are being produced, the number of flawed shoes coming off the line or the number of accidents that occur each day. Each of these metrics may be represented using various visual cues, each of which provides a warning when the performance indicator reaches some critical level.

When it comes to the world of information quality, however, the analogy seems to break down, mostly because there is a disconnect between what is apparently measurable and what the value of that measurement means. For example, one may count the number of times a value is missing from a specific column in a specific table, but in the absence of any business context, it is not clear how those missing values affect the business, or if they even affect the business at all.

Yet we all know that poor data quality does affect the business. Thus, there should be some kind of performance indicator that can capture and summarize the relationship between data that does not meet one's expectations and the organizational bottom line. The challenge, then, is to devise a strategy for identifying and managing "business-relevant" information quality metrics.

What Makes a Good Metric?

More challenging, however, is that the individuals typically tasked with devising good information quality metrics are better trained at data analysis and less skilled in business performance monitoring. Therefore, part of this strategy is to understand the characteristics of a reasonable business performance metric and then explore how to map those characteristics to the measurable aspects of data quality. The following list of characteristics, which is by no means complete, should give some guidance as to how to jump-start the strategy:

  • Clarity of definition
  • Measurability
  • Business relevance
  • Controllability
  • Representation
  • Reportability
  • Trackability
  • Drill-down capability

Clarity of Definition

Because the metric is intended to convey a particular piece of information regarding an aspect of business performance in a summarized manner, it is critical that its underlying definition be stated in a way that clearly explains what is being measured. In fact, each metric should be subject to a rigorous "standardization" process in which the key stakeholders participate in its definition and agree to the definition's final wording. In addition, it is advisable to provide the metric's value range, as well as a qualitative segmentation of the value range that relates the metric's score to its performance assessment.
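One way to record such a definition is sketched below, assuming a metric scored from 0 to 100. The class names, the example metric and the band thresholds are hypothetical illustrations, not values taken from any standard; in practice the stakeholders would agree on them.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Band:
    label: str    # qualitative assessment, e.g. "acceptable"
    lower: float  # inclusive lower bound of the band


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    definition: str           # wording agreed to by the stakeholders
    minimum: float            # bottom of the value range
    maximum: float            # top of the value range
    bands: Tuple[Band, ...]   # ordered by ascending lower bound

    def assess(self, score: float) -> str:
        """Map a score to the label of the highest band it reaches."""
        if not self.minimum <= score <= self.maximum:
            raise ValueError(f"{score} is outside [{self.minimum}, {self.maximum}]")
        label = self.bands[0].label
        for band in self.bands:
            if score >= band.lower:
                label = band.label
        return label


# Hypothetical metric with made-up thresholds.
address_completeness = MetricDefinition(
    name="shipping_address_completeness",
    definition="Percentage of shipping address records with every required field populated",
    minimum=0.0,
    maximum=100.0,
    bands=(Band("poor", 0.0), Band("acceptable", 90.0), Band("good", 98.0)),
)

print(address_completeness.assess(95.0))  # prints "acceptable"
```

Keeping the agreed wording, the range and the bands in one record means a dashboard and a report can both read the same definition.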


Measurability

Any metric must be measurable and should be quantifiable within a discrete range. Note, however, that there are many things that can be measured that may not translate into useful metrics, and that implies the need for business relevance.

Business Relevance

The metric is of no value if it cannot be related to some aspect of business operations or performance. Therefore, every desirable metric must be defined within a business context with an explanation of how the metric score correlates with a measurement of performance. More desirable is if performance measurement can be directly associated with a critical business impact; this is probably the most critical characteristic of a data quality metric.


Controllability

Any measurable characteristic of information that is suitable as a metric should reflect some controllable aspect of the business. In other words, the assessment of an information quality metric's value within an undesirable range should trigger some action to improve the data being measured.


Representation

Without digressing into a discussion about the plethora of visual "widgets" that can be used to represent a metric's value, it is reasonable to note that one should associate with each metric a visual representation that presents the metric's value in a concise and meaningful way.


Reportability

From a different point of view, each metric's definition should provide enough information to be summarized as a line item in a comprehensive report. The difference between representation and reportability is that the representation focuses on the specific metric in isolation, while the reporting should show each metric's contribution to an aggregate assessment. In turn, this allows the manager to evaluate the priority of any issues needing resolution.


Trackability

A major benefit of metrics is the ability to measure performance improvement over time. Tracking performance over time not only validates any improvement efforts, but once an information process is presumed to be stable, tracking provides insight into maintaining statistical control. In turn, these kinds of metrics can evolve from performance indicators into standard monitors, placed in the background to notify the right individuals when the data quality measurements suddenly indicate a deviation from expected control bounds.
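A background monitor of this kind might be sketched as follows. The three-sigma limits are an assumption borrowed from conventional control charts rather than something this article prescribes, and the function names and daily percentages are invented for illustration.

```python
from statistics import mean, stdev


def control_bounds(baseline, sigmas=3.0):
    """Lower and upper control limits derived from a stable baseline period."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation
    return center - sigmas * spread, center + sigmas * spread


def out_of_control(baseline, new_values, sigmas=3.0):
    """Return (index, value) pairs that fall outside the control limits."""
    lower, upper = control_bounds(baseline, sigmas)
    return [(i, v) for i, v in enumerate(new_values) if not lower <= v <= upper]


# Made-up daily percentages of records with a missing value.
baseline = [2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9]
latest = [2.0, 2.3, 3.1, 1.9]

# Only the third measurement (3.1) lies beyond the upper limit.
for day, value in out_of_control(baseline, latest):
    print(f"notify data steward: day {day} measured {value}")
```

The baseline should come from a period in which the process is believed to be stable; otherwise the limits simply encode the existing problems.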

Drill-Down Capability

Because a metric's representation is a summary, its flip side is the ability to expose the underlying data that contributed to a particular metric score. The natural instinct, when reviewing data quality measurements, is to review the data instances that contributed to any low scores. The ability to drill down through the performance metric allows an analyst to get a better understanding of patterns (if any exist) that may have contributed to a low score, and consequently use that understanding for a more comprehensive root-cause analysis. This kind of insight allows your organization to isolate the processing stage at which any flaws are introduced and, in turn, enables you to eliminate the source of the data problems (instead of the typical, counterproductive reaction of correcting the data values themselves).
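As a minimal sketch of drill-down, the function below returns the failing records alongside the score so that an analyst can look for patterns. The record layout, the `source` field and the completeness rule are hypothetical.

```python
from collections import Counter

# Made-up records; "source" stands for the process that captured each one.
records = [
    {"id": 1, "source": "web", "postal_code": "10001"},
    {"id": 2, "source": "call_center", "postal_code": ""},
    {"id": 3, "source": "web", "postal_code": "94105"},
    {"id": 4, "source": "call_center", "postal_code": None},
    {"id": 5, "source": "web", "postal_code": "60601"},
]


def has_postal_code(record):
    """Hypothetical rule: the postal code must be present and non-empty."""
    return bool(record["postal_code"])


def score_with_drill_down(records, rule):
    """Return the percentage of passing records and the records that failed."""
    failures = [r for r in records if not rule(r)]
    score = 100.0 * (len(records) - len(failures)) / len(records)
    return score, failures


score, failures = score_with_drill_down(records, has_postal_code)
failures_by_source = Counter(r["source"] for r in failures)

print(score)               # 60.0
print(failures_by_source)  # both failures come from "call_center"
```

Grouping the failures by where they entered the system points toward the process that introduces the flaw, which is the starting point for root-cause analysis.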

Measurements of Data Quality

The conventional wisdom for measuring data quality relies on quantifying how data sets relate to "dimensions of data quality." Those dimensions, including (among others) accuracy, completeness, consistency, timeliness and currency, are useful for discussing ways that data values exhibit quality within their information context. Some of these measurements are relatively easy to capture, such as data value completeness (which, for data in a structured relational database, is trivially done using simple SQL queries), while others might require more dedicated resources, such as the manual review necessary to determine accuracy.
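A completeness measurement of the kind mentioned above can be sketched with Python's built-in `sqlite3` module. The `customer` table and its `email` column are made up for the example. In SQL, `COUNT(column)` counts only non-NULL values, so the difference from `COUNT(*)` is the number of missing values.

```python
import sqlite3

# In-memory database with a hypothetical table and sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO customer (id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "c@example.com"), (4, None)],
)

# COUNT(*) counts rows; COUNT(email) counts rows where email is not NULL.
total, populated = conn.execute(
    "SELECT COUNT(*), COUNT(email) FROM customer"
).fetchone()

missing = total - populated
completeness = populated / total

print(f"{missing} of {total} values missing, completeness {completeness:.0%}")
```

Note that this treats only NULL as missing; empty strings or placeholder values such as "N/A" would need an additional condition.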

Unfortunately, it is easy to confuse a measurement with a metric. Generating a count of the number of missing data values is easy, but does it truly reflect the characteristics of a data quality metric? In fact, outside of the business context, one may not be able to answer that question. However, by identifying and placing that measurement within the appropriate business context, one may evolve a measurement into a metric. To do this, one must identify the "relevance" of any measurement in terms of the business impact associated with what is being measured.

Finding Business Relevance

We can divide the set of all of your organization's information flaws into two groups - those that impact the achievement of the business's operational and strategic goals, and those that do not. For all intents and purposes, we can ignore those flaws that do not have any impact. Associating a specific information flaw with a specific business impact may be hard work, but it is not an impossible task. First, identify the areas of business impact. Then, for each perceived data quality problem, break the process down into these subtasks:

  1. Review how that data flaw relates to each area of impact.
  2. Determine the frequency with which impact is incurred.
  3. Sum up the measurable costs associated with each impact incurred by the data quality problem.
  4. Assign an average cost to each occurrence of the problem.

For example, let's presume that 10 percent of a company's shipping addresses have some problem and that 20 percent of the time that an item with a flawed shipping address is shipped, it is returned - incurring an additional cost of $10.00. This means that two percent of shipments (10 percent times 20 percent) may incur an additional $10.00 cost. Therefore, if the company ships 1,000 packages per day, about 20 of them are returned and the data quality flaw costs $200.00 per day. If the company manages 100,000 shipping address records, then 10,000 of them are flawed, and ultimately, the average cost per day of each flawed address is $200.00 divided by 10,000, or $0.02.

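The arithmetic in the shipping example can be written out as a short sketch. The function and parameter names are illustrative, and the calculation assumes, as the example implicitly does, that daily shipments have the same share of flawed addresses as the address file as a whole.

```python
def flawed_address_cost(
    flawed_rate,       # share of addresses with a problem
    return_rate,       # share of flawed shipments that are returned
    cost_per_return,   # extra cost of each returned shipment, in dollars
    packages_per_day,
    address_records,
):
    """Return the daily cost of the flaw and the daily cost per flawed record."""
    costly_shipment_rate = flawed_rate * return_rate
    daily_cost = packages_per_day * costly_shipment_rate * cost_per_return
    flawed_records = address_records * flawed_rate
    return daily_cost, daily_cost / flawed_records


daily_cost, cost_per_flawed_record = flawed_address_cost(
    flawed_rate=0.10,
    return_rate=0.20,
    cost_per_return=10.00,
    packages_per_day=1_000,
    address_records=100_000,
)

print(f"daily cost: ${daily_cost:.2f}")                              # $200.00
print(f"daily cost per flawed address: ${cost_per_flawed_record:.2f}")  # $0.02
```

Changing any one input shows how sensitive the per-occurrence cost is to the underlying assumptions.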
This is relatively simplistic, and this example is contrived to demonstrate an approach to attaching business value to each instance of a problem. In turn, this evaluation allows us to turn a measurement (of a flawed address) into a metric because the number of occurrences of the flaw is directly associated with business relevance.

There are four general areas of business impact that can be associated with data quality problems:

  • Productivity
  • Profit
  • Risk
  • Intangibles


Productivity

Productivity can be assessed in terms of physical production (i.e., the number of usable components coming off the production line) or in terms of individual production (i.e., how much time someone is spending on a task). While physical productivity is easy to measure, personal productivity is less so. Yet, the frequently referenced data quality costs incurred due to "scrap and rework" are most often attributable to individual productivity, in which we accumulate the number of hours that a person spends identifying that a problem exists, tracking down its source, rewinding any tasks that were performed using the incorrect data, and re-running the processing. The cost of that problem instance is stated in terms of that person's fully loaded cost per hour multiplied by the number of hours spent addressing the problem.
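The rework calculation described above amounts to hours multiplied by a fully loaded hourly cost. In the sketch below, the activity names follow the four activities in the text, while the hours and the hourly figure are made up.

```python
def rework_cost(hours_by_activity, loaded_hourly_cost):
    """Cost of one problem instance: total hours spent times fully loaded hourly cost."""
    return sum(hours_by_activity.values()) * loaded_hourly_cost


# Hypothetical hours one analyst spent on a single incident.
incident_hours = {
    "identify_problem": 1.5,
    "trace_source": 3.0,
    "rewind_tasks": 2.0,
    "rerun_processing": 1.5,
}

cost = rework_cost(incident_hours, loaded_hourly_cost=75.0)
print(f"${cost:.2f}")  # 8.0 hours at $75.00 per hour is $600.00
```

Recording the hours by activity, rather than as a single total, also shows where the rework effort is concentrated.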


Profit

Simply put, an organization's profit is based on how much money it takes in (revenue) minus the amount of money it spends (expenses). A data quality problem can be associated with increased costs as well as decreased revenue. In addition to any increased costs associated with reduced productivity, a problem's impacts may have ramifications further down the knowledge chain. For example, when one organization released a report with inconsistencies, there were additional costs associated with recalling and destroying the distributed (hard-copy) documents, and producing and distributing a corrected version. Events such as these are usually tracked within a company; therefore, it should be relatively easy to accumulate costs and statistics and, again, assign an average cost to each data quality incident.

The more insidious impacts are ones that result in lost revenue. For example, component pricing information on the supplier side is likely to be integrated into a final product's pricing strategy; inaccurate data on the supplier side may result in underpricing the final product, which in turn reduces the margin for each product sold. The cost is the sum, over the units sold, of the difference between the margin actually earned and the margin that would have been earned had the data quality flaw not existed.

Another example is the concept of the "lost customer." There are two kinds of lost customers: parties within your information domain who are understood to not be current customers, although more accurate analysis of the data would indicate that they are, and parties who are understood to be current customers, but in fact have been subject to attrition. Both of these kinds of lost customers incur profit impact, either through increased marketing costs for current customers or decreased sales to ex-customers.


Risk

There are many different forms of risk, and each can be used as the business basis for a data quality metric. Regulatory risks are associated with noncompliance with legal imperatives, such as statutes, laws, regulations, etc. Investment risks are associated with increasing the value of your assets. Development risks are associated with capital investment in the development of systems intended to improve the business operations. One might even incorporate credit risk into the stable of risks related to poor data quality.

While it is difficult to assign a "precise" value to each of these risks, there are algorithms for assigning some value to each. Even in the absence of a precise measurement, there are strata of quantifications (e.g., high, medium, low) that can be measured and represented.


Intangibles

Poor data quality can impact organizations in intangible ways as well. Examples include customer satisfaction, public relations and goodwill. Each of these areas can be measured, and in cases where bad data clearly affects that measurement, one can configure a corresponding metric.

Direct and Subsidiary Metrics

We have looked at some areas of business relevance and how to relate the measurement of the number of data flaws to those areas - we can refer to these as direct metrics. More interestingly, there are subsidiary (and possibly, more useful) metrics that can be created from this process as well. For example, we looked at how to assign an average cost to each occurrence of an unexpected data value, and we can provide periodic reports and ongoing tracking of that metric. The subsidiary metric is to review how data quality improvement reduces that average cost per occurrence over time. As a different example, we have metrics that we track over time, and we can see how improvement is made over time. The subsidiary metric reports the rate at which improvement is made. Both of these examples are used to provide insight into how effective the improvement program is overall.
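Both kinds of subsidiary metric can be derived from the same tracked history, as in the sketch below. The three periods and their figures are invented for illustration, and "rate of improvement" is taken here to mean the fractional decrease from one period to the next, which is only one reasonable reading.

```python
# Made-up history: (period label, occurrences of the flaw, total cost incurred).
history = [
    ("period 1", 400, 8000.00),
    ("period 2", 300, 5400.00),
    ("period 3", 240, 3600.00),
]


def improvement_rates(values):
    """Fractional decrease from each period to the next (positive means improvement)."""
    return [(prev - curr) / prev for prev, curr in zip(values, values[1:])]


occurrences = [count for _label, count, _cost in history]
average_costs = [cost / count for _label, count, cost in history]

# Direct metrics tracked per period.
print(occurrences)     # [400, 300, 240]
print(average_costs)   # [20.0, 18.0, 15.0]

# Subsidiary metrics: how quickly each direct metric is improving.
occurrence_improvement = improvement_rates(occurrences)
average_cost_improvement = improvement_rates(average_costs)
print([f"{r:.1%}" for r in occurrence_improvement])
print([f"{r:.1%}" for r in average_cost_improvement])
```

A falling rate of improvement with a still-high occurrence count would suggest that the improvement program is losing effectiveness.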

The Data Quality Dashboard

Lastly, consider how these metrics are reported and presented to the business partner. This presentation, which we can accumulate into a "dashboard," would provide a visual representation conveying the business relevance of each metric, as well as access to its definition. More importantly, the dashboard provides access to two further aspects: trackability and drill-down.

The tracking component would provide a visual graph over the periods that the metric is measured and give the knowledge worker the opportunity to review the details of any specific period's measurements. The drill-down capability would allow the analyst to access the data underlying the metric and review those data instances contributing to the measurement, which enables more comprehensive review as well as root-cause analysis.

The Challenge

Developing key performance indicators for information quality is clearly a challenge, mostly because the hard numbers presented by data quality tools are typically out of the business context. Here we have provided some insight into how the data analyst can work with the business customer to identify ways that poor data quality impacts the achievement of business objectives and subsequently determine hard costs associated with each occurrence of a flaw. Once this has been done, providing a dashboard with tracking and drill-down capabilities establishes a value-added approach for value-directed information quality management and improvement.
