The following is excerpted from "The Overall Approach to Data Quality ROI" by William McKnight. For a copy of the full paper, please visit www.mcknight-

Steps 1 through 3 were covered in McKnight's August column.

Step 4: Data Quality Scoring

Once the systems and the data quality rules have been identified and the data has been characterized, the data quality must be scored. A rule's score represents the state of data quality for that rule. A system's score is an aggregate of the rule scores for that system, and the overall score is a prorated aggregation of the system scores.

Scoring is a relative measure of conformance to rules. For a given rule, the score can be as simple as the percentage of rule-enforcement opportunities that are met: adherence / possibilities.
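As a minimal sketch of the adherence / possibilities ratio, the function below treats every value in a column as one opportunity for the rule and counts a value as adherent when it falls in an allowed set. The function name `rule_score`, the allowed codes `{"M", "F"}` and the sample column are hypothetical illustrations, not taken from the paper.

```python
from typing import Iterable


def rule_score(values: Iterable[str], allowed: set[str]) -> float:
    """Score one rule as adherence / possibilities, expressed as a percentage.

    Each value is one possibility (an opportunity to enforce the rule);
    it adheres when it belongs to the hypothetical `allowed` set.
    """
    values = list(values)
    if not values:
        raise ValueError("no opportunities to score")
    adherence = sum(1 for v in values if v in allowed)
    return 100.0 * adherence / len(values)


# Hypothetical gender column: 94 of 100 values are valid codes.
genders = ["M"] * 50 + ["F"] * 44 + ["U"] * 4 + [""] * 2
print(rule_score(genders, {"M", "F"}))  # 94.0
```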

For example, if 94 percent of gender values conform to the desired values, as in the Step 3 example (see www.mcknight-), that rule's score is 94 percent.

Because many results will (hopefully) be above 99 percent, you may want finer granularity in the scoring: set a floor at 50 (or any other number) and measure only the adherence above it. Using the gender example, the score would then be (94 - 50)/50, or 88 percent, instead of 94 percent.
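The floor calculation generalizes to (raw - floor) / (100 - floor). The sketch below assumes scores are percentages and, as its own assumption, clamps anything at or below the floor to 0; the name `floored_score` is hypothetical.

```python
def floored_score(raw_pct: float, floor: float = 50.0) -> float:
    """Rescale a raw percentage so only adherence above `floor` counts.

    A raw score of 100 still maps to 100; scores at or below the floor
    are clamped to 0 (an assumption, not stated in the source).
    """
    if not 0 <= floor < 100:
        raise ValueError("floor must be in [0, 100)")
    return max(0.0, (raw_pct - floor) * 100.0 / (100.0 - floor))


print(floored_score(94.0))  # 88.0, matching the gender example
print(floored_score(99.0))  # 98.0
print(floored_score(99.5))  # 99.0
```

With a floor of 50, the half-point gap between raw scores of 99 and 99.5 becomes a full point, which is the extra granularity the floor is meant to provide.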

Averaging all the rule scores for a system yields an overall system score (see Figure 1). In that example, the rule score average, and therefore the DQ score for the system, is 94.375 percent.
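A simple unweighted system score is just the mean of its rule scores. The rule names and values below are hypothetical placeholders, not the Figure 1 data, which is not reproduced here.

```python
from statistics import mean

# Hypothetical rule scores (percent) for one system; not the Figure 1 data.
rule_scores = {
    "gender_valid": 94.0,
    "state_valid": 99.0,
    "income_in_range": 91.0,
    "product_ref_exists": 96.0,
}

# Unweighted system score: every rule counts equally.
system_score = mean(rule_scores.values())
print(system_score)  # 95.0
```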

Figure 1: Rule Differences Example

Simple averaging may not be as effective as a weighting scheme that gives more weight to the more important rules. Either way, I suggest setting up a system's scoring so that it approximates the scoring shown in Figure 2.
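A weighted mean covers both levels of aggregation: rule scores weighted by importance give a system score, and system scores prorated by some measure of each system's share give the overall score. This is a minimal sketch; the function name `weighted_score`, every rule and system name, and all weights are hypothetical, and the basis for prorating (for example, business value) is an assumption rather than something the excerpt specifies.

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of `scores`; every scored key needs a positive weight."""
    if any(weights[k] <= 0 for k in scores):
        raise ValueError("every scored item needs a positive weight")
    total_weight = sum(weights[k] for k in scores)
    return sum(scores[k] * weights[k] for k in scores) / total_weight


# Hypothetical rule scores and importance weights for one system.
rule_scores = {"gender_valid": 94.0, "state_valid": 99.0, "income_in_range": 91.0}
rule_weights = {"gender_valid": 1.0, "state_valid": 2.0, "income_in_range": 1.0}
system_score = weighted_score(rule_scores, rule_weights)
print(system_score)  # 95.75

# Overall score: system scores prorated by hypothetical weights.
system_scores = {"marketing_db": system_score, "billing_db": 90.0}
system_weights = {"marketing_db": 3.0, "billing_db": 1.0}
print(weighted_score(system_scores, system_weights))  # 94.3125
```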

Figure 2: Data Quality Scoring

Step 5: Measure Impact of Various Levels of Data Quality

ROI is about accumulating all returns and investments in the chain, from the project build, maintenance, and associated business and IT activities through to the ultimate desired result, while considering the possible outcomes and their likelihood. Using ROI for justification means reducing the proposed net change in activities to its anticipated cash flow.

Various ROIs can be computed in anticipation of a project to justify it, showing the potential outcomes distributed across their probability of occurrence: a probability distribution for the project. The variables include numerous critical components, data quality among them. The appendix of the paper gives some industry-specific examples of projects and how data quality might affect their outcomes.
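To make the idea of a probability distribution of ROIs concrete, the sketch below computes ROI as (returns - investment) / investment for each possible outcome and then summarizes the distribution by its probability-weighted mean. Summarizing by the expected value is one common choice and an assumption here, not necessarily the paper's method; the `Outcome` class and all cash figures and probabilities are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    probability: float  # likelihood of this outcome; all outcomes sum to 1
    returns: float      # total cash returned over the project life
    investment: float   # total cash invested (build, maintenance, business and IT activities)

    def roi_pct(self) -> float:
        return (self.returns - self.investment) / self.investment * 100.0


def expected_roi(outcomes: list[Outcome]) -> float:
    """Probability-weighted mean ROI across the outcome distribution."""
    total_p = sum(o.probability for o in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(o.probability * o.roi_pct() for o in outcomes)


# Hypothetical three-point distribution for one project.
outcomes = [
    Outcome(probability=0.25, returns=1_500_000, investment=1_000_000),  # 50% ROI
    Outcome(probability=0.50, returns=2_000_000, investment=1_000_000),  # 100% ROI
    Outcome(probability=0.25, returns=2_500_000, investment=1_000_000),  # 150% ROI
]
print(expected_roi(outcomes))  # 100.0
```

Keeping the individual outcomes, rather than only the mean, preserves the spread that the distribution is meant to show.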

All other things being equal, different data quality scores for a system will lead to different system results and hence different ROIs.

In our targeted marketing example, mail sent to bad addresses cannot be delivered, and mail sent to incorrect addresses will be returned. More importantly, poor-quality data will lead us to segment customers inappropriately and, therefore, to market to them inappropriately. Bad product references in sales data lead to incorrect profiling of sales habits. Incorrect assignment of a customer's state or income level also leads to incorrect profiling and lower returns from marketing efforts.

Together, the measurements of all these quality rules make up the system's data quality score. The data quality rules were arrived at not by reasoning in the abstract about how the data should look, but by determining what it would cost the system's function if the data lacked quality.

For example, if the data quality score for the targeted marketing database is 85, we can expect a three percent return on our marketing programs, which leads to a 115 percent ROI. A score of 80 underachieves that potential, yielding a 2.5 percent return on marketing dollars and an 80 percent ROI. A score of 90 produces a higher-achieving targeted marketing program, perhaps a four percent marketing return and a 145 percent ROI (see Figure 3).
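The three score points in this example can be held in a small lookup table. The sketch below, with hypothetical names `SCORE_TO_OUTCOME`, `estimated_marketing_return` and `estimated_roi`, returns the quoted figures exactly and linearly interpolates between them; the interpolation is an assumption of this sketch, and scores outside the 80 to 90 range are rejected rather than extrapolated.

```python
from bisect import bisect_left

# DQ score -> (marketing return %, ROI %), the points quoted in the example above.
SCORE_TO_OUTCOME = {
    80: (2.5, 80.0),
    85: (3.0, 115.0),
    90: (4.0, 145.0),
}


def _interpolate(dq_score: float, column: int) -> float:
    """Linear interpolation between known score points (an assumption)."""
    scores = sorted(SCORE_TO_OUTCOME)
    if not scores[0] <= dq_score <= scores[-1]:
        raise ValueError(f"score {dq_score} outside modeled range {scores[0]}-{scores[-1]}")
    i = bisect_left(scores, dq_score)
    if scores[i] == dq_score:
        return SCORE_TO_OUTCOME[scores[i]][column]
    lo, hi = scores[i - 1], scores[i]
    y_lo, y_hi = SCORE_TO_OUTCOME[lo][column], SCORE_TO_OUTCOME[hi][column]
    return y_lo + (y_hi - y_lo) * (dq_score - lo) / (hi - lo)


def estimated_marketing_return(dq_score: float) -> float:
    return _interpolate(dq_score, 0)


def estimated_roi(dq_score: float) -> float:
    return _interpolate(dq_score, 1)


print(estimated_roi(85))                 # 115.0
print(estimated_roi(87.5))               # 130.0
print(estimated_marketing_return(87.5))  # 3.5
```

Because the quoted points are not evenly spaced in ROI, interpolated values are only rough estimates; a real model would come from the Step 5 analysis itself.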

Figure 3: Example Data Quality Scores

It behooves us to improve the quality of the data to improve the anticipated return on the project, but at what cost? The Data Quality Improvement Step (Step 6) details the data quality actions and costs them out to arrive at the ideal data quality level. Details regarding Step 6 can be found in the complete publication (www.mcknight-).
