This article is excerpted from "Taking Data Quality to the Enterprise through Data Governance," a report published in April 2006 by The Data Warehousing Institute (TDWI). To download the complete report, visit: www.tdwi.org/research.

When making a case for a data quality initiative or project, organizations cite both liability and leverage. They need to reduce costs by alleviating the liabilities of poor-quality data, or they want to increase revenue by leveraging the benefits of high-quality data. Either way, the case can be compelling, such that most organizations claim a return on investment (ROI) in data quality.

Problems of Poor-Quality Data

In the surveys of 2001 and 2005, TDWI asked, "Has your company suffered losses, problems or costs due to poor quality data?" The share of respondents answering yes grew from 44 percent in 2001 to 53 percent in 2005, which at first glance suggests that data quality problems are getting worse.

In the same period, however, the share of respondents admitting that they "haven't studied the issue" dropped from 43 percent to 36 percent. The two trends may be linked: the rise in reported problems could reflect more organizations studying the issue rather than data quality actually getting worse, so problems have not necessarily increased. Rather, more organizations now know from their own study that data quality problems are real and quantifiable. Averaging the two years together, 48.5 percent (roughly half) of organizations recognize the problem. Because this is far higher than the 12 percent denying any problem, we conclude that problems due to poor-quality data are tangible across all industries and exist in sufficient quantity and severity to merit corrective attention.
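The arithmetic behind this reasoning can be laid out explicitly, using only the survey percentages reported above. The 9-point rise in "yes" answers is close in size to the 7-point fall in "haven't studied the issue," which is consistent with (though it does not prove) the reading that more study, rather than worse data, explains much of the increase:

```latex
\begin{aligned}
\Delta_{\text{yes}}         &= 53\% - 44\% = +9 \text{ points} \\
\Delta_{\text{not studied}} &= 36\% - 43\% = -7 \text{ points} \\
\bar{p}_{\text{yes}}        &= \frac{44\% + 53\%}{2} = 48.5\%
\end{aligned}
```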

Poor-quality data creates problems on both sides of the fence between IT and business. Some problems are mostly technical in nature, such as extra time required for reconciling data (85 percent) or delays in deploying new systems (52 percent). Other problems are closer to business issues, such as customer dissatisfaction (69 percent), compliance problems (39 percent) and revenue loss (35 percent). Poor-quality data also causes problems with costs (67 percent) and credibility (77 percent).

Origins of Poor-Quality Data

Survey responses show that problems unquestionably exist. But exactly where do they come from?

Problems originate in both IT and the business (see Figure 1). Problems arise from technical issues (conversion projects, 46 percent; system errors, 25 percent), business processes (employee data entry, 75 percent; user expectations, 40 percent) and a mix of both (inconsistent terms, 75 percent). Problems even come from outside the organization (customer data entry, 26 percent; external data, 38 percent). Hence, data quality is assaulted from all quarters, and keeping problems at bay requires great diligence from both IT and the business, in internal processes and external interactions alike.

Figure 1: Origins of Poor-Quality Data


Inconsistent data definition is a leading origin of data quality problems. Too often, the data itself is not wrong; it is just used wrongly. For example, multiple systems may each have a unique way of representing a customer. Application developers, integration specialists and knowledge workers regularly struggle to learn which representation is best for a given use. When good data is referenced wrongly, it can mislead business processes and corrupt databases downstream. With 75 percent of survey respondents pointing to this problem, it ties with data entry as the most common origin of data quality problems.

Data entry ties with inconsistent data definitions as the most common origin of data quality problems. This problem has been with us since the dawn of computing and is probably here to stay. It can be lessened by user interfaces that require as little typing as possible, validation and cleansing of entered data before it is committed, user training, regular data audits and incentives for users to get it right.
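To make "validation and cleansing prior to committing entered data" concrete, here is a minimal sketch in Python. The field names (`name`, `email`, `phone`, `state`), the formatting rules and the in-memory `store` list are all hypothetical assumptions chosen for illustration; a real system would apply its own business rules and commit to a database rather than a list.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rules for illustration only; real rules come from the business.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
KNOWN_STATES = {"CA", "NY", "TX", "WA"}  # deliberately abbreviated list


@dataclass
class ValidationResult:
    record: dict
    errors: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def cleanse(raw: dict) -> dict:
    """Normalize obvious formatting noise before validating."""
    rec = {k: v.strip() if isinstance(v, str) else v for k, v in raw.items()}
    if rec.get("name"):
        # Collapse repeated spaces and apply naive title case.
        rec["name"] = " ".join(rec["name"].split()).title()
    if rec.get("email"):
        rec["email"] = rec["email"].lower()
    if rec.get("phone"):
        rec["phone"] = re.sub(r"\D", "", rec["phone"])  # keep digits only
    if rec.get("state"):
        rec["state"] = rec["state"].upper()
    return rec


def validate(raw: dict) -> ValidationResult:
    """Cleanse the entry, then check it against the hypothetical rules."""
    rec = cleanse(raw)
    errors = []
    for required in ("name", "email"):
        if not rec.get(required):
            errors.append(f"{required} is required")
    if rec.get("email") and not EMAIL_RE.match(rec["email"]):
        errors.append("email is malformed")
    if rec.get("phone") and len(rec["phone"]) != 10:
        errors.append("phone must have 10 digits")
    if rec.get("state") and rec["state"] not in KNOWN_STATES:
        errors.append("unknown state code")
    return ValidationResult(rec, errors)


def commit(raw: dict, store: list) -> ValidationResult:
    """Append the cleansed entry to the store only if it passes validation."""
    result = validate(raw)
    if result.ok:
        store.append(result.record)
    return result
```

For example, `commit({"name": "  jane   doe ", "email": "Jane@Example.COM", "phone": "(555) 123-4567", "state": "ca"}, store)` stores a cleansed record with name `"Jane Doe"` and phone `"5551234567"`, while an entry with an empty name and a malformed email is rejected with both errors and never reaches the store. Rejecting bad entries at the point of capture is typically cheaper than correcting them after they have propagated downstream.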

Data representing certain business entities, such as customer and product, is more prone to quality problems than data about other entities, such as finances or employees (see Figure 2).

Figure 2: Types of Data Prone to Quality Problems


Data about customers is the leading offender (74 percent). The state of customer data changes constantly as customers run up bills, pay bills, move to new addresses, change their names, get new phone numbers, change jobs, get raises, have children and so on. The customer is the most highly changeable entity in most organizations, along with equivalents such as the patient in health care, the citizen in government and the prospect in sales force automation. Unfortunately, every change is an opportunity for data to be entered incorrectly or to go out of date. Because customer data is often strewn across multiple systems, synchronizing it and resolving conflicting values are common data quality tasks, too.
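Resolving conflicting values for one customer held in several systems usually means applying a "survivorship" rule that decides which value wins. The Python sketch below uses one simple rule among many, "the most recently updated non-empty value wins," and also reports which attributes disagreed so that a data steward could review them. The system names, fields and dates are hypothetical, and the sketch assumes the records have already been matched to the same customer, which is itself a hard data quality task.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceRecord:
    system: str       # hypothetical source system name, e.g. "billing" or "crm"
    customer_id: str  # assumed to be a key already matched across systems
    updated: date     # when this system last updated the record
    fields: dict      # attribute name -> value


def merge_customer(records: list) -> tuple:
    """Build one consolidated record: for each attribute, keep the most
    recently updated non-empty value (ties keep the earlier-listed record).
    Also return the sorted names of attributes whose sources disagreed."""
    if len({r.customer_id for r in records}) > 1:
        raise ValueError("records must describe the same customer")
    golden, stamp, seen = {}, {}, {}
    for rec in records:
        for name, value in rec.fields.items():
            if value in (None, ""):
                continue  # an empty value never overrides a real one
            seen.setdefault(name, set()).add(value)
            if name not in stamp or rec.updated > stamp[name]:
                golden[name] = value
                stamp[name] = rec.updated
    conflicts = sorted(n for n, vals in seen.items() if len(vals) > 1)
    return golden, conflicts
```

With a billing record updated 2005-03-01 (`name="Jane Doe"`, `address="12 Oak St"`) and a CRM record updated 2005-09-15 (`name="Jane Smith"`, empty address), the merged record takes the newer name and keeps the billing address, and `name` is flagged as a conflict. "Most recent wins" is easy to explain but not always right; some organizations instead rank systems by trustworthiness for each attribute.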

Product data (43 percent) is a distant second after customer data. Defining "product" is challenging because products take different forms: supplies that a manufacturer procures to assemble a larger product, the larger product the manufacturer produces, products traveling through distribution channels and products available through a wholesaler or retailer. Note that this list constitutes a supply chain. In other organizations, the chain is not apparent; they simply acquire office supplies, medical supplies, military munitions and so on, which are consumed in the production of a service. Hence, one of the greatest challenges to assuring the quality of product data is to first define what "product" means in an organization.

Benefits of High-Quality Data

Roughly half of respondents (49 percent; see Figure 3) reported they "haven't studied the issue" of data quality benefits, whereas only about one-third haven't studied its problems. Because more organizations have studied the problems of poor-quality data than the benefits of high-quality data, data quality is clearly driven more by liability than by leverage. Even so, benefits exist: 41 percent claim to have derived them, compared with a mere 10 percent denying any benefit.

Figure 3: Awareness of Benefits from High-Quality Data


The top three benefits of high-quality data identified by respondents all relate directly to data warehousing (see Figure 4), namely greater confidence in analytic systems (76 percent), less time spent reconciling data (70 percent) and a single version of the truth (69 percent). This is expected because data quality has a track record of success in data warehousing. Other benefits are more business driven, such as gains in customer satisfaction (57 percent), cost reduction (56 percent) and extra revenues (30 percent).

Figure 4: Benefits of High-Quality Data


Data Quality ROI and Budget

TDWI's 2005 survey asked, "Does your company believe it can achieve a positive return on investment by investing in a data quality initiative?" (see Figure 5). Forty-three percent of respondents reported that their organization believes ROI is possible, whereas 19 percent do not. Thirty-eight percent admit they do not know. These results are similar to those from 2001, when TDWI asked the same question: 40 percent, 19 percent and 41 percent, respectively. Based on the respondents' appraisal, ROI is a distinct possibility with data quality, though not an overwhelming probability.


Figure 5: Confidence in Data Quality Initiatives


Consistent with the recognized possibility of data quality ROI, a combined 80 percent of respondents report that data quality budgets will stay the same or increase, versus a minuscule 4 percent anticipating a budget cut (see Figure 6). Some interviewees described their data quality initiative or team as a cost center, though one in transition toward becoming a revenue center. Given users' growing budgets and their belief that ROI is possible, investments in data quality are safe, growing and likely to yield a return in a reasonable amount of time.

Figure 6: Anticipated Budget Changes for Data Quality Initiatives


The liabilities of poor-quality data and the leverage offered by high-quality data should compel anyone to action. Organizations that depend on their data cannot afford to ignore its quality. Furthermore, data quality efforts are likely to yield a demonstrable return, and your peers in other organizations are increasing investments accordingly. Bottom line: you should, too.
