When making the case for a data quality initiative or project, organizations cite both liability and leverage: they either need to reduce costs by alleviating the liabilities of poor-quality data or want to increase revenue by leveraging the benefits of high-quality data. Either way, the case can be compelling, such that most organizations claim a return on investment (ROI) in data quality.
Problems of Poor-Quality Data
In its 2001 and 2005 surveys, TDWI asked, "Has your company suffered losses, problems or costs due to poor quality data?" Respondents answering yes grew from 44 percent in 2001 to 53 percent in 2005, which suggests that data quality problems are getting worse.
In the same period, however, respondents admitting that they "haven't studied the issue" dropped from 43 percent to 36 percent. It is possible that the two trends cancel each other out, such that problems have not necessarily increased. Rather, more organizations now know from their own study that data quality problems are real and quantifiable. Averaging the two years together, 48.5 percent (or roughly half) of organizations now recognize the problem. Because this is far higher than the 12 percent denying any problem, we conclude that problems due to poor-quality data are tangible across all industries and exist in quantity and severity sufficient to merit corrective attention.
Poor-quality data creates problems on both sides of the fence between IT and business. Some problems are mostly technical in nature, such as extra time required for reconciling data (85 percent) or delays in deploying new systems (52 percent). Other problems are closer to business issues, such as customer dissatisfaction (69 percent), compliance problems (39 percent) and revenue loss (35 percent). Poor-quality data even drives up costs (67 percent) and damages credibility (77 percent).
Origins of Poor-Quality Data
Survey responses show that problems unquestionably exist. But exactly where do they come from?
Problems originate in both IT and the business (see Figure 1). Problems arise from technical issues (conversion projects, 46 percent; system errors, 25 percent), business processes (employee data entry, 75 percent; user expectations, 40 percent) and a mix of both (inconsistent terms, 75 percent). Problems even come from outside (customer data entry, 26 percent; external data, 38 percent). Hence, data quality is assaulted from all quarters, and keeping its problems at bay requires great diligence from both IT and the business, across both internal processes and external interactions.
Figure 1: Origins of Poor-Quality Data
Inconsistent data definition is a leading origin of data quality problems. Too often, the data itself is not wrong; it is just used wrongly. For example, multiple systems may each have a unique way of representing a customer. Application developers, integration specialists and knowledge workers regularly struggle to learn which representation is best for a given use. When good data is referenced wrongly, it can mislead business processes and corrupt databases downstream. With 75 percent of survey respondents pointing to this problem, it ties with data entry as the most common origin of data quality problems.
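The mapping problem can be sketched concretely. In this minimal Python illustration (the systems, field names and formats are hypothetical, not drawn from the survey), two systems each represent the same customer differently, and a single normalization function reconciles them to one canonical form:

```python
# Two hypothetical systems represent the same customer with different
# field names and formats.
billing_record = {"cust_id": "C-1042", "name": "Smith, Jane"}
crm_record = {"customer_no": 1042, "first": "Jane", "last": "Smith"}

def to_canonical(record):
    """Normalize either representation to one canonical customer form."""
    if "cust_id" in record:                      # billing-system layout
        last, first = [p.strip() for p in record["name"].split(",")]
        cust_id = record["cust_id"].lstrip("C-")
    else:                                        # CRM layout
        first, last = record["first"], record["last"]
        cust_id = str(record["customer_no"])
    return {"customer_id": cust_id, "first_name": first, "last_name": last}

# Both representations collapse to the same canonical record.
assert to_canonical(billing_record) == to_canonical(crm_record)
```

Agreeing on (and enforcing) such a canonical definition up front is far cheaper than letting each application developer rediscover the mapping on their own.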
Data entry ties for worst place as an origin of data quality problems. This problem has been with us since the dawn of computing and is probably here to stay. The problem is lessened by user interfaces that require as little typing as possible, validation and cleansing prior to committing entered data, training for users, regular data audits and incentives for users to get it right.
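Validation prior to commit can be as simple as rejecting records that fail basic checks before they reach the database. A minimal sketch follows; the field names and rules are illustrative assumptions, not requirements from the survey:

```python
import re

def validate_entry(entry):
    """Return a list of problems found in a manually entered record;
    an empty list means the entry may be committed."""
    problems = []
    if not entry.get("name", "").strip():
        problems.append("name is required")
    email = entry.get("email", "").strip()
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        problems.append("email is malformed")
    zip_code = entry.get("zip", "").strip()
    if not re.fullmatch(r"\d{5}", zip_code):
        problems.append("zip must be five digits")
    return problems

# A clean entry passes; a malformed one is caught before commit.
assert validate_entry({"name": "Jane Smith", "email": "jane@example.com",
                       "zip": "02139"}) == []
assert "email is malformed" in validate_entry({"name": "Jane",
                                               "email": "jane@", "zip": "02139"})
```

Checks like these catch errors at the cheapest possible moment, at entry time, rather than during later reconciliation.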
Data representing certain business entities, such as customer and product, is more prone to data quality problems than data about other entities, such as finances or employees (see Figure 2).
Figure 2: Types of Data Prone to Quality Problems
Data about customers is the leading offender (74 percent). The state of customer data changes constantly as customers run up bills, pay bills, move to new addresses, change their names, get new phone numbers, change jobs, get raises, have children and so on. The customer is the most highly changeable entity in most organizations, along with equivalents such as the patient in health care, the citizen in government and the prospect in sales force automation. Unfortunately, every change is an opportunity for data to be entered incorrectly or to go out of date. Because customer data is often strewn across multiple systems, synchronizing it and resolving conflicting values are common data quality tasks, too.
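Conflicting values are often resolved with survivorship rules, such as "the most recent non-empty value wins" for each field. A minimal sketch, with hypothetical records and timestamps:

```python
# Conflicting copies of one customer from three hypothetical systems,
# each stamped with when it was last updated.
records = [
    {"updated": "2005-03-01", "phone": "555-0100", "address": "12 Elm St"},
    {"updated": "2005-06-15", "phone": "",         "address": "98 Oak Ave"},
    {"updated": "2004-11-20", "phone": "555-0199", "address": ""},
]

def survive(records, fields):
    """Merge records field by field: the most recent non-empty value wins."""
    merged = {}
    for field in fields:
        candidates = [r for r in records if r.get(field)]
        # ISO-format dates sort correctly as strings; newest first.
        candidates.sort(key=lambda r: r["updated"], reverse=True)
        merged[field] = candidates[0][field] if candidates else None
    return merged

merged = survive(records, ["phone", "address"])
# The phone survives from the 2005-03-01 record (latest non-empty value),
# the address from the 2005-06-15 record.
assert merged == {"phone": "555-0100", "address": "98 Oak Ave"}
```

Real synchronization tools layer further rules on top (source-system trust rankings, standardized formats), but field-level survivorship is the core idea.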
Product data (43 percent) is in a distant second place after customer data. Defining product is challenging because it can take different forms, for example, as supplies that a manufacturer procures to assemble a larger product, the larger product produced by the manufacturer, products traveling through distribution channels and products available through a wholesaler or retailer. Note that this list constitutes a supply chain. In other organizations, the chain is not apparent; they simply acquire office supplies, medical supplies, military munitions and so on, which are consumed in the production of a service. Hence, one of the greatest challenges to assuring the quality of product data is to first define what "product" means in an organization.
Benefits of High-Quality Data
Roughly half of respondents reported they "haven't studied the issue" of data quality benefits (49 percent in Figure 3), whereas only about one-third haven't studied its problems. With more time spent studying problems than benefits, data quality is clearly driven more by liability than leverage. Even so, benefits exist, and 41 percent claim to have derived them, compared to a mere 10 percent denying any benefit.