
The Costs of Information and Data Quality Defects

May 01 2003, 1:00am EDT

Calculating the costs of information and data quality defects is important in assessing the priority of quality improvement initiatives. The costs of poor quality information line up with the three main categories of defects: representational, procedural and judgmental.

Representational information quality refers to how faithfully the IT system represents the business reality it is supposed to model. Poor quality information and defective data do not line up with that reality. Redundant data storage and processing costs are often cited as the paradigm of representational costs within the IT data center. If the system contains a date field that is ambiguous because the data is incompletely represented (e.g., a two-digit year with no century), one obvious cost is that of fixing the system so that it captures the full context. Correcting the Y2K two-digit year problem reportedly cost approximately $600 billion. Rework, digital scrap and inefficiencies in data center operations are significant contributors to these costs.
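To make the missing-century problem concrete, here is a minimal Python sketch. It assumes years are stored as two digits and uses a fixed pivot-year "window" to infer the century; the function names and the pivot value of 50 are hypothetical illustrations, not a standard.

```python
def expand_two_digit_year(yy: int, pivot: int = 50) -> int:
    """Infer the century of a two-digit year with a fixed window.

    Years below `pivot` are read as 20yy, the rest as 19yy.
    The default pivot of 50 is a hypothetical choice.
    """
    if not 0 <= yy <= 99:
        raise ValueError(f"expected a two-digit year, got {yy}")
    return 2000 + yy if yy < pivot else 1900 + yy


def naive_age(birth_yy: int, current_yy: int) -> int:
    # Subtracting two-digit years silently ignores the century.
    return current_yy - birth_yy


# Someone born in 1965, evaluated in 2003, with only two digits stored:
print(naive_age(65, 3))                                       # -62
print(expand_two_digit_year(3) - expand_two_digit_year(65))  # 38
```

Windowing of this kind was one common Y2K remediation, but it only moves the ambiguity to the edge of the window; expanding the stored field to a four-digit year removes it.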

Naturally, the cost and impact of poor data quality vary widely by industry, by company within an industry, by business process and by how one partitions the problem. Still, some generalizations are possible and worth considering. The costs of data creation and maintenance are relatively easy to capture, but they are an incomplete measure. Much of the value of data escapes narrow cost analysis and becomes visible only when, for example, the lack of a data backup puts the company out of business. In calculating costs, it is essential to look beyond the representational issues and include the consequences of inaccurate information. Hence, it is necessary to examine the procedural and judgmental aspects of defective data as well.

If an input file is loaded twice into the same database (double posted because of a faulty procedure), the consequences can be very costly, both in the effort to recover the database and in its unavailability during the recovery period. If it is an order-entry database, the firm may effectively be "out of business" during that period. For firms that move physical goods through a supply chain, the cost of a package returned because of inaccurate shipping data is the paradigm. If the information is unusable because of the way it is presented in the GUI, the staff time spent designing a work-around, plus the extra time that work-around consumes every day, is chargeable to poor information quality. In an extreme case at NASA (the 1999 Mars Climate Orbiter), a data quality defect cost the entire mission: data was supplied in English rather than metric units, and the resulting navigation error led to the loss of a spacecraft that cost more than $100 million.
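One procedural safeguard against double posting is to make the load idempotent: fingerprint each input file and refuse to post one that has already been loaded. The sketch below assumes an in-memory registry and a caller-supplied `post` callback; `LoadRegistry` is a hypothetical name, and a real system would keep the fingerprints in a control table updated in the same transaction as the load.

```python
import hashlib
from collections.abc import Callable


class LoadRegistry:
    """Remembers which input files have already been posted (hypothetical helper)."""

    def __init__(self) -> None:
        # In production this would be a control table, not an in-memory set.
        self._loaded: set[str] = set()

    @staticmethod
    def fingerprint(contents: bytes) -> str:
        # SHA-256 of the raw bytes identifies byte-identical files.
        return hashlib.sha256(contents).hexdigest()

    def load(self, contents: bytes, post: Callable[[bytes], None]) -> bool:
        """Post `contents` unless the same file was posted before.

        Returns True if posted, False if rejected as a duplicate.
        """
        key = self.fingerprint(contents)
        if key in self._loaded:
            return False
        post(contents)
        # Record the fingerprint only after the post succeeds.
        self._loaded.add(key)
        return True


posted: list[bytes] = []
registry = LoadRegistry()
batch = b"order,1001,qty,5\norder,1002,qty,2\n"
print(registry.load(batch, posted.append))  # True
print(registry.load(batch, posted.append))  # False (second run rejected)
print(len(posted))                          # 1
```

A content hash only catches byte-identical files. A legitimate batch that happens to repeat yesterday's contents would also be rejected, so keying on a batch identifier assigned by the source system may be the better choice in practice.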

Costly consequences of inaccurate data abound. Management may have to make decisions based on inaccurate information without even knowing it. After enough representational and procedural defects have occurred, the accumulated history of bad experiences with the system in question reaches a critical mass, and the system loses its credibility. If the information quality issues are chronic, customers will take their business elsewhere. Even if there is no substitute product or service, customers may take their complaints to the responsible regulatory agencies, resulting in increased customer service costs and costly investigations or regulatory proceedings.

If clients receive inaccurate billing statements, the cost is delayed collections. Customers generally will not pay an inaccurate or unintelligible bill. In addition, the enterprise will incur increased customer service costs as clients make inquiries about the statements and initiate disputes. If enough payments are delayed, the firm might have to draw down its short-term credit, resulting in additional interest expense. That expense is directly traceable to the inaccuracies and quite tangible.
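The tangible part of this cost can be estimated with simple arithmetic. This sketch assumes the firm borrows the full delayed amount at a flat annual rate on a 365-day simple-interest basis, and adds a per-inquiry handling cost for the extra customer service load; the function name and all figures in the example are hypothetical.

```python
def billing_defect_cost(delayed_amount: float, annual_rate: float,
                        delay_days: int, n_inquiries: int,
                        cost_per_inquiry: float,
                        day_count: int = 365) -> tuple[float, float]:
    """Return (interest expense, customer service cost) for delayed bills.

    Assumes the firm draws short-term credit equal to `delayed_amount`
    for `delay_days` at simple interest (an illustrative simplification).
    """
    interest = delayed_amount * annual_rate * delay_days / day_count
    service = n_inquiries * cost_per_inquiry
    return interest, service


# Hypothetical example: $2M in disputed bills paid 30 days late at 8% a year,
# generating 1,500 inquiries handled at $12 each.
interest, service = billing_defect_cost(2_000_000, 0.08, 30, 1_500, 12.0)
print(f"interest ${interest:,.2f}, service ${service:,.2f}")
# interest $13,150.68, service $18,000.00
```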

In general, the cost and impact are greater the more business the firm does with a given account or customer. Likewise, the cost and impact are greater if the firm generates an error that occurs across the board with all customers or a large subset of them. A million-dollar error with one customer and a $1 error with each of a million customers have the same total value. However, the coordination cost of correcting the million-customer error is greater if each customer must be contacted individually. To measure the cost of a customer lost due to data defects, a measure of the lifetime value of the customer (or account) is needed. This requires aggregating a lifetime of customer transactions, which is feasible for firms that have consolidated, integrated customer history in a data warehouse.
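Both ideas in this paragraph reduce to simple aggregation. The sketch below, with hypothetical function names, compares the total value of the two errors with the cost of contacting every affected customer (an assumed flat $0.50 per contact), and computes a simple, undiscounted lifetime value by summing each customer's historical transaction margins.

```python
from collections import defaultdict


def error_exposure(amount_per_customer: float, n_customers: int,
                   contact_cost: float) -> tuple[float, float]:
    """Return (total value of the error, cost of contacting every affected customer)."""
    return amount_per_customer * n_customers, contact_cost * n_customers


def lifetime_value(transactions: list[tuple[str, float]]) -> dict[str, float]:
    """Sum historical transaction margins per customer ID (undiscounted)."""
    totals: dict[str, float] = defaultdict(float)
    for customer_id, margin in transactions:
        totals[customer_id] += margin
    return dict(totals)


# Same error value, very different coordination cost (hypothetical $0.50 per contact).
print(error_exposure(1_000_000.0, 1, 0.50))  # (1000000.0, 0.5)
print(error_exposure(1.0, 1_000_000, 0.50))  # (1000000.0, 500000.0)

history = [("C1", 120.0), ("C2", 40.0), ("C1", 80.0)]
print(lifetime_value(history))  # {'C1': 200.0, 'C2': 40.0}
```

A production lifetime-value model would typically also project future purchases and discount them; this sketch only aggregates recorded history, which is the step the paragraph describes.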

When examining the costly consequences of data and information defects, remember the story "For Want of a Nail" from the days when horses and riders were used to deliver messages. For want of a nail, the horse lost its shoe, the message was not delivered, the battle was lost, the empire fell and the king was beheaded. That's a severe consequence, and all for want of a nail. Likewise, small data quality errors can sometimes produce consequences out of all proportion to their size. Therefore, avoid hasty generalizations about the value of a single data element. Caution is equally needed to guard against data quality paranoia. If a negative scenario is highly improbable (so to speak, an uninsurable risk in a given market), management may be justified in omitting it from the design. However, that decision should be made with eyes open, mindful of the possible consequences.

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access