The Data Quality Crisis

If the quality of your company's products and services matched the quality of the data in its databases, would your company survive or go out of business? Conduct a quick self-assessment. Answer these questions honestly:

  1. Are there managed processes to control the replication and synchronization of data between "corporate" databases or files and the data that resides in private, proprietary databases and files on non-shareable departmental computers and on personal computers in your enterprise? In addition, how many of these proprietary databases and files exist in your enterprise? This includes data in spreadsheets, PC databases and even word-processor files. If you are typical, you cannot even estimate the number. One insurance company had a list of twelve "sacred data elements" considered so important that if the data was wrong, the company could fail. When it did a data inventory, it discovered that one of these data elements was maintained in 43 separate databases by 43 independent applications, with data entered by 43 different data producers. One manufacturing firm had 92 "part" files, many defined with different primary identifiers so that the same part in different files could not even be cross-referenced. A major bank had 251 different customer files that it had to analyze just to answer the question, "Who is our best customer?" But topping the list is the consumer goods company that discovered it had over 400 "brand" files containing information about its products!
  2. If the data in those "corporate" databases is high quality, why is there a need for all of the redundant, private databases that seem to multiply like rabbits? After all, data is the only business resource that is completely reusable. Money can be spent only once. Employees can be assigned to only one task at a time. Raw materials can be used once in the production of a finished good. And facilities can be used for only one purpose at a given point in time. Yet data, the only non-consumable resource, is the only resource for which high redundancy is "accepted" as a perceived cost of doing business. The insurance company with 43 different databases and applications capturing the same fact is the information equivalent of accounts payable paying a single invoice 43 times, or human resources hiring 43 people to perform the same task 43 different times. Is this the legacy Information Systems should provide its enterprise?

The dark side of the business case for data warehousing is the failure of operational applications to provide for effective data management of the business-critical information resource. And the enterprise is paying dearly for this.

Why Data Warehouses Fail

Many see data warehousing as the silver bullet out of the operational data abyss. If data warehousing is approached with the same information (mis-)management principles that have produced the dis-integrated islands of automation legacy environment, it will fail. And it will fail spectacularly. And it will deserve to fail.

Data warehousing projects fail for many reasons. All of the reasons can be traced to a single element: non-quality. Poor data architecture, inconsistently defined departmental data, inability to relate data from different data sources, missing and inaccurate data values, inconsistent uses of data fields, unacceptable query performance (timeliness of data), lack of a business sponsor (no data warehouse customer), etc., are all components of non-quality data.

With all of the emphasis on the technologies of data warehousing, it serves one well to remember two things:

  • The product of the data warehouse is information.
  • The customers of the data warehouse are the knowledge workers who must make increasingly important decisions faster than ever before.

If the data warehouse does not deliver reliable information that supports the customers' decisions and strategic processes to their satisfaction, history will repeat itself.

But Our Data Quality Is Not So Bad . . .

After all, the operational processes are running well. That may be, with an emphasis on "may." The truth of the matter is that the tactical and strategic process requirements of data warehouse data are completely different from the operational process requirements of data. Consider the following:

An insurance company downloaded claims data to its data warehouse to analyze its risks based upon the medical diagnosis code for which claims were paid. The data revealed that 80 percent of the claims paid out of one claims processing center were paid for a diagnosis of "broken leg." The company's concern was, "What is happening here? Are we in a really rough neighborhood?" What it discovered was that the claims processors were paid according to how fast they could pay a claim, so they let the system default to "broken leg." The data quality was good enough to pay a claim; all the claims payment system needed was a valid diagnosis code. But that same data was totally useless for risk analysis.
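
Skews like this are easy to surface with simple value-frequency profiling before claims data ever reaches the warehouse. The following is a minimal sketch in Python (standard library only); the file name "claims.csv", the column name "diagnosis_code" and the 50 percent flag threshold are assumptions for illustration, not details from the example above.

    # Profile how often each diagnosis code occurs in a claims extract and flag
    # any single value that dominates the file -- a symptom of a defaulted field.
    import csv
    from collections import Counter

    def profile_diagnosis_codes(path, column="diagnosis_code", threshold=0.5):
        counts = Counter()
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                counts[row[column].strip()] += 1

        total = sum(counts.values())
        for code, n in counts.most_common(10):
            share = n / total
            flag = "  <-- possible default value" if share > threshold else ""
            print(f"{code}: {n} claims ({share:.1%}){flag}")

    if __name__ == "__main__":
        profile_diagnosis_codes("claims.csv")

Run against an extract like the insurance company's, a report of this kind would have shown "broken leg" towering over every other code long before the risk analysts were misled.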

But worse than this, over the years the archaic legacy data structures have failed to keep up with the information requirements of even the operational knowledge workers. Because they require more and more information to do their jobs, knowledge workers have been forced to create their own data workarounds, storing the data they need in creative ways that differ from the original file structure. This represents only the beginning of the data quality challenges facing the data warehouse team.

Why have these issues not been seriously addressed before? Two reasons. The first is that data quality is not a "sexy" topic. After all, who wants to work at the sewage treatment plant when they could be building factories? The second reason is the insidious one. Management has deemed the costs of the status quo and the current level of low-quality data acceptable.

The Incredible Costs

I believe this management acceptance of the costs of the status quo is a reluctant one, made in the belief that there was no alternative. Most organizations have come to accept the level of non-quality data as normal and usual. After all, we are profitable, aren't we? As long as the level of data quality is relatively the same among the competition, the competitive battle lines are drawn in other areas. But when someone redefines the role of data quality, as the Japanese did with auto quality, the rules of the game change. The U.S. auto industry's Big Three (GM, Ford and Chrysler) have been losing ground for the past decade and are still losing it. Their domestic car market share fell from 76.0 percent in 1984 to 62.5 percent in 1996, an all-time annual low. January 1997 started out worse, with the Big Three's domestic auto market share dipping to 59.3 percent, according to industry tracker Autodata.1 General Motors lost a whopping $4.5 billion in 1991 and followed with an incomprehensible $23.5 billion loss the next year before getting its act together. While GM has regained profitability, with a record $6.9 billion profit in 1995, its combined profits from the four years 1993-1996 have not erased the loss of 1992 alone. And GM's market share in the U.S. continues to slide, from 34 percent in '92 to 31.6 percent in '96.2 GM stockholders can only speculate what their stock value might be today if the American auto manufacturers had not been oblivious to the quality revolution.

The quality revolution has redefined quality from an optional characteristic to a basic requirement for both goods and services. It is no longer sufficient to compete on price alone. Customer satisfaction is the key driver for long-term financial and organizational success today. GM's new CEO, Jack Smith, admonished employees in October 1996, "We cannot afford the luxury of complacency. Continuous improvement is the name of the game if we want to assure our jobs and the future of this great company."3

The message here is that the same kind of revolution in information quality will change the economic landscape. Continuous improvement of information products and services will become the name of the game if information professionals want to assure their jobs and the futures of their organizations. Those oblivious to its imminence will suffer. The only question is how much.

Management can no longer afford the luxury of the excessive costs of non-quality data. In the information age, a quality, shared knowledge resource is the differentiator. Information quality is to the next decade what product quality was to the 1980s.

Data Quality and the Bottom Line

Data quality problems hamper virtually every area of the business, from the mailroom to the executive office. Every hour the business spends hunting for missing data, correcting inaccurate data, working around data problems, scrambling to assemble information across dis-integrated databases, resolving data-related customer complaints, etc., is an hour of cost only, passed on in higher prices to the customer. That hour is not available for value-adding work. Senior executives at one large mail order company personally spend the equivalent of one full-time senior executive in reconciling conflicting departmental reports before submitting them to the chief executive officer. This means there is the equivalent of one senior executive redundantly required because of redundant and inconsistent (non-quality) data!

Bill Inmon observes that 80-90 percent of the human effort in building a data warehouse goes into handling the interface between the operational and data warehouse environments.8 This effort is caused by not having an integrated data environment. It requires data warehouse professionals to map undefined and unintegrated data from disparate and redundant databases and files; standardize it; de-duplicate redundant occurrences of data both within single files and across files; and integrate, consolidate and format the data into an integrated data warehouse data architecture. Well over half of these costs are attributable directly to non-quality data and non-quality data management and systems development practices.
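
As a rough illustration of what that interface work involves, here is a toy version of the standardize/de-duplicate/consolidate step in Python. The field names, the match key (normalized name plus five-digit ZIP) and the survivorship rule (first record wins, gaps filled from later duplicates) are assumptions for the example, not a description of any particular warehouse tool.

    # Consolidate customer records from redundant source files into one
    # occurrence per customer, filling missing attribute values from duplicates.
    import re

    def standardize(record):
        # Normalize the fields used for matching: collapse spaces, uppercase, 5-digit ZIP.
        name = re.sub(r"\s+", " ", record.get("name", "")).strip().upper()
        zip_code = record.get("zip", "").strip()[:5]
        return name, zip_code

    def consolidate(*sources):
        merged = {}
        for source in sources:
            for record in source:
                key = standardize(record)
                survivor = merged.setdefault(key, dict(record))
                for field, value in record.items():
                    if not survivor.get(field) and value:
                        survivor[field] = value  # fill a gap from the duplicate
        return list(merged.values())

    # Illustrative usage with made-up records from two redundant files:
    billing = [{"name": "Acme  Corp ", "zip": "30301", "phone": ""}]
    shipping = [{"name": "ACME CORP", "zip": "30301-1234", "phone": "404-555-0100"}]
    print(consolidate(billing, shipping))

Even this toy version has to make judgment calls (which spelling of the name survives, which ZIP format wins) that real projects must multiply across hundreds of fields and dozens of sources, which is why so much of the warehouse budget disappears into this work.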

The rise in popularity of departmental data marts, thrown together quickly without addressing the data integration issues, only exacerbates the already huge problem of non-quality data. For data warehousing projects to be successful, the organization must address the problem of non-integrated data head on.

The bottom line is that data quality problems hurt the bottom line.

Quality experts agree that the costs of non-quality are significant. Quality consultant Philip Crosby, author of Quality Is Free, identifies the cost of non-quality in manufacturing as 15-20 percent of revenue.9 Juran pegs the costs of poor quality, including "customer complaints, product liability lawsuits, redoing defective work, products scrapped . . . in most companies they run at about 20 to 40 percent of sales."10 A. T. Kearney CEO Fred Steingraber confirms that "We have learned the hard way that the cost of poor quality is extremely high. We have learned that in manufacturing it is 25-30 percent of sales dollars and as much as 40 percent in the worst companies. Moreover, the service industry is not immune, as poor quality can amount to an increase of 40 percent of operating costs."11

But what about the costs of non-quality data? If early data assessments are an indicator, the business costs of non-quality data, including non-recoverable costs, rework of products and services, workarounds, and lost and missed revenue, may be as high as 10-20 percent of an organization's revenue or total budget. Furthermore, as much as 40-50 percent of the typical IT budget may actually be spent in the equivalent of manufacturing's "scrap and rework."

Why Care About Data Quality?

Because the high cost of low data quality is enterprise-threatening.

There is and must be only one purpose for improving data quality: to improve customer and stakeholder satisfaction by increasing the efficiency and effectiveness of the business processes. Data quality is a business concern, and data quality improvement is a business issue.

For organizations in a competitive environment, data quality is a matter of survival and then of competitive advantage. For organizations in the public and not-for-profit sectors, data quality is a matter of survival and then of stewardship of stakeholder (taxpayer or contributor) resources.
