"The bitterness of poor quality," runs an often-cited anonymous quotation, "remains long after the sweetness of meeting the schedule has been forgotten." Although quality discussions usually arise primarily in project or product manufacturing environments, it is actually poor quality data that can be particularly troublesome for an organization because its bitterness can affect the "taste" of everything else.
Understanding data quality issues is often a matter of finding a middle ground between those who define quality so broadly as to be unhelpful and those who define it too narrowly. Consider, for example, a data warehouse team whose quality concerns are sometimes limited to whether a data warehouse accurately reflects the contents of source systems. That's not going to satisfy most users (especially if the source systems themselves have quality issues). However, users often cast the quality net too broadly, expecting data to be there to support anything they want and to provide answers to questions they conceive of only dimly. Is there a middle ground here?
The Dimensions of Data Quality
Traditionally, data quality has been seen as encompassing four dimensions:
Accuracy addresses each data item within its context. (For example, is the ZIP code for an address correct?)
Availability is about whether users can access information when it's needed: not just whether the data is there at the time needed, but also whether the available data is as timely as it needs to be.
Completeness addresses both the question of whether an individual data item such as ZIP code is fully populated (does every address have a ZIP code?) and whether the available base of data contains all the data elements needed. (It's not enough to have a ZIP code if a user really needs an exact mailing address including apartment number.)
Consistency concerns information from separate sources that may be correct within its own context, but is inconsistent when viewed across multiple domains. (For example, different product sales databases often reflect different customer numbers for the same customer.)
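Three of these dimensions lend themselves to simple automated checks. The sketch below is illustrative only; the record layout, field names, and validation rules are hypothetical, not taken from the column, and real accuracy checks would go well beyond format validation.

```python
import re

# Hypothetical address records and customer-ID tables for illustration.
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")  # US ZIP or ZIP+4 format

def accuracy_issues(records):
    """Flag addresses whose ZIP code is present but fails a format check."""
    return [r for r in records if r.get("zip") and not ZIP_RE.match(r["zip"])]

def completeness_issues(records):
    """Flag addresses with no ZIP code populated at all."""
    return [r for r in records if not r.get("zip")]

def consistency_issues(source_a, source_b):
    """Flag customers carrying different IDs in two sales databases."""
    return [name for name in source_a
            if name in source_b and source_a[name] != source_b[name]]

records = [
    {"name": "Ada", "zip": "02139"},
    {"name": "Ben", "zip": "ABCDE"},   # accuracy problem
    {"name": "Cai"},                   # completeness problem
]
ids_a = {"Ada": "C-100", "Ben": "C-200"}
ids_b = {"Ada": "C-100", "Ben": "C-999"}  # consistency problem

print(len(accuracy_issues(records)))      # 1
print(len(completeness_issues(records)))  # 1
print(consistency_issues(ids_a, ids_b))   # ['Ben']
```

Availability is the one dimension that resists a record-level check of this kind, since it depends on delivery infrastructure and timeliness rather than on the data values themselves.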
Going Beyond the Common Wisdom
In fact, those four dimensions do not alone constitute a rich enough understanding of data quality. In today's more complex business environment of distributed information, data quality is not simply a technical matter, but is more often a business issue. A broad program to improve data quality may well reach into every business department and may require reevaluation of long-standing policies and procedures. If you create information, you also create downstream dependence on it and expectations throughout your organization that your information is of high quality, especially information regarding customers, suppliers and other outside entities. Is there a richer understanding of data quality that can help here?
The Total Data Quality Management Institute (TDQM) at the Massachusetts Institute of Technology (MIT) has developed a more comprehensive list of the characteristics of data quality (see sidebar). However, the list may seem daunting. How can a company use this broader understanding to improve the quality of its data in a comprehensive way?
[Sidebar: Data Quality Metrics]
Establishing a Baseline
The answer is to begin with a comprehensive assessment of the current quality of your data. This does two things. First, it creates a baseline against which progress can be measured. Second, it establishes success measures. Both of these are vital to establishing your priorities as you proceed. You want to make sure you're using your resources to tackle problems that are most important to the business. Here are the subsequent steps to follow.
Step 1: Conduct an assessment across a broad spectrum of users, measuring both perceived importance of the metric and perceived quality of the metric. Doctors Huang, Lee and Wang, of MIT's TDQM, provide a sample assessment in their text, Quality Information and Knowledge. The sample assessment addresses only one information subject at a time, but a broad assessment needs to cover all major information subjects and result in a prioritization of which areas have the most urgent issues to be tackled. The resulting assessment provides a baseline for measuring improvement over time and for defining success measures.
Step 2: Using the assessment results, focus on projects that will address areas with high scores for importance and low scores for quality. Define the success measures.
Step 3: Begin a project with a hypothesis about the cause of each significant quality issue. Then do a root-cause analysis to test the hypothesis, followed by a cost/benefit analysis to ensure that the problem can be fixed cost-effectively. Then you can create a detailed implementation plan to fix the problem.
Step 4: Reassess the information subject in 12 to 18 months to measure progress in fixing the data quality problems.
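The prioritization in Steps 1 and 2 can be sketched as a simple gap ranking: subjects with the largest spread between perceived importance and perceived quality surface first. The subjects and scores below are invented for illustration and are not drawn from the TDQM survey instrument.

```python
# Hypothetical assessment results on 1-10 scales (illustrative only).
assessment = {
    "customer addresses": {"importance": 9, "quality": 4},
    "product catalog":    {"importance": 6, "quality": 8},
    "supplier contacts":  {"importance": 8, "quality": 5},
}

def prioritize(scores):
    """Rank subjects by the gap between perceived importance and perceived
    quality, so high-importance, low-quality areas come first (Step 2)."""
    return sorted(scores,
                  key=lambda s: scores[s]["importance"] - scores[s]["quality"],
                  reverse=True)

print(prioritize(assessment))
# ['customer addresses', 'supplier contacts', 'product catalog']
```

Rerunning the same ranking on the reassessment in Step 4 gives a direct before-and-after comparison against the baseline.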
This is easier said than done, to be sure. However, following these steps carefully can help your organization ensure that your data doesn't leave you with a bitter taste in your mouth.
A postscript: This is my last column with DM Review. I have retired after 25 years with a great company, Accenture. My colleagues will continue sharing business intelligence experiences garnered from work with hundreds of clients and will have many helpful insights. (See Shari Rogalski's column, Business Intelligence: 360º Insight.) It has been a pleasure sharing my thoughts and experiences with you for the last 18 months.