Information Management Blogs
AUG 25, 2009 3:29am ET

Perfect Data and Other Data Quality Myths

A recent client experience reminds me what I’ve always said about data quality: it isn’t the same as data perfection. After all, how could it be? A lot of people think that correcting data is a post-facto activity based on opinion and anecdotal problems. But it should be an entrenched process.

One drop of motor oil can pollute 25 quarts of drinking water. But it’s not the same with data. Data is more like wheat flour, where an average of fewer than 75 insect fragments per 50 grams is considered acceptable. (Jill says this is “apocryphal,” but you get my point.)

People forget that the definition of data quality is data that’s fit for purpose. It conforms to requirements. You only have to look back at the work of Philip Crosby and W. Edwards Deming to understand that quality is about conformance to requirements. We need to understand the variance between the data as it exists and its acceptability, not its perfection.

Data quality gets so much attention because bad data gets in the way of getting the job done. If I want to send an e-mail to 10,000 customers and one customer’s zip code is unknown, it doesn’t prevent me from contacting the other 9,999 customers. That can amount to what in any CMO’s estimation is a very successful marketing campaign. The question should be: What data helps us get the job done?
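To make "fit for purpose" concrete, here is a minimal sketch of measuring the same customer list against different requirements. The `Customer` fields, the `fitness_for_purpose` helper, and the 99% threshold are all assumptions made up for illustration; they are not taken from any real campaign or client system.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class Customer:
    # Hypothetical fields, chosen only to mirror the e-mail example above.
    email: Optional[str]
    zip_code: Optional[str]


def fitness_for_purpose(
    records: Iterable[Customer],
    is_usable: Callable[[Customer], bool],
    required_share: float,
) -> Tuple[float, bool]:
    """Return (share of usable records, whether that share meets the requirement)."""
    records = list(records)
    if not records:
        return 0.0, False
    usable = sum(1 for record in records if is_usable(record))
    share = usable / len(records)
    return share, share >= required_share


# 10,000 customers, exactly one of them with an unknown zip code.
customers = [Customer(email=f"c{i}@example.com", zip_code="30301") for i in range(9_999)]
customers.append(Customer(email="c9999@example.com", zip_code=None))


def has_email(customer: Customer) -> bool:
    return bool(customer.email)


def has_zip(customer: Customer) -> bool:
    return bool(customer.zip_code)


# An e-mail campaign only requires an e-mail address (assumed threshold: 99%).
email_share, email_ok = fitness_for_purpose(customers, has_email, 0.99)

# A purpose that needs zip codes sees one unusable record, still within 99%.
zip_share, zip_ok = fitness_for_purpose(customers, has_zip, 0.99)

# Demanding perfection fails the very same data.
perfect_share, perfect_ok = fitness_for_purpose(customers, has_zip, 1.0)
```

The data never changes between the three calls; only the requirement does. That is the difference between acceptability and perfection.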

Our client is a regional bank that has retained Baseline to work with its call center staff. Customer service reps (CSRs) have been frustrated that they get multiple records for the same customer. They have had to jump through hoops to find the right data, often while the customer waits on the phone or online. The problem wasn’t that the data was “bad”; it was that the CSRs could only use the customer’s phone number to look up the record. If the phone number is incorrect, the CSR can’t do her job. And as a result, her compensation suffers. So data quality is very important to her. And to the bank at large.
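As an illustration of why a single lookup key is fragile, here is a rough sketch of a lookup that falls back to a second key when the phone number finds nothing. The record fields, the name-plus-zip fallback, and the function names are hypothetical. This is not a description of the bank's systems or of what was actually built there.

```python
from collections import defaultdict


def normalize_phone(phone):
    """Keep digits only so '(404) 555-0100' and '404-555-0100' compare equal."""
    return "".join(ch for ch in phone if ch.isdigit())


def build_indexes(records):
    """Index records by phone and by (last name, zip code); both keys are assumptions."""
    by_phone = defaultdict(list)
    by_name_zip = defaultdict(list)
    for record in records:
        by_phone[normalize_phone(record["phone"])].append(record)
        by_name_zip[(record["last_name"].lower(), record["zip_code"])].append(record)
    return by_phone, by_name_zip


def find_customer(by_phone, by_name_zip, phone, last_name, zip_code):
    """Try the phone number first; fall back to last name plus zip code."""
    matches = by_phone.get(normalize_phone(phone), [])
    if matches:
        return matches
    return by_name_zip.get((last_name.lower(), zip_code), [])


# Two records for the same (made-up) customer, each with a different phone number.
records = [
    {"id": 1, "last_name": "Rivera", "zip_code": "30301", "phone": "(404) 555-0100"},
    {"id": 2, "last_name": "Rivera", "zip_code": "30301", "phone": "404-555-0199"},
]
by_phone, by_name_zip = build_indexes(records)

# The number on file finds one record.
found_by_phone = find_customer(by_phone, by_name_zip, "404-555-0100", "Rivera", "30301")

# A number that is not on file would find nothing by phone alone;
# the fallback key surfaces both records so the duplicates become visible.
found_by_fallback = find_customer(by_phone, by_name_zip, "404-555-0111", "Rivera", "30301")
```

The fallback does not make the data any more correct. It just keeps one wrong phone number from blocking the CSR.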

Users are all too accustomed to complaining about data. The goal of data quality should be continuous improvement, ensuring a process is available to fix data when it’s broken. If you want to address data quality, focus energy on the repair process. As long as your business is changing—and I hope it is—its data will continue to change. Data requirements, measurements, and the reference points for acceptability will keep changing too. If you’re involved in a data quality program, think of it as job security.

Comments (6)
There is a debate to be had on whether the bigger contributor to DQ issues is erroneous data entry or multiple instances of data creation. E.g., a credit scoring system scores a customer. That score is used in a loan system, which feeds some other system. When the credit scoring system revalues the customer, the new score is not cascaded down because the business process does not force the issue. As a result, the credit score is created/generated/maintained in three places. The chances are all three systems validate allowable values for credit scores. But they are not consistent with each other.
Posted by Muralidharan G | Tuesday, September 01 2009 at 5:18AM ET
I have found that showing the consequences of poor data quality to users is the best way to get their attention on this issue. When they see the report that shows 20% of sales in the "category not found" category, they get the point pretty quickly. Either they find this acceptable ("oh yeah, we know about that problem"), they blame IT ("but can't you just develop some magical procedure that determines the category from the existing mess of data?"), or they recognize the need for process improvement ("how do we fix this?"). If the users are OK with the data as is, then there's no point in going to extraordinary measures to clean the data. As Evan discusses in this article, acceptability does not necessarily mean perfection. Of course, it's not always easy to explain the data quality issues to users, but DQ is ultimately a business problem, not an IT problem.
Posted by David K | Tuesday, September 01 2009 at 4:26PM ET
Blog Archive for Evan Levy