for Information Management Blogs
DEC 1, 2009 5:01am ET

You Build it, You Break It, You Fix It: Why Applications Must Be Responsible for Data Quality

When it comes to bad data, a lot of the problem stems from companies letting their developers off the hook. That’s right. When it comes to delivering, maintaining, and justifying their code, developers are given a lot of rope. When projects start, everyone nods their head in agreement when data quality comes up. But then there’s scope creep and sizing mistakes, and projects run long.

People start looking for things to remove. And writing error detection and correction code is not only complicated, it’s not sexy. It’s like writing documentation; no one wants to do it because it’s detailed and time-consuming. This is the finish work: it’s the fancy veneer, the polished trim, and the paint color. Software vendors get this. If a data entry error shows up in a demo or a software review, it could make or break that product’s reputation. When was the last time any Windows product let you save a file with an invalid name? It doesn’t happen. The last thing a Word user needs is to sweat blood over a document and then never be able to open it again because it was named with an untypeable character.

Error detection and correction code is a core part of development and requires rigorous review. Accurate data isn’t just a business requirement; it’s common sense. Users shouldn’t have to explain to developers why inaccurate values aren’t allowed. Do you think that the business users at Amazon.com had to tell their developers that “The Moon” was an invalid delivery address? But all too often developers don’t think they have any responsibility for data entry errors.

When a system creates data, and when that data leaves that system, the data should be checked and corrected.  Bad data should be viewed as a hazardous material that should not be transported. The moment you generate data, you have the implicit responsibility to establish its accuracy and integrity.  Distributing good data to your competitors is unacceptable;  distributing bad data to your team is irresponsible. And when bad data is ignored, it’s negligence.
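As a rough illustration of checking data both when it is created and when it leaves a system, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Address` record, the US ZIP code rule, and the function names are hypothetical, and real validation rules would come from the business.

```python
from dataclasses import dataclass
import re

# Hypothetical rule for illustration only: a 5-digit or ZIP+4 US postal code.
POSTAL_CODE = re.compile(r"\d{5}(-\d{4})?")


@dataclass(frozen=True)
class Address:
    street: str
    city: str
    postal_code: str


def validate_address(addr: Address) -> list[str]:
    """Return a list of problems; an empty list means the address passed."""
    problems = []
    if not addr.street.strip():
        problems.append("street is empty")
    if not addr.city.strip():
        problems.append("city is empty")
    if not POSTAL_CODE.fullmatch(addr.postal_code.strip()):
        problems.append(f"postal code {addr.postal_code!r} is not a US ZIP code")
    return problems


def create_address(street: str, city: str, postal_code: str) -> Address:
    """Check on the way in: correct what is safe to correct, reject the rest."""
    # Trimming whitespace is the only "correction" this sketch attempts.
    addr = Address(street.strip(), city.strip(), postal_code.strip())
    problems = validate_address(addr)
    if problems:
        raise ValueError("; ".join(problems))
    return addr


def export_addresses(addresses: list[Address]) -> list[dict]:
    """Check on the way out: never ship a record that fails validation."""
    rows = []
    for addr in addresses:
        problems = validate_address(addr)
        if problems:
            raise ValueError(f"refusing to export {addr}: {'; '.join(problems)}")
        rows.append(
            {"street": addr.street, "city": addr.city, "postal_code": addr.postal_code}
        )
    return rows
```

The same `validate_address` function guards both boundaries, so a record that slipped in some other way (a bulk load, an old bug) is still stopped before it is handed to another system.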

While everyone, my staff members included, wants to talk about data governance, policy-making, and executive councils, it all starts with bad data being input into systems in the first place. So, what if we fixed it at the beginning?

Evan Levy also blogs at evanjlevy.com.

Comments (12)
Crazy talk! LALALA - I'm not listening!

This is a hard sell to management (to include developers in any discussions about data quality) because 'they should have thought of that in development'. That developers have incomplete or unforeseen requirements should be taken as a given, but it isn't. I've seen low-level software bugs fester for literally years before the BI system finally got enough backing to have the developers go back and work the problem.

There is a common perception that, since BI folks typically work with data from different sources that need a certain amount of cleansing/transformation in order to be used properly, they can take bad application data and just fix it. What's lost in the discussion is that 1) Data is best fixed closest to the source, and 2) The time/effort/headaches spent in coding around a bug are much better spent fixing the problem now and not letting it fester over a period of time.

I know in these times of doing more with less, managers typically take the path of least resistance--but in the long run, IMHO, your time is better spent in taking your medicine now, and continually reaping the benefits down the road.

My 2 cents. Excellent post! CH

Posted by Charles H | Tuesday, December 01 2009 at 12:53PM ET
It is not JUST the application code (and coders) that is the problem. We have sophisticated relational database systems that are frequently used as if they were file systems.

When Codd defined what an RDBMS must do in order to merit the name, it included the use of constraints: PK, Not Null, FK, Check, domain, and conditional (i.e., triggers). In the 15 years I've been architecting databases, I've had to fight to include constraints in database designs. With DB2, that includes having to fight for PKs and true "Not Nulls" (i.e., not using the @#$% system defaults on every column).

Properly designed RDBs that fully use the capabilities of an RDBMS can provide an amazing lift in preventing bad data, but I've found that selling that to IT directors is a constant uphill battle.

You're absolutely right: Data Quality frequently only gets lip service from IT.

Posted by Leighton L | Tuesday, December 01 2009 at 1:03PM ET
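
To make the comment above concrete, here is a minimal sketch of four of the constraint types it names (PK, Not Null, FK, Check) rejecting bad rows. It is an illustration under stated assumptions, not the commenter's design: it uses SQLite through Python's standard `sqlite3` module rather than DB2, the `customer` and `customer_order` tables are hypothetical, and domain and trigger constraints are left out.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,                     -- PK
            email       TEXT NOT NULL                            -- Not Null
                        CHECK (email LIKE '%_@_%')               -- Check
        );
        CREATE TABLE customer_order (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL
                        REFERENCES customer(customer_id),        -- FK
            quantity    INTEGER NOT NULL CHECK (quantity > 0)    -- Check
        );
        """
    )
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> str:
    """Run one INSERT and report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


if __name__ == "__main__":
    db = make_db()
    attempts = [
        ("INSERT INTO customer VALUES (?, ?)", (1, "a@example.com")),  # valid
        ("INSERT INTO customer VALUES (?, ?)", (1, "b@example.com")),  # duplicate PK
        ("INSERT INTO customer VALUES (?, ?)", (2, None)),             # NULL email
        ("INSERT INTO customer_order VALUES (?, ?, ?)", (1, 99, 1)),   # no such customer
        ("INSERT INTO customer_order VALUES (?, ?, ?)", (2, 1, 0)),    # quantity not > 0
    ]
    for sql, params in attempts:
        print(params, "->", try_insert(db, sql, params))
```

Only the first insert is accepted; the database itself refuses the other four, regardless of which application or script tried to write them.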