People start looking for things to remove. And writing error detection and correction code is not only complicated, it’s not sexy. It’s like writing documentation; no one wants to do it because it’s detailed and time-consuming. This is the finish work: it’s the fancy veneer, the polished trim, and the paint color. Software vendors get this. If a data entry error shows up in a demo or a software review, it could make or break that product’s reputation. When was the last time any Windows product let you save a file with an invalid name? It doesn’t happen. The last thing a Word user needs is to sweat blood over a document and then never be able to open it again because it was named with an untypeable character.
Error detection and correction code is a core aspect of development and requires rigorous review. Accurate data isn’t just a business requirement; it’s common sense. Users shouldn’t have to explain to developers why inaccurate values aren’t allowed. Do you think that the business users at Amazon.com had to tell their developers that “The Moon” was an invalid delivery address? But all too often developers don’t think they have any responsibility for data entry errors.
When a system creates data, and when that data leaves that system, the data should be checked and corrected. Bad data should be viewed as a hazardous material that should not be transported. The moment you generate data, you have the implicit responsibility to establish its accuracy and integrity. Distributing good data to your competitors is unacceptable; distributing bad data to your team is irresponsible. And when bad data is ignored, it’s negligence.
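To make the idea of checking data before it leaves a system concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Address` record, the rule that a postal code is exactly five digits, and the function names are hypothetical and do not come from any particular system.

```python
import re
from dataclasses import dataclass


@dataclass
class Address:
    # Hypothetical record shape, for illustration only.
    street: str
    city: str
    postal_code: str


def address_errors(addr):
    """Return a list of problems found in one address (empty if none)."""
    errors = []
    if not addr.street.strip():
        errors.append("street is empty")
    if not addr.city.strip():
        errors.append("city is empty")
    # Assumed rule: a postal code is exactly five ASCII digits.
    if not re.fullmatch(r"[0-9]{5}", addr.postal_code):
        errors.append("postal code is not five digits")
    return errors


def partition_for_export(addresses):
    """Split records into those safe to send and those held back.

    Rejected records are paired with their error lists so that someone
    can correct them at the source instead of shipping them downstream.
    """
    clean, rejected = [], []
    for addr in addresses:
        errors = address_errors(addr)
        if errors:
            rejected.append((addr, errors))
        else:
            clean.append(addr)
    return clean, rejected
```

The same `address_errors` check can run at data entry and again at export, so bad records are caught where they are created and, failing that, before they travel.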
While everyone, my staff members included, wants to talk about data governance, policy-making, and executive councils, it all starts with bad data being input into systems in the first place. So, what if we fixed it at the beginning?
Evan Levy also blogs at evanjlevy.com.

Including developers in any discussion about data quality is a hard sell to management, because 'they should have thought of that in development'. That developers have incomplete or unforeseen requirements should be taken as a given--but it's not. I've seen low-level software bugs fester for literally years before the BI system finally got enough backing to have the developers go back and work the problem.
There is a common perception that, since BI folks typically work with data from different sources that need a certain amount of cleansing/transformation in order to be used properly, they can take bad application data and just fix it. What's lost in the discussion is that 1) Data is best fixed closest to the source, and 2) The time/effort/headaches spent in coding around a bug are much better spent fixing the problem now and not letting it fester over a period of time.
I know in these times of doing more with less, managers typically take the path of least resistance--but in the long run, IMHO, your time is better spent in taking your medicine now, and continually reaping the benefits down the road.
My 2 cents. Excellent post! CH
When Codd defined what an RDBMS must do in order to merit the name, he included the use of constraints--PK, Not Null, FK, Check, domain, and conditional (i.e., triggers). In the 15 yrs I've been architecting databases, I've had to fight to include constraints in database designs--with DB2, that includes having to fight for PKs and true "Not Nulls" (i.e., not using the @#$% system defaults on every column).
Properly designed RDBs that fully use the capabilities of an RDBMS can provide an amazing lift in preventing bad data, but I've found that selling that to IT directors is a constant uphill battle.
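As a rough illustration of the constraint types mentioned above, the sketch below uses Python's built-in `sqlite3` module rather than DB2, purely so it can run anywhere. The `customer` and `customer_order` tables and the helper names are hypothetical, and SQLite's behavior (for example, needing `PRAGMA foreign_keys = ON`) will not match every RDBMS.

```python
import sqlite3


def build_schema(conn):
    """Create two hypothetical tables that lean on declared constraints."""
    # SQLite does not enforce foreign keys unless this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL CHECK (length(trim(name)) > 0)
        )""")
    conn.execute("""
        CREATE TABLE customer_order (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
            quantity    INTEGER NOT NULL CHECK (quantity > 0)
        )""")


def try_insert(conn, sql, params):
    """Attempt one insert; report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError:
        # Raised for PK, NOT NULL, FK, and CHECK violations alike.
        return "rejected"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    add_customer = "INSERT INTO customer (customer_id, name) VALUES (?, ?)"
    add_order = ("INSERT INTO customer_order (order_id, customer_id, quantity) "
                 "VALUES (?, ?, ?)")
    print(try_insert(conn, add_customer, (1, "Acme")))   # valid row
    print(try_insert(conn, add_customer, (1, "Dupe")))   # duplicate PK
    print(try_insert(conn, add_customer, (2, None)))     # NULL name
    print(try_insert(conn, add_order, (10, 99, 1)))      # unknown customer
    print(try_insert(conn, add_order, (11, 1, 0)))       # quantity not > 0
```

The point of the design is that every application writing to these tables gets the same checks for free, instead of each one re-implementing (or forgetting) them.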
You're absolutely right--Data Quality frequently only gets lip service from IT.