My recent blog post "The Data Quality Wager" received a provocative comment from Richard Ordowich that sparked another round of debate about proactive versus reactive data quality in the IAIDQ LinkedIn Group.
“Data quality is a reactive practice,” explained Ordowich. “Perhaps that is not what is professed in the musings of others or the desired outcome, but it is nevertheless the current state of the best practices. Data profiling and data cleansing are after the fact data quality practices. The data is already defective. Proactive defect prevention requires a greater discipline and changes to organizational behavior that is not part of the current best practices. This I suggest is wishful thinking at this point in time.”
“How can data quality practices,” C. Lwanga Yonke responded, “that do not include proactive defect prevention (with the required discipline and changes to organizational behavior) be considered best practices? Seems to me a data quality program must include these proactive activities to be considered a best practice. And from what I see, there are many such programs out there. True, they are not the majority — but they do exist.”
After Ordowich requested real examples of proactive data quality practices, Jayson Alayay commented, “I have implemented data quality using statistical process control techniques where expected volumes and ratios are predicted using forecasting models that self-adjust using historical trends. We receive an alert when significant deviations from forecast are detected. One of our overarching data quality goals is to detect a significant data issue as soon as it becomes detectable in the system.”
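To make the idea of a self-adjusting forecast with deviation alerts more concrete, here is a minimal Python sketch. It is not Alayay's implementation, which the discussion does not describe in detail. It assumes an exponentially weighted moving average as the forecast and a three-sigma control limit; the class name `VolumeMonitor`, its parameters, and the sample counts are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class VolumeMonitor:
    """Hypothetical monitor: EWMA forecast plus a k-sigma control limit."""
    alpha: float = 0.3      # smoothing factor (assumed value)
    threshold: float = 3.0  # control limit in standard deviations (assumed value)
    warmup: int = 5         # observations to collect before alerting (assumed value)
    mean: float = 0.0       # current forecast of the next value
    var: float = 0.0        # running estimate of the forecast error variance
    n: int = 0              # number of observations seen so far

    def observe(self, value: float) -> bool:
        """Return True if value deviates significantly from the forecast."""
        alert = False
        if self.n == 0:
            # The first observation seeds the forecast.
            self.mean = value
        else:
            deviation = value - self.mean
            std = math.sqrt(self.var)
            if (self.n >= self.warmup and std > 0
                    and abs(deviation) > self.threshold * std):
                # Out of control: raise an alert and keep the baseline unchanged.
                alert = True
            else:
                # In control: let the forecast adjust to the recent history.
                self.mean += self.alpha * deviation
                self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        self.n += 1
        return alert


# Made-up daily record counts with one sudden drop.
counts = [1000, 1010, 990, 1005, 995, 1002, 998, 400, 1001]
monitor = VolumeMonitor()
alerts = [monitor.observe(c) for c in counts]
print(alerts)  # only the reading of 400 is flagged
```

One design choice worth noting: flagged values are not folded into the baseline, so a single spike does not distort later forecasts. The trade-off is that a genuine, permanent shift in volume would keep alerting until someone resets the monitor.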
“It is possible,” replied Ordowich, “to estimate the probability of data errors in data sets based on the currency (freshness) and usage of the data. The problem is this process does not identify the specific instances of errors just the probability that an error may exist in the data set. These techniques only identify trends not specific instances of errors. These techniques do not predict the probability of a single instance data error that can wreak havoc. For example, the ratings of mortgages was a systemic problem, which data quality did not address. Yet the consequences were far and wide. Also these techniques do not predict systemic quality problems related to business policies and processes. As a result, their direct impact on the business is limited.”
“For as long as human hands key in data,” responded Alayay, “a data quality implementation to a great extent will be reactive. Improving data quality not only pertains to detection of defects, but also enhancement of content, e.g., address standardization, geocoding, application of rules and assumptions to replace missing values, etc. With so many factors in play, a real life example of a proactive data quality implementation that suits what you’re asking for may be hard to pinpoint. My opinion is that the implementation of ‘comprehensive’ data quality programs can have big rewards and big risks. One big risk is that it can slow time-to-market and kill innovation because otherwise talented people would be spending a significant amount of their time complying with rules and standards in the name of improving data quality.”
“When an organization embarks on a new project,” replied Ordowich, “at what point in the conversation is data quality discussed? How many marketing plans, new product development plans, or even software development plans have you seen include data quality? Data quality is not even an afterthought in most organizations, it is ignored. Data quality is not even in the vocabulary until a problem occurs. Data quality is not part of the culture or behaviors within most organizations.”