Recently, I was engaged in a conversation with a client to determine which data sets would be subject to our next round of data quality profiling and assessment. After a successful first round in which we identified serious data problems associated with the failure to successfully deploy a client analytic function, our goal was to expand the assessment process and look for more opportunities for system improvement. When we came to one particular data set and application, my client suggested that there was little point in assessing that data because the system was already recognized as faulty and had reached the end of its useful life. The replacement system was already being designed.

After our conversation, my client's comments about the system that was to be retired kept floating through my head. Because there was already a recognition that the system was flawed, that there were problems with the data and that the system needed to be replaced, I wondered whether it would be prudent to isolate the data problems in the old system to make sure they would not afflict the new one.

It occurred to me that there is actually significant value in performing a data quality assessment on the data associated with an application that is nearing the end of its product life cycle, particularly a system whose failures have already been documented. The assessment process provides us with hard data about the gaps and costs in the old system that are, in some respect, attributable to poor data quality. Those operational costs, together with an analysis of the corresponding business effects of the system's shortcomings, expose the true economic cost of poor data quality.

Let's take a step backward, because embedded within this argument is the "holy grail" of the data quality world – the data quality return on investment (ROI). The key to articulating the elusive ROI calculation for poor data quality lies in the ability to document specific costs or economic effects and tie them directly to particular data flaws. Identifying the most critical flaws (in the economic sense) while the replacement system is being designed provides a rare opportunity – the ability to architect data quality validation and monitoring into a system from the beginning, instead of as an afterthought (which is how poor data quality is usually addressed).

The pivotal point here is a double task: a technical assessment of the data coupled with a business analysis of the results that matches business impacts to specific data problems and then associates an economic impact with each one. This means profiling the data, providing assessment reports to the business partner, and analyzing the results together. The result should be a spreadsheet that details each occurrence of a data quality issue, where in the business process that problem was manifested, and the costs associated with it, which can be either internal (such as problem detection, correction or associated rework) or external (such as missed revenues, the inability to collect receivables, lost customers, etc.).
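For readers who like to see the bookkeeping concretely, here is a minimal sketch of such an issue log in Python. The class and field names (DataQualityIssue, internal_cost, external_cost and so on) are illustrative choices of mine, not part of any particular profiling tool:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DataQualityIssue:
    """One recorded occurrence of a data quality problem found during profiling."""
    issue_type: str        # e.g., "missing customer ID" (hypothetical category)
    business_process: str  # where in the business process the problem was manifested
    internal_cost: float   # detection, correction, associated rework
    external_cost: float   # missed revenue, uncollectable receivables, lost customers

    @property
    def total_cost(self) -> float:
        """Total economic impact of this single occurrence."""
        return self.internal_cost + self.external_cost


def cost_by_issue_type(issues: list[DataQualityIssue]) -> dict[str, float]:
    """Roll the log up by issue type to see which kinds of problems cost the most."""
    totals: dict[str, float] = defaultdict(float)
    for issue in issues:
        totals[issue.issue_type] += issue.total_cost
    return dict(totals)
```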

The data in this spreadsheet can then point to the most critical types of data quality problems in the system's data. The next step is to determine the costs of the mechanics for identifying flawed data as it moves through the new system and either automatically correcting the problem (where that is possible) or preventing invalid data from propagating and creating new costs. As part of this step, the business clients must also determine what level of flawed data the environment can tolerate. For example, if the business clients can tolerate flawed data 10 percent of the time, that 10 percent must be incorporated into the growing ROI spreadsheet and applied as a "damper" to the ultimate data quality costs.
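One way to apply that damper, assuming the tolerance simply scales down the portion of the annual cost that counts as avoidable (my reading of the idea, not a standard formula), looks like this:

```python
def dampened_annual_cost(annual_dq_cost: float, tolerance: float) -> float:
    """Exclude the share of cost tied to flaws the business says it can live with.

    tolerance is the fraction of flawed data the clients will accept, e.g. 0.10.
    """
    return annual_dq_cost * (1.0 - tolerance)


# With a 10 percent tolerance, $1,000,000 of raw data quality cost
# dampens to $900,000 of cost that validation could actually avoid.
print(dampened_annual_cost(1_000_000, 0.10))  # 900000.0
```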

The resulting spreadsheet will then indicate what kind of return on investment can be achieved by building data quality validation into the system. For example, if the costs associated with poor data quality are $1,000,000 per year but the cost to build data validation into the system is $100,000, there is a tenfold return on the investment after the first year, with an even greater return each subsequent year. On the other hand, it is equally useful to learn that no significant return can be demonstrated, so that money is not wasted building unnecessary infrastructure.
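The arithmetic behind that example is simple enough to write down; the ratio and the function name below are just my illustration of the numbers in the text:

```python
def first_year_return(annual_dq_cost: float, validation_build_cost: float) -> float:
    """Ratio of the annual data quality cost avoided to the one-time build cost."""
    return annual_dq_cost / validation_build_cost


# The example above: $1,000,000 per year in poor-data costs
# against a $100,000 investment in built-in validation.
print(first_year_return(1_000_000, 100_000))  # 10.0 -> a tenfold return in year one
```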

After relating my thoughts about the value of assessing the data quality of systems in their golden years, the client agreed that the value was there. As a result of that conversation, the client is now reviewing this value proposition within the organization to determine how many other places the process can be applied!
