How do you measure/calculate information quality quotient for a particular data set?
Information Management Online, March 7, 2008
Question: How do you measure/calculate information quality quotient for a particular data set (i.e., a single value of quality for a data set)?
Danette McGilvray's Answer: The good news is that information quality can be measured via various data quality dimensions. A data quality dimension is a way to measure and manage data and information quality. I reference 12 data quality dimensions that I believe are the most practical and useful for the business to measure and manage. Some of the dimensions are data integrity fundamentals; duplication; accuracy; consistency and synchronization; timeliness and availability; and data coverage. There is no industry standard for data quality dimensions. Choose those dimensions most applicable to your situation.
There is not room in this forum to describe each of the 12 dimensions, but let me point you to one of the first dimensions to measure: data integrity fundamentals. This dimension is a measure of the existence, validity, structure, content and other basic characteristics of data. It includes essential measures of completeness/fill rate, validity, frequency distributions and lists of values, patterns, maximum and minimum values, referential integrity, etc. All other dimensions build on this dimension, whether you are assessing your data for the first time to prepare source-to-target mappings, using assessment results to clean source data or develop transformation rules, or monitoring the data regularly within your production environment.
To assess data integrity fundamentals you will need to profile your data. Profiling can be accomplished by using one of the profiling tools available on the market. Profiling tools are sometimes referred to as analysis or discovery tools and provide the most comprehensive information about your data. You can also profile your data with other tools, such as SQL queries, a report writer for ad hoc reports or a statistical analysis tool. A caution: just having a tool is not the full answer. You need to make sure the processes around using the tool are also well planned and implemented.
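As a rough illustration of what basic profiling computes, here is a minimal Python sketch of three of the measures named above: fill rate, validity and patterns. It is not taken from any profiling tool. The sample records, the column names, the validity rules and the helper functions (`is_present`, `fill_rate`, `validity_rate`, `pattern`) are all assumptions made up for this example.

```python
import re
from collections import Counter

# Hypothetical sample records; None or "" marks a missing value.
records = [
    {"id": 1, "country": "US", "zip": "94105"},
    {"id": 2, "country": "us", "zip": "9410"},
    {"id": 3, "country": None, "zip": "10001"},
    {"id": 4, "country": "DE", "zip": ""},
]


def is_present(value):
    """True when the value is neither None nor blank."""
    return value is not None and str(value).strip() != ""


def fill_rate(rows, column):
    """Completeness: share of rows with a populated value in the column."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if is_present(r.get(column))) / len(rows)


def validity_rate(rows, column, is_valid):
    """Validity: share of populated values that pass the given rule."""
    values = [r[column] for r in rows if is_present(r.get(column))]
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)


def pattern(value):
    """Reduce a value to its shape: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", str(value)))


# Assumed business rules for this example only.
allowed_countries = {"US", "DE"}
five_digits = re.compile(r"\d{5}")

country_fill = fill_rate(records, "country")
country_valid = validity_rate(records, "country", lambda v: v in allowed_countries)
zip_valid = validity_rate(records, "zip", lambda v: five_digits.fullmatch(v) is not None)

# Frequency distribution of raw values and of value patterns.
country_freq = Counter(r["country"] for r in records)
zip_patterns = Counter(pattern(r["zip"]) for r in records if is_present(r["zip"]))

print(f"country fill rate: {country_fill:.2f}")
print(f"country validity:  {country_valid:.2f}")
print(f"zip validity:      {zip_valid:.2f}")
print("country values:", dict(country_freq))
print("zip patterns:  ", dict(zip_patterns))
```

The frequency and pattern counts are often the most revealing output, because they surface values such as the lowercase "us" or the four-digit zip code that a simple fill-rate figure would hide.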
If you are looking for a single data quality indicator, you will need to measure various dimensions important to you and combine them into a data quality index. The index is a single indicator, which is actually a compilation of several measures. Of course, any data quality results that are reported should be explained so those utilizing the reports understand what is being measured.
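One possible way to build such an index is a weighted average of the per-dimension scores. The Python sketch below assumes that approach; the answer above does not prescribe a formula. The scores, the weights and the `quality_index` function are hypothetical, and each score is assumed to be already expressed as a fraction between 0 and 1.

```python
def quality_index(scores, weights):
    """Weighted average of dimension scores, each assumed to lie in [0, 1]."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"no score for dimensions: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[d] * w for d, w in weights.items()) / total_weight


# Hypothetical measurement results for three of the dimensions.
scores = {
    "data integrity fundamentals": 0.75,
    "duplication": 1.0,
    "timeliness and availability": 0.5,
}

# Hypothetical weights reflecting what matters most to the business.
weights = {
    "data integrity fundamentals": 2,
    "duplication": 1,
    "timeliness and availability": 1,
}

index = quality_index(scores, weights)

# Report the components next to the index so readers can see what it contains.
for dimension, weight in weights.items():
    print(f"{dimension:30s} score={scores[dimension]:.2f} weight={weight}")
print(f"data quality index: {index:.2f}")
```

Printing the component scores alongside the index keeps the single number explainable, since a weighted average can mask a poor result in one dimension.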
Additional information on data integrity fundamentals, profiling tools and the other data quality dimensions can be found in my upcoming book, Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information, available summer 2008.
Danette McGilvray is president and principal of Granite Falls Consulting, Inc., a firm specializing in information quality management to support key business processes around customer satisfaction, decision support and operational excellence. Projects include enterprise data integration programs, data warehousing strategies and best practices for large-scale ERP data migrations for Fortune 50 organizations. For more than ten years she led information quality initiatives at Hewlett-Packard and Agilent Technologies. An accomplished program manager and facilitator, she is an internationally respected expert on data profiling, metrics, quality, audits, benchmarking, and tool acquisition and implementation. McGilvray is an invited speaker at conferences throughout the U.S. and Europe, where she trains other industry experts in enterprise information management and data stewardship. You can reach her at danette@gfalls.com.