Question: How do you measure/calculate information quality quotient for a particular data set (i.e., a single value of quality for a data set)?
Danette McGilvray's Answer: The good news is that information quality can be measured - via various data quality dimensions. A data quality dimension is a way to measure and manage data and information quality. I reference 12 data quality dimensions that I believe are the most practical and useful for the business to measure and manage. Some of the dimensions are data integrity fundamentals, duplication, accuracy, consistency and synchronization, timeliness and availability, and data coverage. There is no industry standard for data quality dimensions. Choose those dimensions most applicable to your situation.
There is not room in this forum to describe each of the 12 dimensions. But let me point you to one of the first dimensions to measure - that of data integrity fundamentals. The dimension of data integrity fundamentals is a measure of the existence, validity, structure, content and other basic characteristics of data. This dimension includes essential measures of completeness/fill rate, validity, frequency distributions and lists of values, patterns, maximum and minimum values, referential integrity, etc. All other dimensions build on this dimension, whether you are assessing your data for the first time to prepare source-to-target mappings, using assessment results to clean source data or develop transformation rules, or monitoring the data regularly within your production environment.
To assess data integrity fundamentals you will need to profile your data. Profiling can be accomplished by using one of the profiling tools available on the market. Profiling tools are sometimes referred to as analysis or discovery tools and provide the most comprehensive information about your data. You can also profile your data with other tools, such as SQL queries, a report writer for ad hoc reports, or a statistical analysis tool. A caution: just having a tool is not the full answer. You need to make sure the processes around using the tool are also well planned and implemented.
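As a rough illustration of what profiling for data integrity fundamentals can look like without a dedicated tool, here is a minimal sketch in Python. It computes a fill rate, a frequency distribution, minimum and maximum values, and a validity rate against a list of allowed values. The function name `profile_column`, the sample country codes and the allowed-value set are all hypothetical, and treating `None` and empty strings as missing is an assumption you would adjust for your own data.

```python
from collections import Counter


def profile_column(values, valid_values=None):
    """Return basic data integrity measures for one column.

    Assumption: None and "" both count as missing entries.
    `valid_values` is an optional set of allowed values.
    """
    total = len(values)
    present = [v for v in values if v is not None and v != ""]
    profile = {
        # Completeness/fill rate: share of rows that hold a value.
        "fill_rate": len(present) / total if total else 0.0,
        # Frequency distribution of the populated values.
        "frequencies": Counter(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }
    if valid_values is not None:
        # Validity: share of populated values found in the allowed list.
        valid = sum(1 for v in present if v in valid_values)
        profile["validity_rate"] = valid / len(present) if present else 0.0
    return profile


# Hypothetical sample: country codes from a customer table.
countries = ["US", "DE", "", "US", "XX", None, "FR", "US"]
result = profile_column(countries, valid_values={"US", "DE", "FR"})
print(result)
```

In this sample, six of the eight rows are populated and five of those six appear in the allowed list, so the invalid code "XX" and the two blanks are exactly what a profile like this is meant to surface.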
If you are looking for a single data quality indicator, you will need to measure various dimensions important to you and combine them into a data quality index. The index is a single indicator, which is actually a compilation of several measures. Of course, any data quality results that are reported should be explained so those utilizing the reports understand what is being measured.
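One possible way to build such an index is a weighted average of per-dimension scores. The sketch below assumes each dimension has already been measured and expressed as a score between 0.0 and 1.0. The weighted-average approach, the function name `data_quality_index`, and the sample scores and weights are illustrative assumptions, not a prescribed formula; choose dimensions and weights that reflect what matters to your business.

```python
def data_quality_index(scores, weights):
    """Combine per-dimension scores (each 0.0 to 1.0) into one weighted index.

    Assumption: a weighted average is an acceptable way to combine
    the dimensions; other combination rules are possible.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[d] * weights[d] for d in scores) / total_weight


# Hypothetical measured scores and business-assigned weights.
scores = {
    "data_integrity_fundamentals": 0.90,
    "duplication": 0.80,
    "timeliness": 0.70,
}
weights = {
    "data_integrity_fundamentals": 2,
    "duplication": 1,
    "timeliness": 1,
}
index = data_quality_index(scores, weights)
print(f"Data quality index: {index:.3f}")
```

Reporting the individual scores and weights alongside the index helps readers of the report see what the single number is made of.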
Additional information on data integrity fundamentals, profiling tools and the other data quality dimensions can be found in my upcoming book, Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information, available summer 2008.