The Value of Using the Dimensions of Data Quality
The first five articles in this series contrasted the dimensions of data quality defined by six renowned authors, including valuable points from additional authors where applicable. Clearly, there are many valuable aspects to the dimensions of data quality:
- The categorization of data by quality properties allows prospective consumers to evaluate whether the data meets their needs in terms of its current properties (completeness, precision, etc.).
- The categorization of data by quality properties provides a mechanism to prioritize data quality cleanup and process changes, and to implement data stewardship and governance.
- Dimensions (and, more specifically, the underlying concepts with the associated metrics) provide a method of measuring quality over time.
- The categorization of data by quality properties allows practitioners to predict business impact based on the known behavior of each dimension of quality (e.g., lack of completeness yields understated financials, and invalid values can lead to miscategorization or incorrect aggregation).
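As a rough illustration of the last two points, the sketch below computes two simple metrics, completeness and validity, on a handful of made-up invoice records, and shows how a missing amount understates a total. The field names, the sample records and the set of allowed categories are all hypothetical; none of them come from the authors' works, and real metrics would be defined per dimension and per organization.

```python
# Hypothetical sample records; None marks a missing value.
records = [
    {"invoice": 1, "amount": 100.0, "category": "HARDWARE"},
    {"invoice": 2, "amount": None, "category": "SOFTWARE"},
    {"invoice": 3, "amount": 250.0, "category": "SOFTWRE"},  # invalid code
    {"invoice": 4, "amount": 50.0, "category": "SERVICES"},
]
# Hypothetical reference list of valid category codes.
ALLOWED_CATEGORIES = {"HARDWARE", "SOFTWARE", "SERVICES"}


def completeness(rows, field):
    """Share of rows in which the field is populated."""
    if not rows:
        raise ValueError("no rows to measure")
    populated = sum(1 for r in rows if r.get(field) is not None)
    return populated / len(rows)


def validity(rows, field, allowed):
    """Share of populated values that belong to the allowed set."""
    values = [r[field] for r in rows if r.get(field) is not None]
    if not values:
        raise ValueError("no populated values to measure")
    return sum(1 for v in values if v in allowed) / len(values)


def reported_total(rows, field):
    """Sum that silently skips missing values, as many reports do."""
    return sum(r[field] for r in rows if r.get(field) is not None)


amount_completeness = completeness(records, "amount")
category_validity = validity(records, "category", ALLOWED_CATEGORIES)
total = reported_total(records, "amount")

print(f"amount completeness: {amount_completeness:.0%}")
print(f"category validity:   {category_validity:.0%}")
print(f"reported total:      {total}")
```

Here invoice 2 has no amount, so the reported total leaves it out entirely, and invoice 3 would be grouped under a category that does not exist. Recomputing the same metrics on each data load gives a simple trend line for quality over time.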
The purpose of having an industry-accepted set of dimensions with associated concepts is to allow organizations to communicate effectively, both internally and externally. In a more networked society, where there are more external demands on our data, such as governmental regulation, legal and security requirements, corporate partnerships and corporate valuation, agreed-upon standards are a must.
In a recent discussion on this topic, data quality author Danette McGilvray pointed out that, from an internal perspective, the quicker an organization can establish and start using these foundational dimensions, the sooner it will see the benefits. Why not get a jump-start using the industry standard and then add custom categories and concepts as needed?
Bringing it All Together
In this capstone article, I’ve compiled the proposed list of dimensions. Figure 1 lists the dimensions identified by the data quality authors and their associated concepts before standardization. Note that the red arrows crossing the vertical dashed lines indicate where authors cited concepts within other dimensions. Using this charting method, the optimal relationship would have dimensions with underlying concepts only within each individual column, with no red dashed arrows. (Click here to open Figure 1.)
Figure 1 lists concepts, independent of author. Table 1 provides a side-by-side comparison of the dimensions between authors, as covered in articles one through five of this series. (Click here to open Table 1.)
Someone will likely disagree with the way these have been conformed, but as everyone who participates in data governance knows, there has to be some compromise in order to create a standard. I think the following is palatable to most of the authors cited and true to the underlying reasons for each concept.
It should be noted, though, that this work has not taken into account the direct impact of unstructured data quality (e.g., textual documents, video and audio). Over time, we’d expect the number of concepts documented under these dimensions to grow and other dimensions to be introduced. The industry standard will likely be a living canon of the agreed-upon dimensions.
The dimensions of data quality and their underlying concepts, based on the consolidation in articles one through five, are listed in Table 2. (Click here to open Table 2.)
It should be noted that this is not a list of definitions of the dimensions, which would require an extensive review, negotiation and compromise effort among industry thought leaders. Rather, this is a conformed list of the underlying concepts for each dimension. (I am presenting this topic at the International Association for Information and Data Quality conference, IDQ 2013, in Little Rock, AR, this November. I hope to see you there and discuss this topic further.)
In conclusion, I stress that although many of the dimensions put forth by data quality authors are good mechanisms to ensure quality information management work products, they aren’t specific to the quality of data and its intended use.
This is where we should go back to the standard definition for data quality: “Fitness for Use,” which is a misnomer. It should be “Fitness for intended use.” After all, we wouldn't say that a Ferrari is of poor quality when used off-roading, would we? Rather it is of exceptional quality for its purpose (aesthetic beauty, acceleration, high-speed maneuvering on flat surfaces, etc.). In terms of creating standards, the presumption has to be that the data is for a given purpose/audience, and then within that scope we can define whether it meets our needs or not.
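One way to picture "fitness for intended use" is to score the same data against the requirements of different purposes. In the sketch below, the measured scores, the two intended uses and their minimum thresholds are all invented for illustration; they are assumptions, not values proposed by any of the authors cited.

```python
# Hypothetical measured scores for one dataset (0.0 to 1.0).
measured = {"completeness": 0.92, "timeliness": 0.60}

# Hypothetical minimum scores required by two different intended uses.
requirements = {
    "regulatory_reporting": {"completeness": 0.99, "timeliness": 0.50},
    "marketing_trends": {"completeness": 0.90, "timeliness": 0.50},
}


def unmet_dimensions(scores, minimums):
    """Return the dimensions whose score falls below the required minimum."""
    return sorted(
        dim for dim, floor in minimums.items() if scores.get(dim, 0.0) < floor
    )


def is_fit_for(scores, minimums):
    """Data is fit for a use only if every required minimum is met."""
    return not unmet_dimensions(scores, minimums)


fitness = {use: is_fit_for(measured, mins) for use, mins in requirements.items()}

for use, fit in fitness.items():
    gaps = unmet_dimensions(measured, requirements[use])
    print(f"{use}: fit={fit}, unmet={gaps}")
```

The same dataset passes for trend analysis and fails for regulatory reporting, much as the Ferrari is exceptional on the track and poor off-road. The quality verdict depends on the scope chosen before measuring.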
Read the rest of this series:
Part 1: Dimensions of Data Quality Under the Microscope
Part 2: Examining Dimensions of Data Quality: Reasonability, Time and Access
Part 3: Examining Dimensions of Data Quality: Completeness
Part 4: Examining Dimensions of Data Quality: Validity and Integrity
Part 5: Examining Dimensions of Data Quality: Definition and Representation
All references to authors’ works in this series come from the following sources:
- Redman, Tom. Data Quality: The Field Guide, Digital Press, 2001.
- English, Larry. Information Quality Applied, Wiley Publishing, 2009.
- TDWI. Data Quality Fundamentals, The Data Warehousing Institute, 2011.
- DAMA International. The DAMA Guide to The Data Management Body of Knowledge (DAMA-DMBOK Guide), Technics Publications, LLC, 2009.
- Loshin, David. The Practitioner's Guide to Data Quality Improvement, Elsevier, 2011.
- Lee, Yang W., Leo L. Pipino, James D. Funk and Richard Y. Wang. Journey to Data Quality, MIT Press, 2006.
- McGilvray, Danette. Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information, Morgan Kaufmann, 2008.