Examining Dimensions of Data Quality: Reasonability, Time and Access
The first article in this series clarified what areas of agreement exist among six of the DQ industry’s authorities for three of the dimensions of data quality (accuracy, precision, consistency). This article addresses the reasonability, time and access aspects of data quality.
Because the last article discussed Consistency, a natural place to continue is the related area of Reasonableness, which is often confused with Consistency. The following authors espouse Reasonableness or Believability.
When we look closer, however, only two authors (Loshin and Lee et al.) identify a new concept not already covered. The DMBOK and Loshin identify consistency of values, which we covered in the last article. Loshin’s identification of the time-related aspect of reasonability is spot on, but I’d classify that as simply a domain of acceptable values (though date constrained), which we will cover in the Validity dimension in the fourth article in this series. Rational expectations, which are labeled “reasonable,” can also be documented as validity ranges, minimums, maximums and other basic business rules. By documenting these requirements as rules used during profiling, the properties of the data can be measured and managed in an unbiased way.
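As a minimal sketch of that idea, the Python below documents two "reasonable" expectations as explicit rules and measures how many rows pass each one. The `Rule` class, the `profile` function, the column names and the thresholds are all hypothetical, chosen only for illustration; they do not come from any of the cited authors or from a particular profiling tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    """One documented business rule (hypothetical structure)."""
    name: str
    column: str
    check: Callable[[Any], bool]


def profile(rows, rules):
    """Return, for each rule, the fraction of rows whose value passes it."""
    if not rows:
        return {}
    return {
        rule.name: sum(1 for row in rows if rule.check(row[rule.column])) / len(rows)
        for rule in rules
    }


# Assumed rules: a minimum/maximum range and a date-constrained domain.
rules = [
    Rule("quantity_in_range", "quantity", lambda q: 1 <= q <= 1000),
    Rule("order_date_not_after_cutoff", "order_date", lambda d: d <= date(2024, 1, 31)),
]

# Made-up sample rows for the example.
rows = [
    {"quantity": 5, "order_date": date(2024, 1, 10)},
    {"quantity": 0, "order_date": date(2024, 1, 12)},    # fails the range rule
    {"quantity": 40, "order_date": date(2024, 2, 15)},   # fails the date rule
    {"quantity": 12, "order_date": date(2024, 1, 20)},
]

scores = profile(rows, rules)
```

Because each expectation is written down as a rule with a pass rate, the result is a measurement rather than an opinion about whether the data "looks reasonable."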
Lee et al. cite believability as “regarded as true and credible,” but that is very subjective and not a property of the data as much as an opinion of its fitness for use by consumers.
As discussed at the beginning of this series, dimensions are properties of the data relative to its fitness. We have either placed these three concepts (Temporal Reasonability, Meets Rational Expectations, and Regarded as True and Credible) in other dimensions or dismissed them as not meeting the criteria of a dimension because they are not properties of the data. That said, surveying end users’ opinions of data desirability is valuable in the context of a broader data quality improvement effort, but those opinions fall outside the scope of the dimensions of data quality because they are attributes of the customer’s need, not inherent attributes of the data.
There is much more agreement regarding the next dimensions that we’ll cover. All of the authors espouse the Timeliness dimension.
At first glance one may think that Timeliness and Currency are the same concept, but that isn’t the case. Currency focuses on how up-to-date or “fresh” the data is relative to the real-world concept it represents. Timeliness relates to how quickly a stakeholder can gain access to the data needed. An example of this might be when a data mart is loaded with daily granularity sales data once a month, meaning that users can create daily purchase reports but there is a lag of up to one month between the day that the report represents and the earliest day it can be viewed in the data mart.
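The distinction can be made concrete with a small Python sketch. It assumes, purely for illustration, that the monthly load runs on the first day of the following month; the function names are hypothetical, and the currency measure shown (days since a record was last refreshed) is only one possible way to quantify freshness.

```python
from datetime import date


def next_month_start(d: date) -> date:
    """First day of the month after d (assumed load date for the data mart)."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def timeliness_lag_days(sales_day: date) -> int:
    """Days between the day the data describes and the earliest day it can be viewed."""
    return (next_month_start(sales_day) - sales_day).days


def currency_age_days(last_refreshed: date, as_of: date) -> int:
    """Days since the record was last refreshed from its source (assumed measure)."""
    return (as_of - last_refreshed).days


lag_first_of_month = timeliness_lag_days(date(2024, 3, 1))
lag_last_of_month = timeliness_lag_days(date(2024, 3, 31))
age = currency_age_days(date(2024, 4, 1), as_of=date(2024, 4, 20))
```

Under these assumptions, sales from the first day of a month wait the longest before users can see them, while sales from the last day are visible almost immediately. The currency measure is independent of that lag: it grows every day after the load until the next refresh.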
Lee et al. call out the Appropriate Amount of Data as well, but that is only a volume metric within the Accessibility concept. In addition to Currency, some authors cite the “Concurrence of Distributed Data” concept, as seen in Table 4.
Within Timeliness, there is an additional concept of Retention that only the TDWI references. This is especially important to records coordinators within compliance and legal functions who require that documents are properly disposed of after a set period of time.
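A retention requirement of this kind can also be expressed as a measurable rule. The sketch below flags records that have outlived their retention period; the record types, the periods in `RETENTION_DAYS` and the `due_for_disposal` function are hypothetical examples, not values taken from the TDWI material or from any regulation.

```python
from datetime import date, timedelta

# Assumed retention periods per record type, in days (illustrative only).
RETENTION_DAYS = {
    "invoice": 7 * 365,
    "support_ticket": 2 * 365,
}


def due_for_disposal(records, as_of):
    """Return the ids of records older than the retention period for their type."""
    overdue = []
    for rec in records:
        limit = RETENTION_DAYS.get(rec["type"])
        if limit is None:
            continue  # no documented policy for this type; leave for manual review
        if as_of - rec["created"] > timedelta(days=limit):
            overdue.append(rec["id"])
    return overdue


# Made-up records for the example.
records = [
    {"id": "A1", "type": "invoice", "created": date(2015, 1, 1)},
    {"id": "B2", "type": "support_ticket", "created": date(2023, 6, 1)},
    {"id": "C3", "type": "support_ticket", "created": date(2021, 1, 1)},
    {"id": "D4", "type": "memo", "created": date(2010, 1, 1)},
]

overdue_ids = due_for_disposal(records, as_of=date(2024, 1, 1))
```

Records with no documented policy are skipped rather than flagged, on the assumption that disposal should never happen without an explicit rule.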
The next article in this series looks at Completeness, which I believe is the most fundamental place to start a data quality effort.
- Redman, Tom. "Data Quality: The Field Guide," Digital Press, 2001.
- English, Larry. "Information Quality Applied," Wiley Publishing, 2009.
- TDWI. "Data Quality Fundamentals," The Data Warehousing Institute, 2011.
- DAMA International. "The DAMA Guide to The Data Management Body of Knowledge" (DAMA-DMBOK Guide), Technics Publications, LLC, 2009.
- Loshin, David. "The Practitioner's Guide to Data Quality Improvement," Elsevier, 2011.
- Lee, Yang W., Leo L. Pipino, James D. Funk and Richard Y. Wang. "Journey to Data Quality," MIT Press, 2006.
- McGilvray, Danette. "Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information," Morgan Kaufmann, 2008.