
Rethinking the Dimensions of Data Quality

JAN 22, 2013 8:55am ET

A few months ago, I wrote a column asking if the dimensions of data quality, such as accuracy, consistency and timeliness, are real. I pointed out that there are no generally accepted definitions for the dimensions, no generally accepted exhaustive list of them and no generally accepted methodologies for measuring each one.


Comments
Great post, Malcolm. This is something most organizations should be thinking about: data quality beyond mere accuracy and completeness, and how to assess all of these dimensions. So it should come as no surprise to you that Gartner's leading data quality expert Ted Friedman and I developed a comprehensive toolkit for assessing over a dozen different data quality dimensions, including how to quantify them and track their improvement or degradation over time. Here is an overview, with full access for Gartner clients: http://www.gartner.com/resId=2171520 . --Doug Laney, VP Research, Gartner, @doug_laney
Posted by Douglas L | Tuesday, January 22 2013 at 11:25AM ET
Hi Malcolm, interesting post, but I fear you may have muddied the waters. What DQ practitioners need to begin thinking about is what I call elements, associations and narratives.

Your date example is a good place to start. A date element is either valid or not: 01-31-13 is valid; 04-31-13 is not. All of the elements used in an enterprise setting, regardless of their use in associations, have to pass that test. "John O'Gorman" is a valid element as well. By the way, the element level makes no distinction between entities, attributes or properties, any more than the periodic table makes a distinction between the carbon in flour and the carbon in graphite.
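As a rough illustration of the element-level test described above, the sketch below checks only whether a string is a real calendar date. It assumes the MM-DD-YY layout used in the examples; the function name `is_valid_date_element` is hypothetical, and this is not the commenter's own method. Note that a two-digit year says nothing about the century, which does not matter for validity but will matter once the element is associated with a person.

```python
from datetime import datetime

def is_valid_date_element(value: str, fmt: str = "%m-%d-%y") -> bool:
    """Element-level test: does `value` parse as a real calendar date in `fmt`?

    Assumes the MM-DD-YY layout; no association with any person is checked here.
    """
    try:
        datetime.strptime(value, fmt)  # raises ValueError for e.g. April 31
    except ValueError:
        return False
    return True

print(is_valid_date_element("01-31-13"))  # True
print(is_valid_date_element("04-31-13"))  # False: April has only 30 days
```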

Associations must pass the same test. Elements may be valid, but their association may not be. If my birthday is January 13, 1913, then the association between "John O'Gorman" and 01-13-13 is accurate. Any other value in the Date of Birth field, with the notable exception of equivalent values (Jan-13-1913), is not accurate.
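The distinction between a valid element and an accurate association can be sketched by comparing what a stored value means rather than how it is written. This is a minimal sketch under stated assumptions: the list of accepted equivalent formats and the reference date of birth are hypothetical, and four-digit years are used to avoid the century ambiguity of values like 01-13-13.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical set of equivalent representations accepted for a date of birth.
ACCEPTED_FORMATS = ("%m-%d-%Y", "%b-%d-%Y", "%Y-%m-%d")

def parse_date(value: str) -> Optional[date]:
    """Return the date represented by `value`, or None if no accepted format matches."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None

def association_is_accurate(stored_value: str, reference: date) -> bool:
    """Association-level test: the stored value must be valid AND mean the reference date."""
    parsed = parse_date(stored_value)
    return parsed is not None and parsed == reference

# Hypothetical reference fact: the person's actual date of birth.
actual_dob = date(1913, 1, 13)

print(association_is_accurate("01-13-1913", actual_dob))   # True
print(association_is_accurate("Jan-13-1913", actual_dob))  # True: equivalent value
print(association_is_accurate("01-31-1913", actual_dob))   # False: valid date, wrong association
```

Treating equivalent formats as accurate follows the comment's own exception; which formats count as equivalent would be a policy decision for the organization.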

Finally, the information I can assemble from an accurate chain of associations must be derived directly from the 'facts': "As of January 10th, John O'Gorman is a nonagenarian within a few days of his one hundredth birthday. Since he is a valued customer, we should send him a box of Cuban cigars to honour his accomplishment."
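A narrative like the one above can be sketched as a computation over already-verified associations: age and days to the next milestone birthday follow directly from the date of birth and an as-of date. The function names, the 7-day window standing in for "within a few days", and the Feb 28 fallback for February 29 birthdays are all hypothetical choices for illustration.

```python
from datetime import date
from typing import Optional

def age_on(dob: date, as_of: date) -> int:
    """Whole years completed on `as_of`."""
    had_birthday = (as_of.month, as_of.day) >= (dob.month, dob.day)
    return as_of.year - dob.year - (0 if had_birthday else 1)

def days_until_birthday(dob: date, as_of: date, years: int) -> int:
    """Days from `as_of` to the `years`-th birthday."""
    try:
        target = dob.replace(year=dob.year + years)
    except ValueError:
        # Feb 29 birthday in a non-leap target year: assume Feb 28 (a convention choice).
        target = date(dob.year + years, 2, 28)
    return (target - as_of).days

def centenarian_narrative(name: str, dob: date, as_of: date,
                          window_days: int = 7) -> Optional[str]:
    """Derive the narrative only when the facts support it; otherwise return None."""
    age = age_on(dob, as_of)
    days = days_until_birthday(dob, as_of, 100)
    if 90 <= age <= 99 and 0 < days <= window_days:
        return (f"As of {as_of:%B} {as_of.day}, {name} is a nonagenarian "
                f"{days} days from a one hundredth birthday.")
    return None

print(centenarian_narrative("John O'Gorman", date(1913, 1, 13), date(2013, 1, 10)))
```

If the date-of-birth association were wrong, the narrative would be wrong too, which is the point of deriving it only from facts that have passed the element and association tests.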

Abstracting things like 'accuracy' can only be done in the context of the value stream, with 'value' being assessed at every step.

Posted by John O | Tuesday, January 22 2013 at 1:59PM ET