OCDQ Blog
for Information Management Blogs
AUG 21, 2012 9:13am ET

Relationship Advice for Data Quality

In his recent Information Management column, Malcolm Chisholm argued that data quality is not fitness for use, as it is most commonly defined, stating that he thinks “a strong case can be made that the definition is indeed inappropriate and should be replaced with a better one.”

“Before we get into the definition of data quality, let us take a brief look at what data is related to,” Chisholm opened, explaining that “data represents something — a thing, event, or concept.”

As I blogged in my post “Plato’s Data,” whether it’s an abstract description of real-world entities (i.e., “master data”) or an abstract description of real-world interactions (i.e., “transaction data”) among entities, data is an abstract description of reality.  Although data shapes our perception of the real world, sometimes we forget that data is only a partial reflection of reality.

“Data is understood,” Chisholm continued, “by something, for which the best term I can find is the interpretant.”

“The interpretant applies the data to one or more uses, which achieve objectives the interpretant has.  The interpretant is independent of the data.  It understands the data and can put it to use.  But if the interpretant misunderstands the data, or puts it to an inappropriate use, that is hardly the fault of the data, and cannot constitute a data quality problem.”

As I blogged in my post “Quality is the Higgs Field of Data,” independent of use, data is as carefree as the massless photon whizzing around at the speed of light. But once we interact with it, data begins to feel the effects of our use. We give data mass so that it can become the basic building blocks of what matters to us. Some data is affected more by our use than other data. The more subjective our use, the more we weigh data down; the more objective our use, the less we weigh data down.

“A more fundamental problem is that data can have many uses,” Chisholm continued.  “If we think data quality is fitness for use, then data quality must be assessed independently for each use we put it to.”  Instead, Chisholm contends that data quality is “an expression of the relationship between the thing, event, or concept and the data that represents it.  This is a one-to-one relationship, unlike the one-to-many relationship between data and uses.”

Therefore, Chisholm proposes that a better definition of data quality is “the extent to which the data actually represents what it purports to represent.  This definition can be used to think of data quality as a property of the data itself, and then our diagnosis and remediation efforts will focus on the special problems of the relationship between data and what it represents.”

But, of course, although Chisholm doesn’t like it as a definition for data quality, he is not denying that fitness for use describes “a set of valid concepts that deal with types of problems around the use of data.” Two examples he cites are when the interpretant misunderstands the data, or when the interpretant uses data for a purpose that is incompatible with the data.

In his conclusion, Chisholm states that “the special problems of the relationships between data and what it is used for requires a different set of approaches and should be called something other than data quality.”

And this is exactly why, as I blogged in my post “Data Myopia and Business Relativity,” many data professionals prefer to define data quality as real-world alignment and information quality as fitness for the purpose of use.  However, I have found that adding the nuance of data versus information only further complicates data quality discussions with business professionals.

Chisholm also suggests that his proposed definition of data quality is not only better, but that “it also alludes to the existence of metadata that links the data to what it is representing.”  The important role that metadata plays in supporting data and its relationships with information and quality is something I blogged about in my post “You Say Potato and I Say Tater Tot.”

The irony is that the metadata linking the data management industry to what it manages suffers from the one-too-many relationships we’ve created by seemingly never agreeing on how data, information, and quality should be defined.

This post originally appeared at OCDQ Blog.

Comments (4)
Why do we need a definition for data quality? Consider this thought from Sir Karl R. Popper, via Martin J. Eppler:

"I do not say that definitions may not have a role to play in connection with certain problems, but I do say it is for most problems quite irrelevant whether a term can be defined (or not). All that is necessary is that we make ourselves understood."

While data may seem readily definable, I increasingly concur with Robert Pirsig that quality is not. He believes: "everyone knows what it is but no one can define it."

Posted by Peter P | Wednesday, August 22 2012 at 11:29AM ET
In all my many long years in IT (evidenced by the receding hairline and the greying of what is left), I have been, and still am, passionate about the data AND its meaning.

How about using the terms "syntactic DQ" for what is now commonly referred to as DQ (with emphasis on "fitness for use") and "semantic DQ" (for when the data has precise meaning/semantics)?

Surely, when the data has high "semantic DQ", use by the interpretants won't be wrong.

- Madani B (Sydney)

Posted by Madani B | Wednesday, August 22 2012 at 11:17PM ET

Blog Archive for Jim Harris