OCDQ Blog
for Information Management Blogs
OCT 27, 2011 10:26am ET


The Metadata Crisis


I am reading the book “The Information: A History, a Theory, a Flood” by James Gleick, which recounts a dialogue written by the ancient Chinese philosopher Gongsun Long known as When a White Horse is Not a Horse:

“Horses certainly have color. Hence, there are white horses. If it were the case that horses had no color, there would simply be horses, and then how could one select a white horse? And so it follows that a horse and a white horse are different. Hence, I say that a white horse is not a horse.

Furthermore, a white horse is a horse and white, but horse is that by means of which one names the shape, and white is that by means of which one names the color. What names the color is not what names the shape. Hence, I say that a white horse is not a horse.”

“On its face, this is unfathomable,” explained Gleick, “but it begins to come into focus as a statement about language and logic. Paradoxes like this formed part of what Chinese historians called the language crisis, a running debate over the nature of language. Names are not the things they name.”

One of my favorite topics is how data is not the real world it describes. But perhaps a better data management example of how “names are not the things they name” is metadata. Julie Hunt blogged about it in her post “Stumbling Over Metadata,” which explored better definitions than the oversimplified “metadata is data about data.”

Metadata can be thought of as a label that provides a definition, description, and context for data. Common examples include relational table definitions and flat file layouts. More detailed examples of metadata include conceptual and logical data models.
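As a rough illustration of metadata acting as a label, here is a minimal sketch in Python of a flat file layout being applied to a raw record. The layout, field names, and sample record are all hypothetical, invented for this example rather than taken from any real system.

```python
# Hypothetical flat file layout (the metadata): name, start, end, description.
LAYOUT = [
    ("customer_id", 0, 6, "Unique customer identifier"),
    ("order_date", 6, 14, "Order date, formatted YYYYMMDD"),
    ("amount_cents", 14, 22, "Order amount in cents, zero-padded"),
]


def parse_record(line, layout):
    """Slice a fixed-width record into named fields using the layout."""
    return {name: line[start:end] for name, start, end, _ in layout}


# Hypothetical raw record (the data): 22 characters with no labels of their own.
record = "C000422011102700019999"
fields = parse_record(record, LAYOUT)

for name, _, _, description in LAYOUT:
    print(f"{name} = {fields[name]}  ({description})")
```

Without the layout, the record is just a string of characters. Change the layout and the same characters describe something else, which is one way of seeing that the names are not the things they name.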

Therefore, metadata, among its many other uses, often plays an integral role in determining how data is used. Although it’s often overlooked, there is a strong relationship between metadata and data quality and, by extension, between metadata and data-driven decision making, since a business intelligence report’s metadata often provides the framing effect for its data.

I have often witnessed what could be called the metadata crisis: a running debate within many organizations over the meaning of commonly used terms like revenue. This debate complicates what on the surface seem like straightforward business questions, such as how much revenue was generated during a particular fiscal reporting period.

A metadata management version of When a White Horse is Not a Horse might be When Recognized Revenue is Not Revenue.
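To make this concrete, here is a minimal sketch in Python of how one question, "how much revenue was generated this quarter?", can produce two answers from the same rows. The orders, dates, and the "booked" versus "recognized" bases are hypothetical, and the fiscal quarter is assumed to match the calendar quarter.

```python
from datetime import date

# Hypothetical ledger rows: (amount, booked_on, recognized_on).
ORDERS = [
    (1200, date(2011, 9, 28), date(2011, 10, 15)),
    (800, date(2011, 9, 30), date(2011, 9, 30)),
    (500, date(2011, 10, 3), date(2011, 10, 3)),
]


def revenue(orders, start, end, basis):
    """Sum amounts whose date under the chosen basis falls within [start, end]."""
    column = {"booked": 1, "recognized": 2}[basis]
    return sum(row[0] for row in orders if start <= row[column] <= end)


# Assumption: the reporting period is calendar Q3 2011.
q3_start, q3_end = date(2011, 7, 1), date(2011, 9, 30)

booked = revenue(ORDERS, q3_start, q3_end, "booked")
recognized = revenue(ORDERS, q3_start, q3_end, "recognized")

print(f"Q3 revenue (booked basis):     {booked}")
print(f"Q3 revenue (recognized basis): {recognized}")
```

In this made-up data, the booked basis counts the first two orders and the recognized basis counts only the second, so two reports that both carry the label "revenue" would disagree until their metadata says which definition each one uses.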

However, the complexities of revenue recognition probably pale in comparison with the metadata crisis that can be caused by what David Loshin calls the most dangerous question in data management: What is the definition of customer?

What examples of the metadata crisis have you encountered in your organization?

This blog originally appeared at OCDQblog.com.


Comments (10)
The revenue recognition debate has a framework (GAAP, income statement vs. ROI vs. cash flow) to guide it. I don't think it is correct to call the presence of a debate on revenue recognition a metadata crisis.

We need a similar framework for managing customer data like an asset. Then we can debate the definition of customer depending on the purpose, and we will have made significant progress. Starting with "what is the definition of customer?" is, I think, putting the white before the horse.

Posted by Ed U | Thursday, October 27 2011 at 11:50AM ET
This is an excellent topic, Jim, and one close to my practice. There is, however, another more systemic problem in play here (if that's possible) and that is the nature of the medium in which we are trying to solve these problems. You mention a number of the technologies (relational databases, file hierarchies and metadata) that are essentially set-based and therefore 'flat' trying to manage something (semantics) that is by its nature multi-dimensional.

Moving away from the 'flat' has large implications, one of which is that any organization can have - and should manage - multiple dialects. By that I mean, in the dialect of accounting, 'customer' means some agent who has contributed to increased sales. In the dialect of marketing, 'customer' can mean anyone with a pulse who will sit and listen to a pitch.

This insistence on a single version of anything - embedded in controlled vocabularies, relational tables, object classes or a folder structure - is the single largest impediment to cleaning up the digital wasteland.

Posted by John O | Friday, October 28 2011 at 11:05AM ET

