OCDQ Blog
for Information Management Blogs
NOV 10, 2011 4:13pm ET

You Say Potato and I Say Tater Tot

One thread of the comment discussion on my blog post “The Metadata Continuum” raised the excellent point that the border between data and metadata is important, but sometimes difficult to discern. By extension, we can say the same thing about the border between data and information.

So, in this blog post, I thought I would try to explain the importance of these demarcations using potatoes.

You Say Potato and I Say Potahto

“Let’s Call the Whole Thing Off” was a song written by George Gershwin and Ira Gershwin, which became famous for its playful lyrics that poked fun at the differences in the pronunciation of words, such as “you say potato and I say potahto.”

Spelling and pronunciation are included in the dictionary definition of a word, which is a good example of one of the many uses of metadata, namely as a label that provides a definition, description, and context for data. Essentially, metadata describes data, and since data is attempting to describe a real world object, such as a potato, metadata is a further abstraction from reality.

And as we saw with the example of white horses in my blog post “The Metadata Crisis,” these abstract definitions can also include additional classifications (e.g., there are over 4,000 different varieties of potato), which also have to be well defined in order to facilitate clear communication and effective discussion. These levels of abstraction, definition, and classification are essential to our attempts to understand, and do business with, the real world. And this challenge continues even further with information.

You Say Potato and I Say Tater Tot

The difference, and relationship, between data and information is a common debate. Not only do these two terms have varying definitions, but they are often used interchangeably. Just a few examples include comparing and contrasting data quality with information quality, data management with information management, and data governance with information governance.

Some consider this an esoteric debate between data geeks and information nerds, but what is not debated is the importance of understanding how organizations use data and/or information to support their business activities.

Extending my analogy, data is like a potato and information is like a tater tot. In other words, information is one of the many possible specific uses for data. Information is one of the many possible specific things that we can make using data, which is why information quality professionals often speak about the information product.

So it’s important to remember that we can’t have a tater tot (information) without a potato (data), and that we can’t have either a tater tot or a potato without having a working definition (metadata) of what a potato is.
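For readers who think in code, the analogy can be sketched in a few lines of Python. This is only an illustration, not a definitive model: the names `FieldDefinition`, `Potato`, `POTATO_METADATA`, and `tater_tot_yield`, as well as the figure of 10 grams per tater tot, are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDefinition:
    """Metadata: a working definition of one field of the data."""
    name: str
    description: str
    unit: str


@dataclass(frozen=True)
class Potato:
    """Data: a record that describes one real-world potato."""
    variety: str
    weight_grams: float


# Metadata describing the Potato record (hypothetical definitions).
POTATO_METADATA = [
    FieldDefinition("variety", "Cultivar name of the potato", "text"),
    FieldDefinition("weight_grams", "Weight of the whole potato", "grams"),
]


def tater_tot_yield(potatoes, grams_per_tot=10.0):
    """Information: one specific product made from the data.

    grams_per_tot is an assumed figure used only for illustration.
    """
    total_grams = sum(p.weight_grams for p in potatoes)
    return int(total_grams // grams_per_tot)


potatoes = [Potato("Russet", 250.0), Potato("Yukon Gold", 180.0)]
print(tater_tot_yield(potatoes))
```

The yield figure cannot be computed without the potato records, and neither the records nor the yield can be interpreted without the field definitions, which is the dependency the analogy describes.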

Let’s Not Call the Whole Thing Data

David Corrigan recently blogged about the importance of the metadata that tracks the lineage of information presented to an end user, and how the root causes of data quality and data governance issues are impossible to discover without this metadata.

Therefore, the lines of demarcation separating metadata, data, and information are not just an esoteric technical debate. These demarcations are foundational to the efficiency and effectiveness of business operations. So, let’s not call the whole thing data.

Let’s acknowledge the separate, but deeply interrelated, continuum formed by the disciplines of metadata, data, and information.

This originally appeared at OCDQ Blog.

Comments (4)
Well done Jim. As usual, you have great insight into these topics, and your perspective includes the various components that often get misunderstood.

In my opinion, the whole purpose behind data must be the use of the data or information. To define it with only a technical or only a business context isn't working.

Love this article.

Posted by Lisa Marie M | Monday, November 14 2011 at 7:25PM ET
Jim, I've been working on a definitive solution for the data / information / metadata / attributes / properties knot for a while now and I think I have it figured out. I read your blog entitled "The Semantics of MDM" and we share the same philosophy even while we differ a bit on the details. Here goes. It's all information. Good, bad, reliable or not, the argument whether data is information or vice versa is not helpful. The reason data seems different than information is because it has too much ambiguity when it is out of context. Data is like a quantum wave: it has many possibilities, one of which is 'collapsed' into reality when you add context. Metadata is not a type of data, any more than attributes, properties or associations are a type of information. These are simply conventions to indicate the role that information is playing in a given circumstance. Your Michelle Davies example is a good illustration: Without context, that string could be any number of individuals, so I consider it data. Give it a unique identifier and classify it as a digital representation in the class of Person, however, and we have information. If I then have Michelle add attributes to her personal record - like sex, age, etc - and assuming that these are likewise identified and classed - now Michelle is part of a set, or relation. Note that it is bad practice - and consequently the cause of many information management headaches - to use data instead of information. Ambiguity kills. Now, if I were to use Michelle's name in a Subject Matter Expert field as proof of the validity of a digital asset, or in the Author field as an attribute, her information does not *become* metadata or an attribute: it is still information. It is merely being used differently.
In other words, in my world while the terms 'data' and 'information' are classified as concepts, the terms 'metadata', 'attribute' and 'property' are classified as roles to which instances of those concepts (well, one of them anyway) can be put, i.e. they are fit for purpose. This is longer than I intended, but this separation of the identity and class of the string from the purpose to which it is being assigned has produced very solid results for me.
Posted by John O | Tuesday, November 15 2011 at 9:25AM ET