Since my previous post about metadata received excellent commentary, I decided to write a follow-up post addressing one of the many great points raised by the discussion and its participants, namely the role of controlled vocabularies, or metadata dictionaries.

According to an insightful comment from John O’Gorman, “the nature of the medium in which we are trying to solve these problems is multi-dimensional. Any organization can have — and should manage — multiple dialects.”

“By that I mean,” O’Gorman continued, “in the dialect of accounting, customer means some agent who has contributed to increased sales. In the dialect of marketing, customer can mean anyone with a pulse that will sit and listen to a pitch. This insistence on a single version of anything, which is embedded in controlled vocabularies, relational tables, object classes, or a folder structure, is the single largest impediment to cleaning up the digital wasteland.”

One example of this digital wasteland metadata challenge, taken from the crowd-sourced wisdom of social media, is the hashtag, which Twitter users include in their tweets to tag them for search engines and trending-topics websites.

Since tagging is also a common strategy for making any type of unstructured data more usable, it is a great example of one of the semantic challenges of metadata. When users freely choose their own tags, the result is often a so-called folksonomy, as opposed to a controlled vocabulary from which users may only select predefined terms. This is precisely why the metadata resulting from tagging can include homonyms (i.e., the same tag used with different meanings) and synonyms (i.e., multiple tags for the same concept), which can lead to inappropriate data relationships and inefficient searches for data about a particular subject.
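To make both problems concrete, here is a minimal Python sketch of an exact-match tag search over a handful of tagged posts. The post IDs and tags are hypothetical, invented purely for illustration: the synonyms "nyc" and "newyorkcity" split the results for one concept, while the homonym "jaguar" lumps together posts about unrelated subjects.

```python
# Hypothetical posts tagged freely by users (a tiny folksonomy).
tagged_items = {
    "post-1": {"nyc", "marathon"},
    "post-2": {"newyorkcity", "marathon"},
    "post-3": {"jaguar", "wildlife"},   # the animal
    "post-4": {"jaguar", "autoshow"},   # the car maker
}

def search_by_tag(items, tag):
    """Return the sorted IDs of items carrying exactly this tag string."""
    return sorted(item_id for item_id, tags in items.items() if tag in tags)

# Synonyms: two tags for one concept split the results,
# so a search on either tag alone misses relevant posts.
print(search_by_tag(tagged_items, "nyc"))          # ['post-1']
print(search_by_tag(tagged_items, "newyorkcity"))  # ['post-2']

# Homonyms: one tag for two concepts conflates unrelated posts.
print(search_by_tag(tagged_items, "jaguar"))       # ['post-3', 'post-4']
```

A controlled vocabulary would avoid both outcomes by forcing a single agreed term per concept, which is exactly the trade-off between flexibility and uniformity discussed below.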

The Metadata of Babel

Another insightful comment came from Peter Benson, based on his work with the eOTD (ECCMA Open Technical Dictionary).

“Mention the word metadata,” Benson explained, “and you have immediately lost all but the hard core techies and they have neither the authority nor the budget to solve the problem. If you take a hard look at the financial crisis or cancer research you will indeed find the reason the challenges are so difficult to solve is in large part because of the limitations in our ability to communicate effectively and the lack of transparency that comes from poor data integration. So, metadata is really important.”

“The Babel approach of a single language to unite them all,” Benson continued, “has a very poor track history and there is good reason for this.  Language is more about power and authority than it is about true communication. We have tried to come up with a solution that is solely focused on achieving unambiguous communication. It really does not matter what it is called as long as we agree on what it is. We do this by using terminology to define concepts and then assigning concept identifiers that are used as metadata. The separation of the terminology from the concept identifier, or rather linking terminology through a concept identifier, allows everyone to remain comfortably in their own space yet communicate with others.”
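One way to picture the separation Benson describes is a small lookup structure in which each dialect keeps its own terms, and every term points to a shared concept identifier. The Python sketch below is only an illustration under stated assumptions: the identifiers (C001, C002), their definitions, and the alternate terms "buyer" and "prospect" are hypothetical, loosely borrowed from O'Gorman's customer example, and are not taken from the eOTD.

```python
# A minimal sketch of linking terminology through concept identifiers.
# All identifiers, definitions, and terms below are hypothetical examples.

# Each concept is defined once and given a language-neutral identifier.
concepts = {
    "C001": "An agent who has contributed to increased sales",
    "C002": "Anyone who will sit and listen to a sales pitch",
}

# Each dialect keeps its own terms; every term points to a concept identifier.
terminology = {
    ("accounting", "customer"): "C001",
    ("accounting", "prospect"): "C002",
    ("marketing", "customer"): "C002",
    ("marketing", "buyer"): "C001",
}

def concept_for(dialect, term):
    """Resolve a term used in one dialect to its concept identifier."""
    return terminology[(dialect, term)]

def terms_for(concept_id, dialect):
    """List (sorted) the terms a dialect uses for a given concept identifier."""
    return sorted(t for (d, t), cid in terminology.items()
                  if d == dialect and cid == concept_id)

def translate(term, from_dialect, to_dialect):
    """Carry a term across dialects by way of its concept identifier."""
    return terms_for(concept_for(from_dialect, term), to_dialect)

# The same word names different concepts in different dialects...
print(concept_for("accounting", "customer"))  # C001
print(concept_for("marketing", "customer"))   # C002

# ...but the shared identifier lets each side read the other's meaning.
print(translate("customer", "accounting", "marketing"))  # ['buyer']
print(translate("customer", "marketing", "accounting"))  # ['prospect']
```

Because the stored metadata is the concept identifier rather than a word, accounting and marketing can each keep saying "customer" in their own space, while a translation step resolves which concept is actually meant.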

The Metadata Continuum

So it would appear that we face a daunting challenge, which we could call the Metadata Continuum, where at one end we have the uniformity of controlled vocabularies, and at the other end we have the flexibility of chaotic folksonomies. The daily business operations of most organizations are governed by a metadata strategy that falls somewhere in between, which raises the question: In which direction should the best practices of metadata management flow, toward flexibility or toward uniformity?

Since in my previous post I used an example of the metadata complexities of everyday language, I thought it might be useful to share two perspectives about linguistic flexibility and uniformity.

In his book “Final Jeopardy: Man vs. Machine and the Quest to Know Everything,” Stephen Baker explained that “flexibility isn’t a weakness of language, but a strength.  Humans need words to be inexact. If they were too precise, each person would have a unique vocabulary of several billion words, all of them unintelligible to everyone else. You might have a unique word for the sip of coffee you just took at 7:59 a.m., which was flavored with the anxiety about the traffic in the Lincoln Tunnel or along Paris’s Boulevard Périphérique. But that single word would be as useless to you as to everyone else. A word has to be used at least twice to have any purpose. Each word is a lingua franca, a fragment of a clumsy common language.”

“Yet paradoxically,” explained Kevin Kelly, in his book “What Technology Wants,” “diversity can be unleashed by a type of uniformity. The uniformity of a standard writing system (like an alphabet or script) unleashes the unexpected diversity of literature. Without uniform rules, every word has to be made up, so communication is localized, inefficient, and thwarted.”

“But with a uniform language,” Kelly continued, “sufficient communication transpires in large circles so that a novel word, phrase, or idea can be appreciated, caught, and disseminated. The rigidity of an alphabet has done more to enable creativity than any unhinged brain-storming exercise ever invented. The standard 26 letters in English have produced 16 million different books in English. Words and language will keep evolving, but their evolution rides on basic fundamentals that are conserved and shared; unvarying (over the short term) letters, spelling, and grammar rules enable creativity in ideas. In a curious way, the homogenization of shared universals allows the transmission of diversity.”

Perhaps since both flexibility and uniformity have linguistic value, metadata will forever remain a continuum between the two.

Where along the Metadata Continuum is your organization?

This post originally appeared on the OCDQ blog.