NOV 30, 2010 3:46am ET

Why Organize When You Can Search?

I have a confession to make. Over the years, I’ve struggled with my inbox.

As the number of emails grows, so does the inbox’s unwieldiness. I’ve experienced the dirty-closet syndrome, where I’ve mightily attempted to keep it organized. Because I had “real work” to do rather than file 100+ daily emails, I tended to lag on my email filing chores. Over time, email was spread among topical folders, folders for specific individuals or distribution lists, the “to do” folder, the “I’m working on it” folder and, of course, the vast abyss of the inbox itself, making it nearly impossible to retrieve an old email.

The problem was that I couldn’t remember where I put the darn email. Did I put it in my industry reports directory, standards and policies directory or some other directory, or all three? If it was in all three, which one was the latest thread? Retrieval of old emails was just as difficult when things were organized as when they were not.

Thankfully, desktop search came along and solved these problems for me. As long as I remember a little bit about the email, such as keywords, the date or who it came from, retrieving it is not too problematic. Now I archive emails by quarter in one folder and leave it to my search skills. The time it takes to search and retrieve is almost always less than the time it takes to navigate the dirty closet or the magnificent folder structure. The better an email is structured, the easier it is to find, so I’ve learned to write the subject and the body with retrieval in mind.

So how does this apply to metadata? I’ve seen many organizations deal with both the over-organized data dictionary and the dirty-closet data dictionary. In both situations, distributing data definitions to desperate users is a struggle. A searchable and accessible metadata dictionary is a utopia every organization strives for, but will using an off-the-shelf search technology solve all their problems?

It will help the business users who may not be familiar with a data model or metadata repository but who understand that a particular data element is critical to the report they’re building for upper management. And it would help the data architects, because why take the time to organize the metadata when you can search it? However, duplication of metadata sources presents a tough challenge. Poor structure and quality of the metadata will make understanding the results difficult at best.

While the ability to search is critical and will solve many of the issues in both scenarios, understanding the results is just as critical, and trusting them is even more so. Some heavy lifting up front to structure and tag the metadata will greatly optimize the search, retrieval and interrogation of the results for end users.

How Accessible Is Your Metadata?

You’ve probably heard a million times that data is growing at astronomical rates. The proliferation of data has also led to a proliferation of metadata. It is not uncommon for a medium-to-large organization to manage thousands of tables and tens or hundreds of thousands of columns across various database systems. How much redundancy in the data definitions is spread across those tables? What is standardized and what is not? Many enterprise search technologies focus only on unstructured content (such as email, documents, desktops, content management systems and the Web) or are configured only for their own technology stack out of the box. Yet it is the structured data definitions in metadata repositories and/or modeling tool repositories that organizations struggle the most to publish to business users and that would benefit most from being searchable.
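To make the redundancy question concrete, here is a minimal Python sketch of one way to look for it. It assumes the column metadata has already been exported as simple (system, table, column, definition) rows; the sample rows and the function names are hypothetical and are not taken from any particular repository or modeling tool.

```python
from collections import defaultdict

# Hypothetical sample rows: (system, table, column, definition).
COLUMNS = [
    ("crm", "customer", "cust_id", "Unique identifier for a customer"),
    ("billing", "invoice", "CUST_ID", "Customer number from the CRM system"),
    ("billing", "invoice", "invoice_dt", "Date the invoice was issued"),
    ("warehouse", "dim_customer", "Cust_Id", "Unique identifier for a customer"),
]

def find_redundant_columns(rows):
    """Group columns by case-insensitive name; keep names used in more than one place."""
    by_name = defaultdict(list)
    for system, table, column, definition in rows:
        by_name[column.strip().lower()].append((f"{system}.{table}", definition))
    return {name: uses for name, uses in by_name.items() if len(uses) > 1}

def conflicting_definitions(redundant):
    """Return the names whose occurrences do not all share one definition."""
    return sorted(
        name for name, uses in redundant.items()
        if len({definition for _, definition in uses}) > 1
    )

redundant = find_redundant_columns(COLUMNS)
for name, uses in redundant.items():
    print(name, "appears in", [location for location, _ in uses])
print("conflicting:", conflicting_definitions(redundant))
```

Matching on the exact column name is the simplest possible rule. A real inventory would likely also need to handle synonyms and abbreviations, which this sketch does not attempt.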

Where the metadata lives presents its own problems. If the models are not consolidated in a repository, searching is impossible. If they are, you are at the mercy of the modeling tool to provide a search that is easily accessible - either via a browser or application programming interface - and result sets that are presented clearly with the ability to filter. Metadata repositories, whether homegrown or purchased, need to be updated on a regular basis so that users can trust the results. Their search capabilities should also be carefully evaluated.

Ensuring the Clarity of the Metadata

For users to trust the results they are seeing, the sources need to be trusted. Confidence in the metadata in the search results - such as object instances, definitions and usage - often requires adding more metadata to enhance clarity. It is not going to be perfect in the beginning, but it will get better over time. Adding clarity can mean a number of things, but the most critical are:

  • Distinguishing between common data elements,
  • Conveying the status of definitions on data elements,
  • Applying governance to data definitions, and
  • Communicating stewardship of the metadata.

The distinction between an authoritative source of a data definition and an instance in another model or application should be clear to users when they search for metadata. There are a number of reasons why a common data element may differ across applications. The difference can be intended or not, depending on the usage of the data and the age of an application. What should be crystal clear to end users is why the instances differ, so they can choose the right one for their need.

Conveying the status of definitions is important for obvious reasons, but mainly it will help end users understand exactly what is being presented to them. If something is a work in progress, it should be clear to the end user. Conversely, if something has been approved, that is also critical information for the end user. The key point is to gain trust. If an inaccuracy is found with a definition, but it is clearly a work in progress, the person will likely come back at a later date. If it is not clear that it is a work in progress, he or she may go somewhere else for the needed information.
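As an illustration of the two points above, the following Python sketch tags each definition with a status and an authoritative flag, then surfaces both in the search results. The field names, the status values ("approved" and "draft") and the sample catalog are assumptions made for this example, not the schema of any specific metadata repository.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Definition:
    element: str         # name of the data element
    source: str          # model or application holding this instance
    text: str            # the definition itself
    status: str          # assumed values: "approved" or "draft"
    authoritative: bool  # True for the authoritative source

# Hypothetical catalog entries.
CATALOG = [
    Definition("customer_id", "billing app",
               "Customer number copied from CRM", "draft", False),
    Definition("customer_id", "enterprise model",
               "Unique identifier assigned to a customer", "approved", True),
    Definition("order_date", "enterprise model",
               "Date the order was placed", "draft", True),
]

def search(catalog, keyword):
    """Keyword match on name or text; authoritative and approved entries come first."""
    keyword = keyword.lower()
    hits = [d for d in catalog
            if keyword in d.element.lower() or keyword in d.text.lower()]
    # False sorts before True, so negated flags put trusted entries on top.
    return sorted(hits, key=lambda d: (not d.authoritative, d.status != "approved"))

def render(d):
    """Show status and role next to the definition so users know what they are reading."""
    role = "authoritative" if d.authoritative else "instance"
    return f"{d.element} [{d.status.upper()}, {role}] ({d.source}): {d.text}"

for hit in search(CATALOG, "customer"):
    print(render(hit))
```

Putting the status and role in the result line itself, rather than on a detail page, means a user who spots an inaccuracy in a draft can see at a glance that the definition is still a work in progress.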
