I have a confession to make. Over the years, I’ve struggled with my inbox.

As the volume of email grows, so does its unwieldiness. I’ve experienced the dirty-closet syndrome, making mighty attempts to keep my inbox organized. Because I had “real work” to do rather than file the 100+ daily emails, I tended to lag on my email filing chores. Over time, email was spread between topical folders, folders for specific individuals or distribution lists, the “to do” folder, the “I’m working on it” folder and, of course, the vast abyss of the inbox itself, making it nearly impossible to retrieve an old email.

The problem was that I couldn’t remember where I put the darn email. Did I put it in my industry reports folder, my standards and policies folder or some other folder, or all three? If it was in all three, which one held the latest thread? Retrieving old emails was just as difficult when things were organized as when they were not.

Thankfully, desktop search came along and solved these problems for me. As long as I remember a little bit about the email, such as keywords, the date or who it came from, retrieving it is not too problematic. Now I archive emails by quarter in a single folder and leave the rest to my search skills. The time it takes to search and retrieve is almost always less than the time it takes to navigate the dirty closet or the magnificent folder structure. The better an email is structured, the easier it is to find, and I’ve learned to write subject lines and message bodies with retrieval in mind.

So how does this apply to metadata? I’ve seen many organizations deal with both the over-organized data dictionary and the dirty-closet data dictionary. In both situations, distributing data definitions out to users desperate for them is a struggle. A searchable and accessible metadata dictionary is a utopia every organization strives for, but will an off-the-shelf search technology solve all their problems?

It will help the business users who may not be familiar with a data model or metadata repository but who understand that a particular data element is critical to the report they’re building for upper management. And it would help the data architects, because why take the time to organize the metadata when you can search it? However, duplication of metadata sources presents a tough challenge. Poor structure and quality of the metadata will make understanding the results difficult at best.

While the ability to search is critical and will solve many of the issues in both scenarios, understanding the results matters just as much, and trusting the results matters even more. Some heavy lifting up front to structure and tag the metadata will greatly improve the search, retrieval and interrogation of results for end users, as the sketch that follows illustrates.
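
To make that concrete, here is a minimal Python sketch of the idea: a handful of tagged metadata records and a keyword search that ranks hits on tagged fields above hits buried in free-text definitions. The record fields and the scoring are illustrative assumptions, not a prescribed schema.

    # Minimal sketch: tagging metadata up front so search results rank sensibly.
    # Field names (name, definition, tags, steward) are illustrative assumptions.
    RECORDS = [
        {"name": "customer_id",
         "definition": "Unique identifier for a customer.",
         "tags": ["customer", "identifier"],
         "steward": "Customer Data Team"},
        {"name": "cust_no",
         "definition": "Legacy customer number from the billing system.",
         "tags": ["customer", "legacy"],
         "steward": "Billing Team"},
    ]

    def search(records, term):
        """Return matching records, ranking tag hits above free-text hits."""
        term = term.lower()
        hits = []
        for rec in records:
            score = 0
            if term in (t.lower() for t in rec["tags"]):
                score += 2  # tagged fields are stronger signals than free text
            if term in rec["name"].lower() or term in rec["definition"].lower():
                score += 1
            if score:
                hits.append((score, rec))
        return [rec for score, rec in sorted(hits, key=lambda h: -h[0])]

    for rec in search(RECORDS, "customer"):
        print(rec["name"], "-", rec["definition"])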

How Accessible Is Your Metadata?

You’ve probably heard a million times that data is growing at astronomical rates. The proliferation of data has also led to a proliferation of metadata. It is not uncommon for a medium-to-large organization to manage thousands of tables and tens or hundreds of thousands of columns across various database systems. How much redundancy in data definitions is spread across those tables? What is standardized and what is not? Many enterprise search technologies focus only on unstructured content (such as email, documents, desktops, content management systems and the Web) or are only configured for their own technology stack out of the box. Yet it is the structured data definitions in metadata repositories and modeling tool repositories that organizations struggle the most to publish out to business users and that would benefit most from being made searchable.
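
One way to get a feel for that redundancy is simply to count how often the same column name recurs across tables in a catalog export. The sketch below assumes you can pull (schema, table, column) rows out of your systems somehow; the sample rows are made up.

    # Sizing up redundancy: count how many tables reuse each column name.
    # The catalog rows here are invented; in practice they would come from a
    # catalog export or the database system's own metadata views.
    from collections import defaultdict

    catalog = [
        ("sales", "orders", "customer_id"),
        ("sales", "invoices", "customer_id"),
        ("crm", "accounts", "customer_id"),
        ("billing", "accounts", "cust_no"),
    ]

    tables_by_column = defaultdict(set)
    for schema, table, column in catalog:
        tables_by_column[column.lower()].add((schema, table))

    # Column names that recur across many tables are candidates for a single,
    # standardized definition - or evidence of unmanaged duplication.
    for column, tables in sorted(tables_by_column.items(), key=lambda kv: -len(kv[1])):
        print(f"{column}: appears in {len(tables)} tables")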

Where the metadata lives presents its own problems. If the models are not consolidated in a repository, searching is impossible. If they are, you are at the mercy of the modeling tool to provide a search that is easily accessible - either via a browser or an application programming interface - and result sets that are presented clearly, with the ability to filter. Metadata repositories, whether homegrown or purchased, need to be updated on a regular basis so that users can trust the results, and their search capabilities should be carefully evaluated.
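
Where a repository does expose a search interface, wiring it into other tools can be as simple as the sketch below. The endpoint, parameters and response shape are hypothetical placeholders, not any particular product’s API; substitute whatever your repository or modeling tool actually provides.

    # Hedged sketch of calling a repository's search API and filtering results.
    # The URL, query parameters and JSON shape are assumptions for illustration.
    import requests

    def search_repository(term, object_type=None):
        params = {"q": term}
        if object_type:
            params["type"] = object_type  # e.g. "table", "column", "model"
        resp = requests.get("https://metadata.example.com/api/search",
                            params=params, timeout=10)
        resp.raise_for_status()
        return resp.json()  # assumed to be a list of result objects

    # Filterable, clearly presented result sets matter as much as the search itself.
    for item in search_repository("customer", object_type="column"):
        print(item.get("name"), "-", item.get("definition", "(no definition)"))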

Ensuring the Clarity of the Metadata

In order for users to trust the results they are seeing, the sources themselves need to be trusted. Confidence in the metadata in the search results - such as object instances, definitions and usage - often requires adding more metadata to enhance the clarity. It is not going to be perfect in the beginning, but it will absolutely get better over time. This can mean a number of things, but the most critical would be the following, illustrated in the sketch after the list:

  • Distinguishing between common data elements,
  • Conveying the status of definitions on data elements,
  • Applying governance to data definitions, and
  • Communicating stewardship of the metadata.
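
As a rough illustration, all four items can be made explicit fields on each data element, as in the sketch below. The field names, status values and quality scale are assumptions made for the example, not a standard.

    # One way to make the four items above explicit on every data element.
    # Field names, status values and the 0-5 quality scale are assumptions.
    from dataclasses import dataclass, field

    @dataclass
    class DataElement:
        name: str
        definition: str
        authoritative: bool      # authoritative definition, or a local instance?
        source_system: str
        status: str              # e.g. "draft", "in review", "approved"
        quality_score: int       # governance-assigned measure, e.g. 0-5
        steward: str             # who to contact with feedback
        tags: list = field(default_factory=list)

    customer_id = DataElement(
        name="customer_id",
        definition="Unique identifier assigned when a customer account is created.",
        authoritative=True,
        source_system="CRM",
        status="approved",
        quality_score=4,
        steward="customer-data-stewards@example.com",
        tags=["customer", "identifier"],
    )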

Distinguishing between an authoritative source of a data definition and an instance in another model or application should be clear to users when they search for metadata. There are a number of reasons why a common data element may differ across applications; the differences can be intentional or not, depending on how the data is used and the age of the application. What should be crystal clear to end users is why the instances differ, so they can choose the right one for their need.

Conveying the status of definitions is important for obvious reasons, but mainly it will help end users understand exactly what is being presented to them. If something is a work in progress, that should be clear to the end user. Conversely, if something has been approved, that is also critical information. The key point is to gain trust. If a person finds an inaccuracy in a definition that is clearly marked as a work in progress, he or she will likely come back at a later date. If it is not clear that the definition is a work in progress, he or she may go somewhere else for the needed information.

Applying governance to the metadata will improve search and discovery over time. The most important aspect is attaching some level of quality measure to the metadata. This will allow users to interpret results more accurately and have confidence in the information presented. It will also help them understand what they can depend on and what they need to take with a grain of salt. As more data elements move up the quality scale, the results can be used to show progress and gain more sponsorship.
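
For example, a quality measure can shape how results are presented, with low-scoring elements flagged rather than hidden. The 0-5 scale, status values and threshold in this sketch are assumptions.

    # Sketch: rank results by an assumed quality score and flag low-quality ones
    # so users know what to take with a grain of salt.
    results = [
        {"name": "customer_id", "status": "approved", "quality_score": 4},
        {"name": "cust_no", "status": "draft", "quality_score": 1},
    ]

    def present(results, threshold=3):
        ranked = sorted(results,
                        key=lambda r: (r["quality_score"], r["status"] == "approved"),
                        reverse=True)
        for r in ranked:
            caveat = "" if r["quality_score"] >= threshold else "  [low quality - confirm with the steward]"
            print(f'{r["name"]} ({r["status"]}, quality {r["quality_score"]}){caveat}')

    present(results)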

Communicating stewardship is another key to helping end users with the search and discovery of the metadata. The bottom line is that end users should have an avenue to direct feedback on the metadata to the people managing it. If they are having trouble finding specific metadata, or with its quality, then getting that feedback to the management team is imperative to correcting the problem. Knowing who is accountable as users perform their search and discovery will also promote trust.

Searching for information on specific metadata may seem like trying to find a needle in a haystack. Ensuring that your metadata is accessible, cleansed and consolidated will make finding that needle much easier. Understanding end-user needs will allow you to improve the structure and semantics of your metadata so that it is accessible and digestible by the people who need it. Improving the quality of the metadata over time will guarantee that users can trust what they are searching for and what is presented to them.
