Data governance is quite a hot topic these days, with more people becoming sensitized to the importance of establishing policies regarding the meaning and management of business information and the way that it is processed, updated and transformed. We think of a metadata management program as the articulation of those policies and the measurement of how effectively those policies are applied, as well as the description of the state of the enterprise’s data assets. Doing so leads us to the conclusion that implementing a program to manage metadata is foundational to the process of data governance.

When we talk about data governance and metadata, we are fond of talking about getting to the single source of the truth. In the undertaking of a data governance program, how shall we start looking for that truth? We could start by establishing a data governance committee or task force and asking this group to establish policies, define business entities and create rules about relationships between them. These are important tasks, but once we complete them, we have to apply the results to our systems. This approach could begin without any basis of information that describes the “as-built” or “as-is” environment, so it could only proceed in a relatively abstract fashion. Management is not typically so fond of abstraction; reference to existing conditions and how implementation of a program is going to impact existing conditions and produce bottom line results is usually required.

This takes us to a different starting point, one in which management first approves an initiative to capture and organize metadata associated with existing business processes, and the governance team then uses the information yielded by this initiative to inform the governance process. This can be a daunting challenge, especially in times of tight budgets and the need for any activity to show fast ROI.

At the beginning of a metadata capture effort, one must make a build-or-buy decision about how to gather and present the metadata that describes enterprise information systems. Commercial off-the-shelf (COTS) packages have the advantage of bringing a predefined repository model to the table, and the vendors of those products have made substantial investments in bringing metadata from various enterprise systems into their repositories. Some of them have put effort into designing and implementing a meaningful Web-based user interface and usable application programming interfaces that enable users to access metadata in context.

Primary Focus Areas

The alternative to implementing metadata management around COTS products is “rolling your own.” Taking such an approach will be most effective if you can proceed incrementally, starting with areas that look like they will provide quick ROI. In considering this strategy, keep in mind that managing metadata in your organization comes down to three main areas of work: designing the metadata repository, populating the repository and using the information in the repository.

These three areas really leverage different disciplines within your IT organization. If you are going to build a team to implement a metadata management program, you will want to include team members who will focus on each of these areas and equip them with appropriate tools. Fortunately, you probably already have the skill sets and tools in house that your organization can leverage to do this work. If you don’t, there are skilled people in the marketplace, and low-cost or free tools available.

To undertake the first task, you will need experienced data modelers designing databases that can be used to represent object graphs. Common object and data modeling tools can assist in this effort. From a data model standpoint, searching the Web for terms like “metadata model repository” will yield some interesting references. Additionally, there have been some books published on the subject.
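
As a rough illustration of what a database that represents an object graph can look like, the sketch below stores assets as nodes and relationships as edges, then walks the "feeds" edges to find everything upstream of a table. It is a minimal sketch, not a recommended repository model: the table names (`asset`, `relationship`), the relationship types and the sample asset names are all assumptions made for the example, and SQLite stands in for whatever database your modelers would actually choose.

```python
import sqlite3

# Hypothetical two-table model: assets are graph nodes, relationships are edges.
SCHEMA = """
CREATE TABLE asset (
    asset_id INTEGER PRIMARY KEY,
    kind     TEXT NOT NULL,      -- e.g. 'system', 'table', 'column', 'report'
    name     TEXT NOT NULL,
    UNIQUE (kind, name)
);
CREATE TABLE relationship (
    from_id  INTEGER NOT NULL REFERENCES asset(asset_id),
    to_id    INTEGER NOT NULL REFERENCES asset(asset_id),
    rel_type TEXT NOT NULL,      -- e.g. 'contains', 'feeds'
    PRIMARY KEY (from_id, to_id, rel_type)
);
"""


def open_repository(path=":memory:"):
    """Create an empty repository (in memory by default)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_asset(conn, kind, name):
    """Insert one node and return its generated id."""
    cur = conn.execute(
        "INSERT INTO asset (kind, name) VALUES (?, ?)", (kind, name)
    )
    return cur.lastrowid


def relate(conn, from_id, to_id, rel_type):
    """Insert one directed edge between two existing assets."""
    conn.execute(
        "INSERT INTO relationship VALUES (?, ?, ?)", (from_id, to_id, rel_type)
    )


def upstream(conn, asset_id):
    """Names of every asset that directly or indirectly feeds asset_id."""
    rows = conn.execute(
        """
        WITH RECURSIVE up(id) AS (
            SELECT from_id FROM relationship
            WHERE to_id = ? AND rel_type = 'feeds'
            UNION
            SELECT r.from_id FROM relationship r
            JOIN up ON r.to_id = up.id
            WHERE r.rel_type = 'feeds'
        )
        SELECT a.name FROM asset a
        JOIN up ON a.asset_id = up.id
        ORDER BY a.name
        """,
        (asset_id,),
    )
    return [name for (name,) in rows]


repo = open_repository()
orders = add_asset(repo, "table", "crm.orders")
staging = add_asset(repo, "table", "staging.orders")
fact = add_asset(repo, "table", "warehouse.fact_sales")
relate(repo, orders, staging, "feeds")
relate(repo, staging, fact, "feeds")
print(upstream(repo, fact))  # ['crm.orders', 'staging.orders']
```

Keeping nodes and edges generic means that a new class of asset (a report, an ETL job, a host) becomes a new `kind` value rather than a schema change, which suits the incremental approach described earlier.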

Another approach is to leverage a metadata repository that comes with a toolset already in use. For instance, ETL vendors offer metadata management applications that serve to catalog and manage ETL metadata, and in some cases they also provide the tools to catalog the metadata associated with source and target systems. If not, the repositories that underlie these tools can be extended to serve broader metadata management applications.

Presentation Strategies

Access to metadata is most effective when it is provided in the course of the tasks where metadata provides valuable context. Depending on the objectives of your metadata program, there can be multiple places to make metadata available. For instance, it is valuable for users to be able to reference metadata when:

  • Creating ETL processes that move data between operational systems;
  • Creating dimensional models for analytic reporting;
  • Designing reports;
  • Reviewing reports and other analytic artifacts;
  • Organizing documents and other information assets and providing taxonomy-based search capabilities;
  • Planning changes in how systems interoperate;
  • Replacing systems;
  • Implementing a master data management initiative;
  • Analyzing data quality and developing improvement strategies; and
  • Capacity planning (considering statistics like host characteristics and throughput of processing systems as metadata about those systems).

You need to prioritize the ways in which you expect to obtain value from the metadata program and build interfaces that expose the metadata in the context of those tasks. For example, you want to make it very easy for people who spend a lot of time creating reports to see how the data elements they are using were derived, where they came from, and what their latency is. Ideally, that information would be embedded in (or one click away from) the tool that the analyst is using to create the report. When creating reports, it is useful to define an API that allows linking of headings and labels to metadata that describe those headings and labels and how the values they reference were assembled.

For general purpose access to metadata, we are seeing a lot of interest in and activity around using a wiki-based approach to navigating and annotating the repository. Insofar as the metadata repository can be exposed essentially as a content management system, and wikis are becoming common interfaces to content, this makes a lot of sense. By using a wiki, you can provide a place for users to comment on, add business context to, and discuss the information that is in the metadata repository.
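
To make the idea of linking report headings and labels to their metadata more concrete, here is one possible shape for such an API. It is a sketch under stated assumptions: `ElementMetadata`, `LabelCatalog`, the field names and the sample entry are hypothetical, and a real implementation would read from the repository rather than from an in-memory dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementMetadata:
    """What an analyst wants to know about one report label (hypothetical fields)."""
    label: str
    source: str          # where the data element came from
    derivation: str      # how the value was assembled
    latency_hours: int   # how stale the value may be


class LabelCatalog:
    """In-memory stand-in for a repository lookup keyed by report label."""

    def __init__(self):
        self._by_label = {}

    def register(self, meta):
        # Store under a case-insensitive key so "Net Revenue" and
        # "net revenue" resolve to the same entry.
        self._by_label[meta.label.lower()] = meta

    def describe(self, label):
        """Return a one-line description suitable for a tooltip or link target."""
        meta = self._by_label.get(label.strip().lower())
        if meta is None:
            return f"{label}: no metadata recorded"
        return (
            f"{meta.label}: {meta.derivation} "
            f"(source: {meta.source}; latency: {meta.latency_hours} h)"
        )


catalog = LabelCatalog()
catalog.register(
    ElementMetadata(
        label="Net Revenue",
        source="warehouse.fact_sales",
        derivation="gross sales minus returns and discounts",
        latency_hours=24,
    )
)
print(catalog.describe("net revenue"))
print(catalog.describe("Churn Rate"))
```

A reporting tool could call `describe` when the analyst hovers over a column heading, which keeps the metadata one click away from the place where the report is built.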

Populating the Repository

The process for populating the metadata repository is a challenge with multiple dimensions. First, the classes of information sources can be very broad, depending on the scope of your effort. Of course, scoping the effort more narrowly (for example, starting with capturing the metadata around populating the data warehouse) can diminish this effort somewhat. Second, the location of the information sources can be geographically diverse. Third, simply identifying the systems that you need to harvest metadata from can be challenging: since metadata tells you what information assets you have, how do you know which assets exist before you have collected the metadata? Fourth, metadata is not static, nor is it likely to notify you when it has changed.

Considering these dimensions, most medium- to large-sized organizations have resources, and some have toolsets, that they can use to gather and transform data from diverse data sources throughout the enterprise. The resources are ETL developers, and the toolsets are the ETL products that they are using, even if those tools are not currently being used for metadata collection tasks.

ETL products were built to address the various dimensions of the metadata repository population problem. They come equipped with connections to myriad systems and typically are already programmed to interrogate whatever internal metadata those systems contain. They tend to support a wide range of communications and connectivity protocols so that they can track down information wherever it happens to be located. While they can’t a priori know where all of the metadata might lie, the work that both ETL product developers and enterprise ETL implementers have already done to collect data from those systems provides the signposts that can get this effort started. Finally, these products can be automated to periodically visit the systems they have cataloged and look for new or changed information.
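
The two mechanics described above, interrogating a system's internal metadata and revisiting it periodically to look for changes, can be sketched in a few lines. This is an illustration only: an in-memory SQLite database stands in for a source system, the function names are hypothetical, and a commercial ETL product would use its own connectors and scheduler rather than code like this.

```python
import hashlib
import json
import sqlite3


def harvest(conn):
    """Return {table: [(column, declared_type), ...]} from SQLite's own catalog."""
    tables = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    snapshot = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        snapshot[table] = [(col[1], col[2]) for col in cols]
    return snapshot


def fingerprint(snapshot):
    """Stable hash of a snapshot, cheap to store and compare between visits."""
    payload = json.dumps(snapshot, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def changed_tables(old, new):
    """Tables that were added, dropped or altered between two snapshots."""
    return sorted(t for t in set(old) | set(new) if old.get(t) != new.get(t))


# First visit to the (stand-in) source system.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
first = harvest(source)

# The source changes without notifying anyone.
source.execute("ALTER TABLE customer ADD COLUMN region TEXT")
source.execute("CREATE TABLE orders (id INTEGER, total REAL)")

# Second visit: compare fingerprints, then report what differs.
second = harvest(source)
if fingerprint(first) != fingerprint(second):
    print(changed_tables(first, second))  # ['customer', 'orders']
```

Comparing a stored fingerprint first keeps the routine visit cheap; the table-by-table comparison only runs when something has actually changed.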

Leveraging ETL products for metadata capture and management will certainly improve the productivity of ETL efforts and can provide data lineage and visualization capabilities. Recent analyst research indicates that corporate ETL developers increasingly understand the benefits of implementing metadata solutions based on their ETL technology of choice. In a recent independent research report, 56 IT professionals with a metadata strategy were polled and 68 percent of respondents said they are considering using ETL technologies as a part of their metadata strategy.

Hopefully, this article will trigger some thought around the possibility of chartering existing ETL developers to develop metadata capture and management programs, using the familiar ETL tools at your disposal. While there are a number of high quality products in the marketplace focused specifically on providing metadata solutions, acquiring them means making a case for budgeting for another class of software license, the annual maintenance renewal, and the cost of allocating personnel resources to learn and use such a product.

As we noted at the outset, factors like the data explosion and increasing regulatory oversight are making data governance a high-profile practice in today’s enterprise. The data governance effort is going to have to look at exposing redundant and out-of-date data and deprecating it, ensuring that validated data is available to those who need it, coordinating data collection and aggregation processes, and systematizing efforts to improve data quality. These are the areas that the ETL platform suites have been built to address, making them indispensable to data governance efforts.

Given that IBM and SAP have toolsets already built to handle enterprise metadata efforts, a fact that does not escape notice in the annual ETL tool roundups that leading analysts publish, it is probably just a matter of time before Oracle and Microsoft follow suit. The addition of metadata management capabilities to integrated database, integration, reporting and analytics products will increase marketplace awareness of efforts in this space and put pressure on vendors that specialize in metadata solutions. In this time of transition, consider getting ahead of the curve by leveraging existing licenses or low-cost toolsets as a starting point for a metadata management program.
