When we talk about data governance and metadata, we are fond of talking about getting to a single source of truth. When undertaking a data governance program, where should we start looking for that truth? We could start by establishing a data governance committee or task force and asking this group to set policies, define business entities and create rules about the relationships between them. These are important tasks, but once we complete them, we still have to apply the results to our systems. Because this approach can begin without any information describing the “as-built” or “as-is” environment, it can only proceed in a relatively abstract fashion. Management is not typically so fond of abstraction; it usually requires a reference to existing conditions, an account of how implementing the program will change those conditions, and evidence that it will produce bottom-line results.
This takes us to a different starting point, one in which management first approves an initiative to capture and organize the metadata associated with existing business processes, and the governance team then uses the information this initiative yields to inform the governance process. This can be a daunting challenge, especially when budgets are tight and every activity must show a fast ROI.
At the beginning of a metadata capture effort, one must make a build-or-buy decision about how to gather and present the metadata that describes enterprise information systems. Commercial off-the-shelf (COTS) packages have the advantage of bringing a predefined repository model to the table, and their vendors have made substantial investments in bringing metadata from various enterprise systems into their repositories. Some have also invested in designing and implementing meaningful Web-based user interfaces and usable application programming interfaces (APIs) that let users access metadata in context.
Primary Focus Areas
As an alternative to implementing metadata management around COTS products, you can “roll your own.” This approach is most effective if you can proceed incrementally, starting with the areas that look likely to provide a quick ROI. In considering this strategy, keep in mind that managing metadata in your organization comes down to three main areas of work: designing the metadata repository, populating the repository and using the information in the repository.
These three areas draw on different disciplines within your IT organization. If you are going to build a team to implement a metadata management program, include team members who focus on each of these areas and equip them with appropriate tools. Fortunately, your organization probably already has the skill sets and tools in house to do this work. If it doesn’t, skilled people are available in the marketplace, as are low-cost or free tools.
The first task requires experienced data modelers who can design databases capable of representing object graphs. Common object and data modeling tools can assist in this effort. For data model ideas, searching the Web for terms like “metadata model repository” will yield some useful references, and some books have been published on the subject.
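As a minimal sketch of what a repository that represents object graphs can look like, the Python below stores every metadata item (system, table, column, report, ETL job) as a generic node and every relationship as a typed edge in SQLite. The table names `meta_object` and `meta_relationship`, the helper functions and the relationship types are hypothetical illustrations, not a published repository model; a production design would likely also need versioning, ownership and richer attribute storage.

```python
import sqlite3

# Hypothetical generic schema: metadata items are nodes, typed relationships are edges.
SCHEMA = """
CREATE TABLE IF NOT EXISTS meta_object (
    id          INTEGER PRIMARY KEY,
    kind        TEXT NOT NULL,      -- e.g. 'system', 'table', 'column', 'report'
    name        TEXT NOT NULL,
    description TEXT,
    UNIQUE (kind, name)
);
CREATE TABLE IF NOT EXISTS meta_relationship (
    source_id INTEGER NOT NULL REFERENCES meta_object(id),
    target_id INTEGER NOT NULL REFERENCES meta_object(id),
    rel_type  TEXT NOT NULL,        -- e.g. 'contains', 'derived_from'
    PRIMARY KEY (source_id, target_id, rel_type)
);
"""


def open_repository(path=":memory:"):
    """Open (or create) a repository database and ensure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_object(conn, kind, name, description=None):
    """Insert a node, or reuse the existing one with the same kind and name; return its id."""
    conn.execute(
        "INSERT OR IGNORE INTO meta_object (kind, name, description) VALUES (?, ?, ?)",
        (kind, name, description),
    )
    row = conn.execute(
        "SELECT id FROM meta_object WHERE kind = ? AND name = ?", (kind, name)
    ).fetchone()
    return row[0]


def relate(conn, source_id, target_id, rel_type):
    """Record a typed edge; duplicates are ignored thanks to the composite key."""
    conn.execute(
        "INSERT OR IGNORE INTO meta_relationship VALUES (?, ?, ?)",
        (source_id, target_id, rel_type),
    )


def related(conn, object_id, rel_type):
    """Return (kind, name) for every node one edge of rel_type away from object_id."""
    return conn.execute(
        """SELECT o.kind, o.name
             FROM meta_relationship r JOIN meta_object o ON o.id = r.target_id
            WHERE r.source_id = ? AND r.rel_type = ?
            ORDER BY o.name""",
        (object_id, rel_type),
    ).fetchall()


# Usage with hypothetical names: a CRM system that contains a customer table.
repo = open_repository()
crm = add_object(repo, "system", "CRM")
customer = add_object(repo, "table", "CRM.customer")
relate(repo, crm, customer, "contains")
print(related(repo, crm, "contains"))  # [('table', 'CRM.customer')]
```

Keeping the node and edge tables generic means new kinds of metadata can be added without schema changes, at the cost of pushing type-specific rules into application code.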
Another approach is to leverage a metadata repository that comes with a toolset already in use. For instance, ETL vendors offer metadata management applications that catalog and manage ETL metadata, and in some cases they also provide tools to catalog the metadata associated with source and target systems. If they do not, the repositories that underlie these tools can be extended to serve broader metadata management applications.
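Where no tool catalogs a source system for you, populating the repository often starts with reading that system's own catalog. The sketch below assumes a SQLite source and reads its `sqlite_master` table and `PRAGMA table_info`; many other relational databases expose comparable information through `INFORMATION_SCHEMA` views. The function name `harvest_sqlite_catalog` and the shape of the returned records are illustrative assumptions.

```python
import sqlite3


def harvest_sqlite_catalog(source):
    """Read table and column metadata from a SQLite database's own catalog."""
    records = []
    tables = source.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        quoted = '"' + table.replace('"', '""') + '"'  # quote the identifier safely
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        for _cid, column, col_type, notnull, _default, pk in source.execute(
            f"PRAGMA table_info({quoted})"
        ):
            records.append({
                "table": table,
                "column": column,
                "type": col_type,
                "nullable": not notnull,  # reflects declared NOT NULL constraints only
                "primary_key": pk > 0,
            })
    return records


# Usage: a hypothetical source table standing in for an operational system.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, region TEXT)")
for record in harvest_sqlite_catalog(src):
    print(record)
```

Each record could then be loaded into the repository as a column node linked to its table and system.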
Presentation Strategies
Access to metadata is most effective when it is provided in the course of the tasks where metadata provides valuable context. Depending on the objectives of your metadata program, there can be multiple places to make metadata available. For instance, it is valuable for users to be able to reference metadata when:
- Creating ETL processes that move data between operational systems;
- Creating dimensional models for analytic reporting;
- Designing reports;
- Reviewing reports and other analytic artifacts;
- Organizing documents and other information assets, and providing taxonomy-based search capabilities;
- Planning changes in how systems interoperate;
- Replacing systems;
- Implementing a master data management initiative;
- Analyzing data quality and developing improvement strategies; and
- Capacity planning (considering statistics like host characteristics and throughput of processing systems as metadata about those systems).
You need to prioritize the ways in which you expect to obtain value from the metadata program and build interfaces that expose the metadata in the context of those tasks. For example, you want to make it very easy for people who spend a lot of time creating reports to see how the data elements they use were derived, where they came from, and what their latency is. Ideally, that information would be embedded in (or one click away from) the tool the analyst uses to create the report. To support this, it is useful to define an API that links report headings and labels to the metadata describing them, including how the values they reference were assembled.
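A minimal sketch of such a label-to-metadata API, assuming an in-memory repository: each report label is bound to a data element, and that element's upstream sources are walked recursively to show derivation, origin and latency. Every element name, report label and function here is hypothetical, and the traversal assumes the lineage graph contains no cycles.

```python
from dataclasses import dataclass, field


@dataclass
class ElementMetadata:
    """Hypothetical metadata record for one data element."""
    name: str
    derivation: str   # how the value is computed or captured
    latency: str      # how current the value is, e.g. "daily batch"
    sources: list = field(default_factory=list)  # names of upstream elements


# Hypothetical repository contents.
ELEMENTS = {
    "dw.sales.net_revenue": ElementMetadata(
        "dw.sales.net_revenue", "gross_amount - discount_amount", "daily batch",
        ["erp.orders.gross_amount", "erp.orders.discount_amount"]),
    "erp.orders.gross_amount": ElementMetadata(
        "erp.orders.gross_amount", "captured at order entry", "real time"),
    "erp.orders.discount_amount": ElementMetadata(
        "erp.orders.discount_amount", "captured at order entry", "real time"),
}

# Hypothetical binding of (report, heading/label) to the element it displays.
LABEL_BINDINGS = {("Quarterly Sales", "Net Revenue"): "dw.sales.net_revenue"}


def lineage(element_name, depth=0):
    """Yield (depth, metadata) for an element and, recursively, its sources.

    Assumes the lineage graph is acyclic; a cycle would recurse forever.
    """
    meta = ELEMENTS[element_name]
    yield depth, meta
    for upstream in meta.sources:
        yield from lineage(upstream, depth + 1)


def describe_label(report, label):
    """Explain, as indented text, where a report label's value comes from."""
    element = LABEL_BINDINGS[(report, label)]
    return "\n".join(
        f"{'  ' * depth}{meta.name}: {meta.derivation} ({meta.latency})"
        for depth, meta in lineage(element)
    )


print(describe_label("Quarterly Sales", "Net Revenue"))
```

A reporting tool could call something like `describe_label` when the analyst hovers over or clicks a heading, which keeps the lineage one click away as described above.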
For general-purpose access to metadata, we are seeing a lot of interest and activity around using a wiki to navigate and annotate the repository. Insofar as the metadata repository can be exposed essentially as a content management system, and wikis are becoming common interfaces to content, this makes a lot of sense. A wiki gives users a place to comment on, add business context to and discuss the information in the metadata repository.
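One way to picture the wiki approach, sketched under stated assumptions: harvested technical metadata and user-contributed annotations are kept as separate fields of a page object, and the page is rendered as MediaWiki-style markup so both appear together. The classes `Annotation` and `MetadataPage`, their fields and the page layout are hypothetical; an actual deployment would typically publish through its wiki engine's own interfaces rather than build markup strings by hand.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Annotation:
    """A user comment attached to a metadata item (hypothetical structure)."""
    author: str
    posted: date
    text: str


@dataclass
class MetadataPage:
    """One repository item exposed as a wiki page (hypothetical structure)."""
    title: str
    harvested: dict   # technical metadata captured by harvesting
    annotations: list = field(default_factory=list)  # user-contributed business context

    def annotate(self, author, text, posted=None):
        """Append a user comment, dated today unless a date is supplied."""
        self.annotations.append(Annotation(author, posted or date.today(), text))

    def to_wikitext(self):
        """Render harvested metadata, then the discussion, as MediaWiki-style markup."""
        out = [f"= {self.title} =", "== Technical metadata =="]
        out += [f"* '''{key}''': {value}" for key, value in sorted(self.harvested.items())]
        out.append("== Discussion ==")
        for note in self.annotations:
            out.append(f"* {note.text} -- {note.author}, {note.posted.isoformat()}")
        return "\n".join(out)


# Usage with hypothetical content.
page = MetadataPage("erp.orders.gross_amount", {"type": "DECIMAL(12,2)", "source": "ERP"})
page.annotate("jsmith", "Excludes sales tax.", date(2024, 1, 15))
print(page.to_wikitext())
```

Keeping annotations apart from harvested fields means a re-harvest can refresh the technical section without overwriting the business context users have added.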








