In my April column, I introduced a model of concentric rings of value to prioritize meta data development. In May, I provided more complete definitions of the contents of the rings. In this column, I will discuss the constraints we face with today's development tool environment as well as some ideas concerning prioritizing meta data development around the tools you have in house.

As a quick summary: In the data warehouse tradition, meta data has been classified as either business meta data or technical meta data. For data warehousing purposes, meta data has tended to be limited to data definitions, data sourcing and business rules for data transformations. Historically, however, meta data included all the technical data needed to define application systems, especially OLTP applications. That broader scope included business rules and requirements, data definitions and relationships, and application components and relationships. The overhead for developing and maintaining this complete view of meta data using a passive data dictionary was approximately 15 percent of the effort to design and build an application. This cost exceeded the value of the meta data, as evidenced by the sporadic existence of meta data supporting corporate systems today.

I introduced the notion of concentric rings, with master data at the core, representing the data with the most extensive reuse. Because master data is reused so widely, it best justifies the effort of developing meta data for it.

How do you decide how far from the center you should go in the creation of meta data?

The extremes are always relatively easy to define. If you have no tool support for meta data, you should focus on the creation of business meta data describing only the key master data. That is likely to consist of rich data definitions that describe the master data in context, including when it was created and when it should be deleted. It should also contain definitions of sources and any transformations that occur between origination at the source and delivery to a business user.

At the other extreme, if you are in a highly automated development environment using tools that automatically create meta data as part of the process, you can strive to develop complete meta data about your applications and data.

Figure 1: Concentric Model of Enterprise Meta Data

Most of us are somewhere in between. We have some tools that create meta data, but we also have applications running on a variety of platforms with different, non-integrated tools for each platform. What should our approach be in this environment? Let me suggest the following strategy:

  1. Deliver rich business meta data for all data warehouse master data.
  2. Leverage technical meta data captured by installed tools.
  3. Adopt a tactical technology solution that recognizes the existence of many repositories of meta data on many platforms.

Business Meta Data

Regardless of your existing tool capability, any business user who is expected to use ad hoc or flexible reporting tools needs a good understanding of the available data. The business meta data that provides this understanding needs to be integrated across all available tools. It should reflect the ways users need to access it and should be readily accessible. For example, users typically need search capability to support a query (e.g., What are all the data elements in the data warehouse that relate to ROI?) so they can decide which data elements to include in an analysis or report. The problem is that many query tools with built-in meta data assume the user understands the data and how it is organized. Those tools do not provide search capabilities against the meta data.
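As a rough illustration of the kind of search users need, the sketch below runs a keyword lookup over a small catalog of business meta data. The `ElementDefinition` structure, the `search_meta_data` function and the sample element names are all hypothetical, invented for this example; they do not describe any particular query tool.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ElementDefinition:
    """One hypothetical business meta data entry for a data element."""
    name: str
    definition: str
    keywords: list = field(default_factory=list)


def _words(text):
    # Split on anything that is not a letter or digit, ignoring case.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_meta_data(catalog, term):
    """Return names of elements whose name, definition or keywords contain the word."""
    needle = term.lower()
    hits = []
    for element in catalog:
        text = " ".join([element.name, element.definition, *element.keywords])
        if needle in _words(text):
            hits.append(element.name)
    return hits


# Sample catalog (illustrative entries only).
catalog = [
    ElementDefinition("net_income", "Profit after all expenses and taxes.", ["ROI", "profitability"]),
    ElementDefinition("invested_capital", "Capital committed to a business unit.", ["ROI"]),
    ElementDefinition("customer_region", "Sales region assigned to a customer.", ["geography"]),
]

print(search_meta_data(catalog, "ROI"))  # ['net_income', 'invested_capital']
```

The sketch matches single whole words only. A printed handbook index, a Notes database or an intranet page could serve the same purpose, provided the definitions are searchable in some form.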

Different companies solve the business meta data problem in different ways. Some publish printed handbooks with a good index. This is a low-tech but potentially effective approach. Some use a collaborative tool such as Lotus Notes. Some use browser technology over an intranet.

Whatever your chosen medium, it is most important to deliver this class of meta data.

Leverage Technical Meta Data

Software development tools differ widely in their ability to generate and share meta data. Fortunately, the data warehouse community has finally consolidated on a single meta data exchange standard, the Common Warehouse Metamodel. Many vendors, including IBM, Informatica, Microsoft, NCR and Oracle, are jumping on that bandwagon. Unfortunately, this standard is tightly linked to data warehousing and does not address the OLTP space.

The message here is to leverage what you have and not to attempt to build solutions for deficiencies, unless they are small and can be addressed with minor investment. To the extent that your tools automatically collect meta data and have the ability to share it, you should establish standards for ensuring the data is collected optimally and shared with other tools.

Adopt a Tactical Technology Solution

The notion of sharing meta data between tools implies that meta data will be redundant. Shared meta data, by definition, is created in one tool and copied to another tool that needs the same meta data. This creates the long-standing problem with redundant data: how to ensure that all the copies are consistent.

A tactical solution to this problem is to define a master metamodel for meta data. Then, identify all the tools and platforms that hold meta data and map their contents to the metamodel. This should highlight the redundancies. In some cases, the type of data will be redundant, but the contents will be partitioned. For example, a relational database catalog holds database meta data, but only for those databases managed by that DBMS. In this scenario, you can treat the meta data as partitioned across all the catalogs collectively, and there should not be any consistency problems. (This is not to say that you won't have consistency problems in your database designs, but that is a different topic.)
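The distinction between partitioned and truly redundant meta data can be sketched in a few lines. The example below assumes a hypothetical inventory of (tool, metamodel entity type, instance) records; the tool names, entity types and the `classify_redundancy` function are illustrative assumptions, not part of any product.

```python
from collections import defaultdict

# Hypothetical inventory: which tool holds which instance of which metamodel entity type.
inventory = [
    ("sales_db_catalog", "table", "sales.orders"),
    ("finance_db_catalog", "table", "finance.ledger"),
    ("etl_tool", "data_element", "customer_id"),
    ("query_tool", "data_element", "customer_id"),
    ("query_tool", "data_element", "order_total"),
]


def classify_redundancy(records):
    """Label each entity type as 'single source', 'partitioned' or 'redundant'."""
    holders = defaultdict(lambda: defaultdict(set))  # entity type -> instance -> tools
    tools_by_type = defaultdict(set)                 # entity type -> all tools holding it
    for tool, entity_type, instance in records:
        holders[entity_type][instance].add(tool)
        tools_by_type[entity_type].add(tool)

    result = {}
    for entity_type, instances in holders.items():
        if len(tools_by_type[entity_type]) < 2:
            result[entity_type] = "single source"
        elif any(len(tools) > 1 for tools in instances.values()):
            # The same instance lives in more than one tool: copies must be kept consistent.
            result[entity_type] = "redundant"
        else:
            # Several tools hold this type, but no instance is duplicated.
            result[entity_type] = "partitioned"
    return result


print(classify_redundancy(inventory))
# {'table': 'partitioned', 'data_element': 'redundant'}
```

In this sample, the two database catalogs hold the same type of meta data but different tables, so they are partitioned; the `customer_id` definition sits in two tools at once, which is the case that needs a designated master.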

However, data definitions shared between an ETL tool and a couple of query tools put the same data definition in multiple places. If the definition needs to change, how can you ensure consistency in the definitions across all locations? One location must be designated as the master, and all copies need to be synchronized to the master copy. Otherwise, there is no way to ensure the integrity and consistency of the meta data.
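A minimal sketch of the master-copy rule follows, assuming each tool's copy of a definition can be read and overwritten as plain text. The tool names and the `synchronize_to_master` function are hypothetical; real tools would need their own import and export mechanisms.

```python
def synchronize_to_master(definitions, master):
    """Overwrite every copy with the master's definition; return the tools that changed."""
    master_text = definitions[master]
    changed = []
    for tool in definitions:
        if tool != master and definitions[tool] != master_text:
            definitions[tool] = master_text
            changed.append(tool)
    return changed


# Hypothetical copies of one data definition held by three tools.
copies = {
    "etl_tool": "Unique identifier assigned to a customer at account creation.",
    "query_tool_a": "Customer number.",
    "query_tool_b": "Unique identifier assigned to a customer at account creation.",
}

changed = synchronize_to_master(copies, master="etl_tool")
print(changed)  # ['query_tool_a']
```

The design choice here is one-way propagation: changes are made only at the master and pushed outward, so a conflicting edit in a copy is simply overwritten rather than merged.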

A technique I have found useful is to create a chart showing the various platforms in a development organization and the meta data entities stored on each platform. This makes it easier to determine what meta data exists and which copies should be designated as the masters. A typical chart looks something like Figure 2.

Figure 2: A Typical Chart to Ensure Integrity and Consistency of Meta Data

This is a simplified example in which conceptual data attributes and physical data elements are treated alike; you may prefer to use a more detailed metamodel. Regardless, this chart makes it clear that attributes/elements are defined in many places. Your analysis needs to rationalize and differentiate between the copies. In this example, some items of the metamodel, such as definitions of business areas, are missing. You need to evaluate whether there is an efficient way to gather and maintain those missing items. Today it is likely that you will determine that it is too difficult to gather those missing items and impossible to maintain them with integrity. In that case, the prudent choice is to accept their absence and move on to more fruitful problem solving.

This chart does not differentiate between the levels of meta data shown in the concentric rings. In your analysis, it would be useful to create a separate chart for each ring. This will help you think about each ring of meta data as a separate category.
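A platform-by-entity chart of the kind described above can also be kept as a small data structure, which makes the gaps and overlaps easy to list. The platforms, metamodel entities and function names below are hypothetical and are not taken from Figure 2.

```python
# Hypothetical metamodel entities and the platforms that store each of them.
metamodel = ["business area", "entity", "attribute/element", "table", "transformation rule"]
holdings = {
    "case_tool": {"entity", "attribute/element"},
    "dbms_catalog": {"table", "attribute/element"},
    "etl_tool": {"attribute/element", "transformation rule"},
}


def holders_by_entity(entities, platform_holdings):
    """Map each metamodel entity to the sorted list of platforms that store it."""
    return {
        entity: sorted(p for p, held in platform_holdings.items() if entity in held)
        for entity in entities
    }


def render_chart(entities, platform_holdings):
    """Render a plain-text chart with an X where a platform stores an entity."""
    platforms = sorted(platform_holdings)
    width = max(len(e) for e in entities)
    lines = [" " * width + " | " + " | ".join(platforms)]
    for entity in entities:
        cells = [
            ("X" if entity in platform_holdings[p] else "").center(len(p))
            for p in platforms
        ]
        lines.append(entity.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(lines)


summary = holders_by_entity(metamodel, holdings)
missing = [e for e, held_by in summary.items() if not held_by]
needs_master = [e for e, held_by in summary.items() if len(held_by) > 1]

print(render_chart(metamodel, holdings))
print("Missing:", missing)            # Missing: ['business area']
print("Needs a master:", needs_master)  # Needs a master: ['attribute/element']
```

To produce one chart per ring, the same functions could be run against an inventory filtered to the entities belonging to that ring.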

In March, I discussed the components of a data architecture. You may have noticed that meta data was not one of the components mentioned. Next month I will address the relationship between meta data and the total data architecture picture.
