Managing Distributed Data Warehouse Meta Data

Published
  • February 01 1999, 1:00am EST

Although organizations are now successfully deploying data warehousing and decision processing products to provide business users with integrated, accurate and consistent business information, most companies have failed to provide a similar level of integration of the meta data associated with this business information. This failure stems not only from a lack of understanding of the importance of meta data, but also from the complexity of meta data integration. The trend toward the use of packaged analytic applications in data warehousing will make meta data integration even more difficult. Data warehouse customers urgently need to address this problem if the full benefits of data warehousing are to be achieved. In this article we explain why meta data management and standardization are important to the success of a data warehouse and explore industry efforts in this area.

The Benefits of Integrated Meta Data

To date, most organizations have avoided the issue of meta data management and integration in data warehousing. Many companies, however, are now beginning to realize the importance of meta data in decision processing and to understand that the meta data integration problem cannot be ignored. There are two reasons for this.

  1. The use of data warehousing and decision processing often involves a wide range of different products, and creating and maintaining the meta data for these products is time-consuming and error-prone. The same piece of meta data (a relational table definition, for example) may have to be defined in several products. This is not only cumbersome, but also makes it difficult to keep this meta data consistent and up to date. Also, it is often important in a data warehousing system to track how this meta data changes over time. Automating the meta data management process and enabling the sharing of this so-called technical meta data between products can reduce both costs and errors.
  2. Business users need to have a good understanding of what information exists in a data warehouse. They need to understand what the information means from a business viewpoint, how it was derived, from what source systems it comes, when it was created, what pre-built reports and analyses exist for manipulating the information, and so forth. They also may want to subscribe to reports and analyses and have them run, and the results delivered to them, on a regular basis. Easy access to this business meta data enables business users to exploit the value of the information in a data warehouse. Certain types of business meta data can also aid the technical staff; examples include the use of a common business model for discussing information requirements with business users and access to existing business intelligence tool business views for analyzing the impact of warehouse design changes.
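
The consistency problem described in the first point can be illustrated with a minimal Python sketch. It assumes each product can report its copy of a table definition as a list of column names and data types; the product names, the table and the `find_conflicts` helper are all hypothetical and do not correspond to any vendor's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class ColumnDef:
    """One column of a relational table definition (hypothetical shape)."""
    name: str
    data_type: str


def find_conflicts(definitions: Dict[str, List[ColumnDef]]) -> Dict[str, Set[str]]:
    """Return the columns whose data type differs between products."""
    types_by_column: Dict[str, Set[str]] = {}
    for columns in definitions.values():
        for col in columns:
            types_by_column.setdefault(col.name, set()).add(col.data_type)
    # Keep only the columns on which the products disagree.
    return {name: types for name, types in types_by_column.items() if len(types) > 1}


# Hypothetical copies of the same table definition held by three products.
definitions = {
    "etl_tool": [ColumnDef("sale_id", "INTEGER"), ColumnDef("amount", "DECIMAL(10,2)")],
    "warehouse": [ColumnDef("sale_id", "INTEGER"), ColumnDef("amount", "DECIMAL(10,2)")],
    "bi_tool": [ColumnDef("sale_id", "INTEGER"), ColumnDef("amount", "FLOAT")],
}

print(find_conflicts(definitions))
```

Here the sketch would flag the `amount` column, because one product holds a different data type for it than the other two. A shared definition removes the need for this kind of after-the-fact check.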

Improved Productivity

The benefit of managing data warehouse technical meta data is similar to that obtained by managing meta data in a transaction processing environment: improved developer productivity. Integrated and consistent technical meta data creates a more efficient development environment for the technical staff who are responsible for building and maintaining decision processing systems. One additional benefit in the data warehousing environment is the ability to track how meta data changes over time. The benefits obtained by managing business meta data, on the other hand, are unique to a decision processing environment and are key to exploiting the value of a data warehouse once it has been put into production.

Figure 1 shows the flow of meta data through a decision processing system as it moves from source systems, through extract, transformation and load (ETL) tools, to the data warehouse, where it is used by business intelligence (BI) tools and analytic applications. This flow can be thought of as a meta data value chain. The further along the chain you go, the more business value there is in the meta data. However, business value depends on the integrity of the meta data in the value chain. As meta data is distributed across multiple sources in the value chain, integrity can only be maintained if this distributed meta data is based on a common set of source meta data that is current, complete and accurate. This common set of source meta data is often called the meta data system of record. Another important aspect of the value chain is that business users need to be able to follow the chain backward from the results of decision processing to the initial source of the data on which the results are based.
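
Following the value chain backward can be sketched as a simple graph walk. The Python example below is illustrative only: it assumes the meta data records, for each object, the objects it was derived from, and every name in it (`DERIVED_FROM`, `trace_to_sources`, the report and table names) is hypothetical.

```python
from typing import Dict, List

# Each entry maps a meta data object to the objects it was derived from.
# All names are hypothetical examples of stages in the value chain.
DERIVED_FROM: Dict[str, List[str]] = {
    "report.monthly_revenue": ["warehouse.sales_fact"],
    "warehouse.sales_fact": ["etl.load_sales"],
    "etl.load_sales": ["source.orders", "source.invoices"],
}


def trace_to_sources(item: str, derived_from: Dict[str, List[str]]) -> List[str]:
    """Walk the chain backward from item and return the original sources."""
    sources = []
    seen = set()
    stack = [item]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against revisiting shared ancestors or cycles
        seen.add(current)
        parents = derived_from.get(current, [])
        if not parents:
            sources.append(current)  # nothing upstream: this is an initial source
        stack.extend(parents)
    return sorted(sources)


print(trace_to_sources("report.monthly_revenue", DERIVED_FROM))
# prints ['source.invoices', 'source.orders']
```

The walk only gives a trustworthy answer if the derivation records themselves are current and complete, which is the role the article assigns to the meta data system of record.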

There are two items in Figure 1 that have not been discussed so far. The decision processing operations box in the diagram represents the meta data used in managing the operation of a decision processing system. This includes meta data for tracking extract jobs, business user access to the system, and so forth. The common business model box represents the business information requirements of the organization. This model provides a high-level view, by business subject area, of the information in the warehousing system. It is used to provide common understanding and naming of information across warehouse projects (see Reference 1).

Meta Data Sharing and Interchange

Past attempts by vendors at providing tools for the sharing and interchange of meta data have involved placing the meta data in a central meta data store or repository, providing import/export utilities and programmatic APIs to this store, and creating a common set of meta-models for describing the meta data in the store. In the transaction processing environment, this centralized approach has had mixed success, and there have been many failures. In the decision processing marketplace, vendors are employing a variety of centralized and distributed approaches for meta data management. The techniques used fall into one of three categories: 1) meta data repositories for meta data sharing and interchange, 2) meta data interchange "standards" defined by vendor coalitions, and 3) vendor-specific "open" product APIs for meta data interchange. These three approaches, together with product examples, are discussed in more detail in the sidebar accompanying this article.

Given that multiple decision processing meta data approaches and "standards" are likely to prevail, we will, for the foreseeable future, be faced with managing multiple meta data stores, even if those stores are likely to become more open. The industry trend toward building distributed environments involving so-called federated data warehouses, consisting of an enterprise warehouse and/or multiple data marts, will also encourage the creation of multiple meta data stores. The only real solution to meta data management is to provide a facility for managing the flow of meta data between different meta data stores and decision processing products. This capability is provided using a meta data hub for the technical staff and a business information directory for business users (see Figure 2).

Why Two Tools for Meta Data Management?

There are continuing debates in the industry about whether technical and business meta data should be maintained in the same meta data store or kept separate. This debate looks at meta data from the wrong dimension. Meta data should be viewed from the perspective of the person using it, not simply according to the type of meta data it is. In general, technical staff employ technical meta data during warehouse development and maintenance. They also need access to certain types of business meta data. Examples include the common business model when discussing information requirements with business users and BI tool business views when analyzing the impact of warehouse design changes. Business users, on the other hand, employ business meta data as an aid to finding the business information they require in a warehouse; but they also use high-level technical meta data when trying, for example, to relate decision processing results to the source data used to create the results.

There is a strong argument in favor of deploying two types of meta data management tools: a GUI- or Web-based tool (a meta data hub) for technical staff when developing and maintaining the data warehouse and a Web-based tool (a business information directory) for business users when employing decision processing tools and applications. The reason for this separation is that the usage, architecture and interfaces for the two types of meta data user are completely different. There does, however, need to be a link between the two types of tool. Users of a business information directory need to be able to drill through from the directory to the technical meta data maintained by a meta data hub. To facilitate this drill-through, the two types of meta data management tools might employ a common meta data store.

The approaches and products (outlined in the sidebar accompanying this article) are targeted at technical users and at providing support for one or more requirements of a meta data hub. For this reason, in the remainder of this article, we will focus on the architecture and requirements of a meta data hub. It is important to point out, however, that vendors are also working on meta data management tools for business users. These tools support the concept of a business information directory and are being integrated into a Web-based interface known as an information portal (see Reference 2).

The Meta Data Hub

The meta data hub is used for managing the interchange and sharing of technical meta data between decision processing products. It is intended for use primarily by technical staff during the development and maintenance of data warehouses. The four main requirements of such a hub are outlined below.

  1. A meta data hub should support the interchange of meta data between systems and products in a distributed meta data environment. The hub should have a documented and open programmatic object interface (employing COM or CORBA, for example) that enables third-party tools to use the services of the hub. A file transfer mechanism supporting industry-recognized file formats (comma-delimited file, Meta Data Coalition MDIS, Microsoft XML Interchange Format, for example) should also be provided for meta data interchange.
  2. A meta data hub should provide persistent stores for the management and sharing of meta data. Meta data in a store should be maintainable by the object API and file transfer methods outlined above and via supplied GUI and Web client interactive interfaces. An interactive and batch meta data impact analysis and reporting feature is also required. The hub should offer an agent interface that can scan local products and systems at user-defined intervals and capture new or modified meta data for addition to the meta data store. The meta data manager used to maintain meta data in the store should support version and library control features that can create a historical record of meta data changes and support group development. In large distributed environments, the administrator should be able to physically partition the meta data environment across multiple hub servers and meta data stores.
  3. The meta data hub should, at a minimum, be able to manage data warehouse information store definitions. Formats supported should include relational tables and columns, and multidimensional measures and dimensions. Another type of meta data that could be handled is information about the data sources used to create data warehouse information and about the transforms applied to this source data before it is loaded in a warehouse. It is recognized, however, that current ETL tools use their own proprietary transformation methods, making it difficult to create a generalized facility for managing this type of meta data. The product should at least provide the ability to document data source and transformation meta data in free-form text format. Ideally, the hub should also document details about the business meta data associated with the common business model discussed earlier and the business views employed by business intelligence tools and analytic applications to access warehouse information.
  4. The hub should use industry-standard meta data models or supply its own meta-models for the various types of meta data it manages. These meta-models should be fully documented and extensible.
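
Three of the requirements above (a versioned store, impact analysis and comma-delimited file interchange) can be sketched together in a few lines of Python. This is a toy in-memory illustration under stated assumptions, not a description of any vendor's hub: the `MetaDataHub` class, its method names and the sample table and view names are all hypothetical, and a real hub would add persistence, an object API and agents.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Column = Tuple[str, str]  # (column name, data type)


@dataclass
class MetaDataHub:
    """Hypothetical in-memory stand-in for a hub's meta data store."""
    # table name -> versions of its definition, oldest first
    history: Dict[str, List[List[Column]]] = field(default_factory=dict)
    # object name -> objects that depend directly on it
    dependents: Dict[str, Set[str]] = field(default_factory=dict)

    def register(self, table: str, columns: List[Column]) -> int:
        """Record a definition; a changed definition becomes a new version."""
        versions = self.history.setdefault(table, [])
        if not versions or versions[-1] != list(columns):
            versions.append(list(columns))
        return len(versions)  # current version number, starting at 1

    def add_dependency(self, upstream: str, downstream: str) -> None:
        """Note that downstream is built on upstream."""
        self.dependents.setdefault(upstream, set()).add(downstream)

    def impact_of(self, name: str) -> Set[str]:
        """Return every object that depends, directly or indirectly, on name."""
        impacted: Set[str] = set()
        stack = [name]
        while stack:
            for dep in self.dependents.get(stack.pop(), ()):
                if dep not in impacted:
                    impacted.add(dep)
                    stack.append(dep)
        return impacted

    def export_csv(self, table: str) -> str:
        """Write the latest version of a table definition as comma-delimited text."""
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["table", "column", "data_type"])
        for column, data_type in self.history[table][-1]:
            writer.writerow([table, column, data_type])
        return buffer.getvalue()


hub = MetaDataHub()
hub.register("sales_fact", [("sale_id", "INTEGER"), ("amount", "FLOAT")])
hub.register("sales_fact", [("sale_id", "INTEGER"), ("amount", "DECIMAL(10,2)")])
hub.add_dependency("sales_fact", "bi_view.revenue")
hub.add_dependency("bi_view.revenue", "report.monthly_revenue")

print(len(hub.history["sales_fact"]))  # number of versions kept for the table
print(sorted(hub.impact_of("sales_fact")))
print(hub.export_csv("sales_fact"))
```

Keeping every version rather than overwriting the definition is what provides the historical record of meta data changes, and the dependency walk is the simplest form of the impact analysis the second requirement calls for.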

A Complex Task

Managing distributed decision processing and data warehouse meta data is a complex task. The effort being put into meta data sharing and integration by leading vendors and their partners, however, demonstrates the significant business benefits that can be gained from managing meta data in the decision processing environment. For this reason alone, these vendor de facto "standardization" efforts stand a better chance of success than earlier attempts at solving meta data integration issues. The challenge for IT organizations will be to integrate the various tools that use these different meta data standards.

References

  1. See the DataBase Associates paper, "Decision Processing: A Blueprint for the Intelligent Enterprise" for more details.
  2. More information about business information directories and information portals can be found in the DataBase Associates paper, "Finding Business Information in the Enterprise: The Information Portal."
