In recent months, the topic of meta data integration in the data warehousing environment has consumed a significant and increasing amount of time and space at IT conferences and in trade journals and publications. Most notable is the normal reaction from any person involved in a discussion that suddenly revolves around meta data: a protracted sigh and one of the following comments:

  • It's too hard.
  • It's too complicated.
  • It just doesn't work.

One thing that becomes apparent very quickly is that we are continually looking for a solution to a problem that doesn't seem to have one. More specifically, we are in search of a way to make sense of meta data in an intuitive, easy and straightforward manner. Sound anomalous? Impossible? This article will examine a different approach to the integration of business meta data for the data warehouse, one that meets all those criteria: it can be intuitive, easy and straightforward. First, however, we must change the way we think about business meta data.

Meta Data: Definition, Transformation/Derivation, Management/Administration
Business (mostly unstructured)
  • What does it mean?
  • Where can I find it?
  • How was it calculated?
  • What were the sources?
  • What business rules were applied?
  • What training is available?
  • Who's on the steering team?
  • What's the easiest way to get in?
  • How fresh is the information?
Technical (mostly structured)
  • Format
  • Length
  • Domain
  • Database
  • Catalog
  • Filters
  • Aggregates
  • Calculations
  • Expressions
  • Capacity Planning
  • Space Allocation
  • Indexing and Reindexing
  • Disk Utilization
  • Production Job Scheduling

Figure 1: Business and Technical Meta Data

Meta Data Repositories and Related Tools

For many companies, attempts to define and implement meta data integration often fail dramatically or result in extremely long and expensive repository deployment projects which never quite achieve their stated objectives. In fact, the word "repository" has become synonymous with the solution for "meta data integration," and therein lies the first problem. Meta data repositories and products sold as meta data integration tools always have at least two things in common:

  1. The product has an underlying meta model which attempts to classify the relationships among pieces of meta data.
  2. There are claims asserted that the product will integrate disparate sources of meta data.

The meta model attempts to define a consistent structure into which all forms of meta data can fit; failing that, the model can usually (although sometimes with great difficulty) be "extended" or "customized" for a customer's own peculiarities. The integration component of these tools typically means that various vendor alliances have been formed such that one tool's meta data can be translated, or imported into and exported from, the repository vendor's meta model format. There are a number of standards for these translation mechanisms, including CDIF (CASE Data Interchange Format), MDIS (Meta Data Interchange Specification) and the evolving Microsoft/Meta Data Coalition formats.

If everything in a repository worked as advertised, would we have meta data integration? The answer is: probably not. The reason is that there are two separate and very distinct ways to look at meta data: the IT perspective and the business perspective. From the IT perspective, the populated and well-functioning repository might deliver meta data integration; but from the business perspective, it is most likely we would still hear "it just doesn't work." Why? To understand this, it's important to distinguish what the business wants to accomplish with meta data.

A Comparison Between Technical and Business Meta Data

There are at least three types of meta data in the world of data warehousing: definition, transformation and derivation, and management/administration. Definition meta data is meta data that helps define or describe the data. From a technology perspective, this would include field name, length, format and domain. From a business perspective, definition meta data includes the business context, that is, the ability to answer the questions:

  • What does this mean?
  • Where can I find it (How do I look it up)?
  • Who owns it?

Transformation and derivation meta data describes what happened to data on its way to a particular place (for example, how was this data calculated in the data warehouse; how was it summarized in the data mart?). In the IT world, transformation and derivation meta data includes filters, aggregations, calculations and arithmetic expressions. For a business user, transformation and derivation meta data must include a business description (in English or a similarly appropriate non-techno-speak language) of the business rules that were applied, the sources (business transactions) that contributed to the data and how the information was calculated or derived.

  • CUSTOMER ID (unique key), 10 char, alphanumeric
  • First_name, 10 char, alphabetic
  • Last name, 20 char, alphanumeric
  • MI, 1 char, alphabetic
  • Address_line1, 35 char, alphanumeric
  • Address_line2, 35 char, alphanumeric
  • City, 20 char, alphabetic
  • ...

Figure 2: Technical Meta Data

Management/administration meta data helps users understand and optimize the way in which the data is used. From an IT perspective, management/administration meta data includes information that helps maintain or improve performance and efficiency: capacity planning, space allocation, indexing and re-indexing, and scheduling of production jobs. In the business world, management/administration meta data includes items such as information on available training, information on how to find the data more quickly or more efficiently (e.g., standard reports and subscription delivery systems), business organizations to contact for additional questions (user groups, steering teams, etc.) and a measurement of information "freshness."

The content of all three types of meta data (definition, transformation/derivation and management/administration) is very different depending on whether you are a technical meta data consumer or a business meta data consumer.
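To make the contrast concrete, the sketch below records both views of a single data element. It is a minimal illustration under stated assumptions, not a proposed meta model: the class names, attribute names and all sample values other than the CUSTOMER ID format from Figure 2 are invented for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TechnicalMetaData:
    """IT view: every item fits a well-defined attribute."""
    field_name: str    # definition
    length: int        # definition
    domain: str        # definition
    calculation: str   # transformation/derivation
    indexed: bool      # management/administration


@dataclass
class BusinessMetaData:
    """Business view: mostly free text and pointers to other material."""
    meaning: str                                           # definition
    business_rules: str                                    # transformation/derivation
    resources: List[str] = field(default_factory=list)     # management/administration


# The format (10 char, alphanumeric) comes from Figure 2;
# the calculation and index flag are hypothetical.
technical = TechnicalMetaData(
    field_name="CUSTOMER ID",
    length=10,
    domain="alphanumeric",
    calculation="copied unchanged from the source system key",
    indexed=True,
)

# All business-side values below are hypothetical sample text.
business = BusinessMetaData(
    meaning="Identifies one person or organization that has purchased a product",
    business_rules="Duplicate customers from different order systems are merged",
    resources=["Customer data mart training schedule", "Steering team roster"],
)

print(technical)
print(business)
```

The same three categories appear on both sides, but the technical record is made of typed attributes while the business record is prose plus references to documents, which is the distinction the next section builds on.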

Structured and Unstructured Meta Data

Not only is the actual content of the meta data very different from the business and technical perspectives, but the form and format in which meta data is communicated are also worlds apart for these two types of constituents.

Technical meta data is usually highly structured. It is often organized as a group of related entities which can be further described by specific attributes, and the attributes themselves are well defined by quantifiable expression ­ numeric or otherwise (valid values, etc.).

Technical meta data is the meta data most likely to be found in database catalogs, data dictionaries, data models, CASE tools, extraction/transformation tools and load utilities. It is well suited to management in a meta model supported by a meta data repository tool. The challenges of integrating this highly structured meta data revolve primarily around the complexity that the sheer number of meta data sources introduces. Technical meta data integration is analogous to creating a common enterprise model from 150 different physical schemas. There is a very valid reason to be concerned that technical meta data integration is too hard, too complex and potentially too expensive for the business to support.
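A small sketch can show why the number of sources drives the complexity. It assumes three hypothetical sources (a database catalog, a CASE tool and an extraction tool) that each describe some of the same fields as a (length, domain) pair; the source names, field names and values are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical extracts: field name -> (length, domain) as each source records it.
SOURCES = {
    "database_catalog": {
        "customer_id": (10, "alphanumeric"),
        "city": (20, "alphabetic"),
        "mi": (1, "alphabetic"),
    },
    "case_tool": {
        "customer_id": (10, "alphanumeric"),
        "city": (25, "alphabetic"),
        "mi": (1, "alphabetic"),
    },
    "etl_tool": {
        "customer_id": (12, "alphanumeric"),
    },
}


def find_conflicts(sources):
    """Return the fields whose definition differs between at least two sources."""
    seen = defaultdict(dict)  # field name -> {definition: [source names]}
    for source_name, fields in sources.items():
        for field_name, definition in fields.items():
            seen[field_name].setdefault(definition, []).append(source_name)
    return {name: defs for name, defs in seen.items() if len(defs) > 1}


conflicts = find_conflicts(SOURCES)
for name, definitions in conflicts.items():
    print(name)
    for definition, source_names in definitions.items():
        print("   ", definition, "according to", ", ".join(source_names))
```

Even with three sources and three fields, two of the fields need a human decision about which definition is correct. Each additional source adds more such comparisons, which is where the effort of structured integration goes.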

Business meta data, on the other hand, is usually loosely structured or unstructured data. Business meta data often cannot be described by entities and attributes; rather, it takes the form of text documents, organization charts, membership directories, training schedules, electronic document images and perhaps even audio and video. Trying to integrate these formats using a repository tool and a highly structured meta model is doomed to fail, but usually not before lots of time and money have been spent.

The Paradigm Shift

Unstructured business meta data must be integrated using a fundamentally different architecture than the one used for structured technical meta data. Luckily, this architecture not only exists, but it already permeates the way we think about and interact with our world today. This architecture is the Internet. Using Internet technology to link disparate sources of business meta data takes advantage of what this technology does best. Building a meta data architecture with Web technology allows businesses to leverage their current technology investments and their users' familiarity with Web navigation techniques. This architecture delivers ease of use because most users already have access to the intranet/Internet. Most importantly, distribution and extension of meta data is easy because the architecture of the solution is consistent with the architecture of the meta data components. No longer are we trying to force-fit unstructured data into a highly structured architectural solution.

Figure 3: Meta Data Web Site Structure

A sample meta data Web site structure is depicted in Figure 3. A search on "customer" in this meta data architecture might bring up the following business meta data links:

  • Customer, defined as a person or organization that has purchased a product from XYZ Corporation or its domestic subsidiaries
  • Customer, business rules applied to integrating customer data in the warehouse
  • Customer Sales Summary, Monthly Report #123
  • Customer Retention History, Quarterly Report #992
  • Customer, top 10 percent by purchase volume, Query #C199
  • Customer Satisfaction Survey Results
  • "The Changing Demographics of Our Customer," Marketing Department Report dated 6/1/99
  • "Customer Purchase Patterns," Marketing Department White Paper, 12/98
  • Customer Data Mart Steering Team Member Roster
  • Customer Data Mart Steering Team Minutes
  • Customer Feedback, video from committee meeting, 3/99
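A search like this needs very little machinery, which the following sketch tries to show. It assumes the meta data site keeps a simple catalog of links, each with a title, a content type and an address; the class name, the content-type labels, the `intranet.example` addresses and the non-matching "Product" entry are hypothetical, while the three customer titles are taken from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaDataLink:
    title: str
    kind: str  # hypothetical content-type label, e.g. "report" or "video"
    url: str   # hypothetical intranet address


CATALOG = [
    MetaDataLink(
        "Customer, business rules applied to integrating customer data in the warehouse",
        "business rules",
        "http://intranet.example/metadata/customer-rules.html",
    ),
    MetaDataLink(
        "Customer Sales Summary, Monthly Report #123",
        "report",
        "http://intranet.example/reports/123.html",
    ),
    MetaDataLink(
        "Customer Feedback, video from committee meeting, 3/99",
        "video",
        "http://intranet.example/video/feedback-1999-03.html",
    ),
    MetaDataLink(
        "Product Data Mart Steering Team Minutes",  # hypothetical non-matching entry
        "minutes",
        "http://intranet.example/teams/product/minutes.html",
    ),
]


def search(catalog, term):
    """Case-insensitive substring match on link titles."""
    needle = term.lower()
    return [link for link in catalog if needle in link.title.lower()]


hits = search(CATALOG, "customer")
for link in hits:
    print(f"{link.kind:15} {link.title} -> {link.url}")
```

The catalog stores only pointers, so a text document, a report and a video can sit side by side without being forced into a common structure. A real site would more likely rely on its existing intranet search engine than on a hand-built list.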

From this example, it becomes easy to understand how a discussion about meta data might now elicit a very different reaction, perhaps even a smile, followed by the comments:

  • It was easier than we thought.
  • The solution was straightforward.
  • This works really well!

The vice president of finance and IT for a large publishing firm, for whom an integrated business meta data architecture was recently deployed on the Web, says: "The Web-based, end-user tool and the business-term definitions on the meta data Web site have made it easy for our business people to gather timely information through our corporate intranet. Our internal communications have never been better. The project is viewed by all as a complete success."

In order to realize the full potential of the data warehouse environment, meta data must be available to business users. Providing this context to end users in a Web-based environment, where meta data from many sources and in many formats can be logically integrated, provides the foundation to establish, grow and enrich the meta data solution over time. Additional benefits can be immediately realized by posting data warehouse project status and deliverables on the meta data Web site. It can also serve as an effective means of delivering data warehouse and access tool training. With this practical and inexpensive meta data integration solution, business users and companies are able to develop, grow, use and manage their data warehouse environments to their full potential.
