Will Rogers once said, "I never met-a-data I didn't like." Well, maybe it was a misquote. However, in a data warehouse architecture, there is much to like about meta data. Meta data is hard to define, but essential to life as we know it. Sort of like those old "better living through chemistry" commercials. So just what is meta data? The common definition for meta data is that it is data about other data. Some alternative definitions might be that meta data is:

  • The table of contents and index for a book.
  • The glue that holds together the content of a data warehouse.
  • A road map to integrating multiple data sources.

Meta data is defined and used throughout the data warehouse--from data acquisition through ad hoc queries. The three most common uses of meta data in the data warehouse are in acquisition, transformation and access of data. First, meta data is used in data acquisition for identifying the data description map of the operational source files. Table layouts, acceptable values and field edits provide valuable analytical information and should be stored in a data dictionary.
Second, in the transformation of data, meta data is integral to defining the rules of source data scrubbing prior to loading the data warehouse. Why is this important? One reason is that operational source systems seldom keep a longitudinal view of key information.

Let's look at a health care company--data is captured for a member's eligibility concerning current services but not for a prior period eligibility. Transformation meta data rules are used to construct an unbroken historical record of member eligibility from current and prior periods, providing a detailed view of eligibility over time that can't be obtained from the operational system. The meta data rules that define the process for weaving together this historical view add essential business value and knowledge. Also, abstraction of this meta data process allows the knowledge to be transferred to many people and not trapped in the minds or process of a few.

Finally, meta data is extremely valuable in data access for aggregate navigation, making the use of the data warehouse easier for business users. For example, how does a non-technical business user determine TOTAL-SALES by a given sales REGION? Meta data rules should guide the business user intuitively to the answer: Starting with the data dictionary, we would explore the description of TOTAL-SALES. It indicates that it does not include returns--another field, NET-SALES (which does include returns) better matches our analysis. However, NET-SALES is not summarized by REGION, so we will need to calculate the sum by adding up the total sales by SALES-OFFICE to get the total sales by region.

Many times, the most powerful use of meta data is when it is invisible to the business user, allowing an intuitive interaction with the content of the data warehouse without having to worry about how the information requested is being gathered. A good illustration is exemplified by using a browser on the Web: You don't think about how to find content. You just key in the desired content description, and let the search engine go to work.

Unfortunately, there are still some drawbacks in defining and using meta data. Common standards don't exist yet. Tool vendors have proprietary repositories for storing meta data. Standards bodies are working through the process, but a clear winner with market acceptance is not imminent. So what should we do? Focus on the basic building blocks of a sound data warehouse architecture for defining, storing and using meta data now.

Build a data model in a tool that can generate database schemas and use its data dictionary to store meta data about your source to target data map.

Define and abstract transformation rules into meta data--the rules that define the business process for scrubbing and cleaning source data as it flows into the data warehouse. If you have a data repository, store your transformation meta data there. If you don't have a data repository, just create some additional tables on your friendly neighborhood DBMS and park your meta data there for now.

Add your meta data descriptions for aggregate navigation of the data warehouse, which is best accomplished from a business user prospective, to the data dictionary.

Monitor the progress of vendors to cooperate, but don't let marketing hype drive your sound business decisions. Remember that a good data warehouse is a journey, not a destination. Enjoy the ride.

Next month: Transforming warehouse data into business intelligence--query and reporting, mining and visualization.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access