This column is adapted from the book Universal Meta Data Models by David Marco and Michael Jennings, John Wiley & Sons.
In the last several columns, I presented the first component of a managed meta data environment (MME), the meta data sourcing layer. This installment on the six architectural components of an MME walks through the second and third major components: the meta data integration layer and the meta data repository.
Meta Data Integration Layer
The meta data integration layer takes the various sources of meta data, integrates them and loads them into the meta data repository (see Figure 1). This approach differs slightly from the common techniques used to load data into a data warehouse, as the data warehouse clearly separates the transformation (what we call integration) process from the load process. In an MME, these steps are combined because the volume of meta data is far smaller than the volume of data in a data warehouse. As a general rule, the MME holds between 5 and 20 gigabytes of meta data; however, as MMEs begin to target data audit-related meta data, storage can grow into the 20-75 gigabyte range. Over the next few years, you will see some MMEs reach the terabyte range.
Figure 1: Meta Data Integration Layer
The specific steps in this process depend on whether you are building a custom process or using a meta data integration tool to assist your effort. If you decide to use a meta data integration tool, the tool you select can also greatly affect this process.
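For a custom process, the combined integrate-and-load step might look something like the following Python sketch. It is only an illustration under stated assumptions: the two source record layouts, the entity table and the function name integrate_and_load are hypothetical, and SQLite stands in for whatever relational platform hosts the repository.

```python
import sqlite3

# Hypothetical records from two meta data sources. The field names are
# assumptions made for this sketch, not the output of any real tool.
MODELING_TOOL_ROWS = [
    {"entity": "Customer", "db": "SALES_DB"},
    {"entity": "Order", "db": "SALES_DB"},
]
ETL_TOOL_ROWS = [
    {"table_name": "CUSTOMER", "database": "sales_db"},
    {"table_name": "PRODUCT", "database": "dw_db"},
]


def integrate_and_load(conn, modeling_rows, etl_rows):
    """Normalize both sources to one shape and load them in a single step."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entity ("
        " entity_name TEXT, database_phys_name TEXT, source TEXT,"
        " PRIMARY KEY (entity_name, database_phys_name))"
    )
    # Integration: map each source's field names and casing to one layout.
    unified = [
        (r["entity"].upper(), r["db"].upper(), "modeling_tool")
        for r in modeling_rows
    ] + [
        (r["table_name"].upper(), r["database"].upper(), "etl_tool")
        for r in etl_rows
    ]
    # Load: INSERT OR IGNORE keeps the first copy of an entity that
    # appears in both sources.
    conn.executemany("INSERT OR IGNORE INTO entity VALUES (?, ?, ?)", unified)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM entity").fetchone()[0]


conn = sqlite3.connect(":memory:")
loaded = integrate_and_load(conn, MODELING_TOOL_ROWS, ETL_TOOL_ROWS)
print(loaded)
```

Because the volumes are small, there is no staging area between the two steps here; a data warehouse load would normally keep them separate.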
Meta Data Repository
A meta data repository is a fancy name for a database designed to gather, retain and disseminate meta data. The meta data repository is responsible for the cataloging and persistent physical storage of the meta data.
The meta data repository should be generic, integrated, current and historical. Generic means that the physical meta model stores meta data by meta data subject area rather than by application. For example, a generic meta model will have an attribute named DATABASE_PHYS_NAME that will hold the physical database names within the company. A meta model that is application-specific would name this same attribute ORACLE_PHYS_NAME. The problem with application-specific meta models is that meta data subject areas change. To return to our example, today Oracle may be our company's database standard. Tomorrow, we may switch the standard to SQL Server for cost or compatibility advantages. With an application-specific model, this switch would force needless additional changes to the physical meta model (see note 1).
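A minimal sketch of the generic approach, again using Python and SQLite purely for illustration: DATABASE_PHYS_NAME comes from the example above, while the database_catalog table and the DBMS_PRODUCT column are assumptions introduced here. The point is that the database product becomes a data value rather than part of an attribute name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Generic model: the DBMS product is stored as data, so no column name
# mentions Oracle. Table and DBMS_PRODUCT column are hypothetical.
conn.execute(
    "CREATE TABLE database_catalog ("
    " DATABASE_PHYS_NAME TEXT PRIMARY KEY,"
    " DBMS_PRODUCT TEXT NOT NULL)"
)
conn.execute("INSERT INTO database_catalog VALUES ('SALES_DB', 'Oracle')")

# Switching the company standard is a data change, not a schema change.
conn.execute(
    "UPDATE database_catalog SET DBMS_PRODUCT = 'SQL Server'"
    " WHERE DATABASE_PHYS_NAME = 'SALES_DB'"
)

# Column names are unchanged after the switch.
columns = [row[1] for row in conn.execute("PRAGMA table_info(database_catalog)")]
product = conn.execute(
    "SELECT DBMS_PRODUCT FROM database_catalog"
    " WHERE DATABASE_PHYS_NAME = 'SALES_DB'"
).fetchone()[0]
print(columns, product)
```

With an ORACLE_PHYS_NAME column, the same switch would mean renaming the column and touching every query that reads it.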
A meta data repository also provides an integrated view of the enterprise's major meta data subject areas. The repository should allow the user to view all entities within the company, not just entities loaded in Oracle or entities that are only in the customer relationship management (CRM) applications.
Third, the meta data repository contains current and future meta data, meaning that the meta data is periodically updated to reflect the current and future technical and business environment. Keep in mind that a meta data repository is constantly being updated - and it needs to be in order to be truly valuable.
Lastly, meta data repositories are historical. A good repository will hold historical views of the meta data, even as it changes over time. This allows a corporation to understand how its business has changed over time. This is especially critical if the MME is supporting an application that contains historical data, like a data warehouse or a CRM application. For example, assume the business meta data definition of customer is "anyone who has purchased a product from our company in one of our stores or through our catalog." A year later, a new distribution channel is added to the strategy. The company constructs a Web site to allow customers to order products. At that point in time, the business meta data definition for customer would be modified to "anyone who has purchased a product from our company in one of our stores, through our mail order catalog or via the Web." A good meta data repository stores both of these definitions because they both have validity, depending on what data you are analyzing (and the age of that data). Finally, it is strongly recommended that you implement your meta data repository component on an open, relational database platform, as opposed to a proprietary database engine.
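One common way to keep both customer definitions is to store an effective date range with each version. The sketch below assumes that technique; the business_definition table, its column names, the function definition_as_of and the dates are all hypothetical and chosen only to mirror the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE business_definition ("
    " term TEXT, definition TEXT,"
    " effective_from TEXT,"  # ISO date, inclusive
    " effective_to TEXT)"    # ISO date, exclusive; NULL means still current
)
conn.executemany(
    "INSERT INTO business_definition VALUES (?, ?, ?, ?)",
    [
        ("customer",
         "anyone who has purchased a product from our company in one of "
         "our stores or through our catalog",
         "2003-01-01", "2004-01-01"),  # hypothetical dates
        ("customer",
         "anyone who has purchased a product from our company in one of "
         "our stores, through our mail order catalog or via the Web",
         "2004-01-01", None),
    ],
)


def definition_as_of(conn, term, as_of):
    """Return the definition that was in force on the given ISO date."""
    row = conn.execute(
        "SELECT definition FROM business_definition"
        " WHERE term = ? AND effective_from <= ?"
        " AND (effective_to IS NULL OR ? < effective_to)",
        (term, as_of, as_of),
    ).fetchone()
    return row[0] if row else None


print(definition_as_of(conn, "customer", "2003-06-15"))
print(definition_as_of(conn, "customer", "2004-06-15"))
```

An analyst looking at year-old warehouse data would query with the date of that data and get the definition that applied when it was captured.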
1. See Chapters 4 - 8 of Universal Meta Data Models (David Marco & Michael Jennings, Wiley 2004) for various physical meta models.