Meta data integration is one of the least understood and most misrepresented topics in the entire decision support arena. It's astounding the number of times we're told that a tool seamlessly integrates every source of meta data.
This article will present a "real-world" example of meta data integration and the challenges of today's meta data integration tools market. Before we begin reviewing the strengths and limitations of these meta data integration tools, keep in mind that it is much easier purchasing such a tool and working around its limitations, as opposed to manually building programs to populate and support the meta data repository.
A State of the Market
One of the chief reasons why these tools are not cleanly integrated is that there isn't a globally adopted meta model standard (the meta model refers to the physical data model of the repository). Once an appropriate meta model standard is defined and adopted, the data warehousing products from different vendors will be able to share information much more easily.
The Microsoft Repository provides the best hope to date for defining such an industry-wide standard for the meta model. While Microsoft definitely has the clout to bring the industry around to a standard meta model, industry analysts are still taking a "wait-and-see" attitude. Assuming the Microsoft Repository becomes the standard, it will take two to three years before their meta model is mature enough for most organizations and tool vendors to integrate it into their software.
Meta Data Integration Tools
There is a dizzying array of meta data repository tools, and most of them claim to seamlessly integrate all meta data sources into one architected repository. However, reality offers a different message. It is true that most of these tools do a pretty good job of integrating formal meta data sources (i.e., CASE tools, extraction/transformation tools and other repositories) which they tend to certify. On the other hand, a majority of the meta models being offered do not provide an adequate foundation for business meta data and lack an overall vision for the complete data administration process. It is critical that the meta model that becomes adopted as a standard is as strong on the business side of the meta data equation as it is on the technical side. Unfortunately, most meta models are inadequate in this respect. Some of the limitations include the entities and attributes to support business meta data for user access patterns, frequency of use, transformation rules in business terms, DSS (decision support system) report business definitions and source system business definitions. On the technical meta data side, many of these models do not support warehouse balancing information and data quality metrics for each of the data warehouse/data mart load runs.
As meta data requirements are defined, the data administration team will likely have to extend the meta model to accommodate those needs. Therefore, a prerequisite for any tool must be to have a fully extendible meta model. Most tools do come with their own meta model and provide the capability to extend the model as needed. However, the task for extending the meta model is not trivial. It is equivalent to modifying a data model in a decision support system. As a result, it is typical, if not a prerequisite, to have a full-time data modeler on the data administration team.
Few of the repository tools provide a Web-enabled access layer that would satisfy a business user. Typically the access piece needs to be provided by a "true" reporting tool (i.e., MicroStrategy, Cognos or Business Objects). Whichever access tool is selected, it needs to be fully Web-enabled, capture user access patterns and frequency of report and table use. This meta data would then be extracted and stored directly into the repository.
Meta Data SourceIntegration
Each of the repository tools integrates meta data through differing interfaces. However, three broad categories can be defined to classify the manner in which these interfaces integrate different sources of meta data. Each of these categories requires varying levels of integration complexity and meta model changes.
Certified sources of meta data are those sources that the tool can directly read, properly interpret the information and load it into the correct attributes of the meta model. These sources are easily integrated and do not require an extension to the base meta model. In addition, because the tool is designed to accept these sources they do not require additional programming or analysis. Common examples of certifiable meta data sources include technical meta data from CASE tools and transformation rules from extraction/transformation engines. Normally a repository tool is certified for several vendor tools in each of these categories.
Most tools allow for one or more generic meta data sources. Generic meta data sources are those that are in a common format (i.e., CDIF, tab delimited, space delimited, comma delimited) that the tool can read. The challenge with these sources is that while the tool can easily read the source, programming is still needed to map the source elements to the correct attributes in the meta model. As a result, it is important that the repository tool has an interface that can be easily changed to map these sources. In addition, it is somewhat common for these sources to require extensions to the meta model. The process for extending the model can range from a simple change (if all that is needed is to add an additional attribute to an existing table) or it can be quite complex (if new tables are required and additional foreign keys are needed for other tables to reference them). If the repository is built on an object database, it can greatly simplify the task of administrating changes to the model. Common examples of generic sources are those technical and business meta data sources in databases and spreadsheets which are easily extracted into these formats.
Some sources of meta data are not certified or in a generic format. These non-supported sources might require sophisticated analysis for design and programming. These sources contain all of the challenges of generic sources with the potential added challenge of a complicated programming step that would be needed to transform the non-supported source into a generic source. Non-supported sources are common for the informal business meta data sources and the meta data stored in vendor applications.
Meta Data Integration Architecture
Before meta data integration can begin, it is critical that a meta data readiness assessment be conducted and that the meta data requirements and their meta data sources have been clearly identified and documented.
Figure 1 illustrates an actual integration strategy. This organization purchased a vendor tool to integrate their various sources of meta data (listed in Figure 2). The process for integrating all of these sources presented a challenge. After determining that the repository tool was certified with the CASE tool and the integration tool, the next step was integrating the data dictionary. The data dictionary was located in a third-party application in a proprietary database format. As a result, its format was not supported by the integration tool. The answer to this problem was to design and write two complex programs to extract and manipulate the data dictionary into a format (comma delimited) that could be integrated by the repository tool. This process took one fully dedicated programmer one month to accomplish. The last source of meta data was the data steward's spreadsheet. This source was housed in MS Excel and could be extracted in space delimited format. An OLAP tool was used to access the information in the meta data repository. This tool captured user access patterns and frequency of use which the data administration staff used to guide them on future repository development phases.
|Meta Data Sources||Meta Data Description||Type||Model Extension|
|CASE Tool||Physical & logical models, domain values, technical entity definitions and technical attribute definitions||Certified||No|
|Extraction/Transformation||Technical transformation rules||Certified||No|
|Custom Data Dictionary||Business attribute and entity definitions||Non-supported||No|
|MS Exel||Data steward's list||Generic||Yes|
|Reporting Tool||Access patterns and frequency of use||Generic||Yes|
Figure 2: Meta Data Sources It is critical for the data administration team to reside at the enterprise level in the organization. This will allow the team to set standards for the capture of the different meta data sources that need to be integrated into the repository across all of an organization's DSS projects. This is important because most Fortune 1000 companies have multiple DSS projects. If these standards are not in place, the task of integrating each of the meta data sources becomes exceedingly more difficult. These standards should be kept simple and be easily communicated and understandable. The standards cannot become a bottleneck for the other DSS teams or they will not be followed.
A meta data repository built with the business users in mind and created on a technologically sound architecture lifts the data warehouse from a stovepipe application to a true business intelligence system. Even with the immature state of the meta data repository marketplace, the alternative of not building a repository will not satisfy the needs of the business users or the data warehouse staff that will need to maintain the DSS system over time. This issue of meta data integration is one of the chief mitigating factors that has prevented most organizations from achieving successful data warehouse and data mart implementations.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access