Meta Data Repository Redux, Part 1: Meta Data Use Case Detail and Diversity
InfoManagement Direct, April 2004
They say what goes around comes around, and this certainly applies to the meta data information repository. Over the last 25 years, each great evolution of information systems technology and best practices has bred the same set of issues and the invariable vendor response - the meta data repository (or some derivation). What is this problem that keeps recurring? Simply put, as newer technologies are used to develop systems, the increase in complexity breaks the system management status quo. The answer always involves storing information about the system components in a central place for analysis - i.e., the meta data repository. It hasn't always been called this, but nonetheless, the basic approach to solving IT management complexity is the same. We are now witnessing this event playing out once more, but I'm getting ahead of the story.
This article seeks to examine the phenomenon of meta data repositories, identifying common patterns of process and technology. Despite what appear to be great similarities, differing IT constituencies and problem domains suggest that a one-size-fits-all solution approach will not work. Finally, in part 2, I will define a strategy for successfully implementing an IT knowledge management program based on meta data repository concepts.

Figure 1: Meta Data Management Through the Decades
The Nature of the Problem
Ultimately, the organization will implement a "just good enough" approach to managing complexity. The problem is that each successive wave of application innovation expands the linkages between system components, thus breaking the current "sufficient solution" put into place to meet the prior generation's system management problems. Figure 1 illustrates the enabling technology that led to the IT complexity and, ultimately, initiated the meta data management response. In all cases, meta data from related toolset and infrastructure suites was integrated for the purpose of better understanding and managing the IT resource. In each case, the solution must also provide the same generic types of meta data retrieval and analysis capabilities:
- Definition analysis - I have this thing, but I don't really know what it is or what it means. Every project starts with the software archeology phase where the developers and users try to understand the current system.
- Impact analysis - I make a change to one thing, and something else stops working. How can I figure out the impact of my changes?
- Where used analysis - I have a certain thing that is used in many places. Now I have to change this thing. How many do I have and where do I find them?
- Difference analysis - I have many instances of the same thing. How can I prove they are exactly the same? Another example: Something just stopped working, but I don't know what has changed in the environment.
- Location/navigation analysis - I have this thing, now I want to go look at it directly. Because meta data is often removed from the original source, the user may want to inspect the source directly.
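Several of the analyses listed above can be read as simple operations over stored relationships between components. The following is a minimal Python sketch, not a description of any repository product: the component names, the `depends_on` mapping and the function names are all hypothetical, chosen only to show how where-used, impact and difference analysis might look over an in-memory store.

```python
from collections import deque

# Hypothetical meta data: each component maps to the components it depends on.
depends_on = {
    "sales_report": ["sales_view"],
    "sales_view": ["orders_table", "customer_table"],
    "billing_job": ["orders_table"],
    "orders_table": [],
    "customer_table": [],
}

def where_used(component):
    """Where-used analysis: components that reference `component` directly."""
    return sorted(user for user, deps in depends_on.items() if component in deps)

def impact_of(component):
    """Impact analysis: every component reachable by following where-used links."""
    impacted, queue = set(), deque([component])
    while queue:
        current = queue.popleft()
        for user in where_used(current):
            if user not in impacted:
                impacted.add(user)
                queue.append(user)
    return sorted(impacted)

def difference(a, b):
    """Difference analysis: properties whose values differ between two instances."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

In this sketch, where-used analysis looks one relationship away, while impact analysis keeps following relationships until nothing new turns up. That is why a change to `orders_table` reaches `sales_report` even though the report never references the table directly.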
The examples in Figure 1, while similar in nature, illustrate a growing diversification in the type of meta data and the intended audience. The "user" column now includes virtually everyone in the organization! From technical providers to senior management, everyone in the organization wants easier access to information that defines the IT environment. Given such a broad range of users, it should not be surprising that their information needs vary.
This expanding universe of "everything IT" information requirements forces us to examine meta data complexity. This complexity can be measured across two important axes:
- Detail - Detail covers several different measures:
  - Number of different classes of data
  - Number of instances of each class of data
  - Number of properties of each class of data
  Meta data can exist at different "roll ups," just like data in the data warehouse. For example, the identical database may be installed on multiple servers, each in a different location. Depending on the analysis being performed, you might want to know the count of all instances of the same database or just the fact that it exists. Further grouping of databases into subject areas would represent a higher level of abstraction (and less detail).
- Diversity - There are literally thousands of "classes" of meta data. Simply identifying every occurrence of every IT asset can produce millions of individual records in the repository. Diversity also includes relationships between IT components. Some analysis requires only closely related meta data (e.g., what columns are in a table), while other analysis requires chasing down long chains of relationships (e.g., what database servers support this business unit).
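The detail measures and the idea of a "roll up" can be made concrete with a small example. This is a hedged Python sketch under stated assumptions: the record layout, the property names (`server`, `subject_area`) and the sample values are invented for illustration and are not taken from any actual repository schema.

```python
from collections import Counter

# Hypothetical repository records; each dict is one instance of a meta data class.
records = [
    {"class": "database", "name": "orders_db", "server": "nyc01", "subject_area": "sales"},
    {"class": "database", "name": "orders_db", "server": "lon01", "subject_area": "sales"},
    {"class": "database", "name": "hr_db", "server": "nyc01", "subject_area": "personnel"},
    {"class": "table", "name": "orders", "server": "nyc01", "subject_area": "sales"},
]

def detail_measures(records):
    """Count classes, instances per class, and distinct properties per class."""
    instances = Counter(r["class"] for r in records)
    properties = {}
    for r in records:
        properties.setdefault(r["class"], set()).update(k for k in r if k != "class")
    return {
        "class_count": len(instances),
        "instances_per_class": dict(instances),
        "properties_per_class": {c: len(p) for c, p in properties.items()},
    }

def roll_up(records, cls, key):
    """Count instances of one class grouped by a property; a coarser key means less detail."""
    return dict(Counter(r[key] for r in records if r["class"] == cls))
```

Grouping the database records by `name` counts installed copies of each database, while grouping the same records by `subject_area` gives the higher-level, less detailed view described above.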

Figure 2: Meta Data Complexity Varies by Usage Pattern
Figure 2 illustrates the relationship between meta data detail and diversity. Closer examination of the broad range of meta data-related efforts, in the context of the user community being served and the detail/diversity of meta data required, leads to the observation of three distinct use cases for the application of meta data. These use cases have distinctly different patterns of meta data detail and diversity.
- Control. Focus is on families of related toolsets and infrastructure with the goal of improving some aspect of IT system management. Emphasis is on "impact analysis" and "difference analysis." Rapid problem resolution requires access to real-time meta data from multiple infrastructure component sources. These efforts occur within the technology domain team (e.g., the data architecture group creates a data dictionary, the systems management group installs inventory management).
- Inform. Here the focus is on "definition analysis" and transferring knowledge from the IT domain experts to the generalist. This includes application developers and systems analysts who start every project with the "software archeology" phase (i.e., researching existing code and database structures to determine how existing systems work). This isn't an issue if your systems and databases are all well documented. End users want to know the definition of data that appears on data warehouse reports, where the data came from and how it was transformed along the way. The problem is that much of this knowledge is locked up in tools or infrastructure (e.g., data models, DBMS catalogs, source code). Making this information available to a wide audience is the goal for these types of meta data management efforts.
- Plan. The goal is understanding how the business relates to IT across the entire range of IT assets (e.g., data, application, infrastructure). "Where used" analysis is the primary requirement. For example, if the company expands this product line, what systems will be impacted? Which systems support our customers? Which processes redundantly store data? IT strategic planning is increasingly looking to the application of meta data to better support investment decision making. These efforts are an outgrowth of the emphasis on enterprise architecture planning and create the added challenge of describing and documenting more abstract concepts such as business process and application system, which are really agglomerations of literally thousands of the individual IT components stored in the "control" and "inform" repositories.
Twenty-first century meta data management for IT has morphed into a knowledge and content management project with the goal of capturing, classifying and categorizing all things IT. Pursuing this goal will result in a better informed IT staff and will reduce time to market for new systems. IT to business alignment will improve as the IT process and deliverable become more transparent to the business and as the business can more readily measure the impact of spending decisions on the IT architecture. Critical to success is recognizing the three distinct meta data management patterns: control, inform and plan. No single vendor provides an overall solution; rather, a mix of technologies and in-house developed solutions will be required. Part 2 of this article will examine the three meta data patterns and suggest an architectural approach for a comprehensive meta data solution.
John Singer is a 24-year veteran information systems professional who has focused on data management activities including database administration, data administration and enterprise architecture in both staff and management roles. Currently working as a data architect at MasterCard International, Singer has experience in the pharmaceutical, healthcare, manufacturing, retail and criminal justice industries. He can be reached at john_singer@mastercard.com.