This column is adapted from the book Universal Meta Data Models by David Marco and Michael Jennings (John Wiley & Sons).
In last month's column, I presented the meta data sourcing layer of a managed meta data environment (MME) along with a walk-through of one of the most common sources of meta data: software tools. In this column, I will walk through two additional common meta data sources: end users, and documents and spreadsheets.
End users are one of the most important sources of meta data that is brought into the MME. These users come in two flavors: business and technical. Figure 1 lists the types of meta data entry done by each group.
Figure 1: End-User Meta Data Entry
Often, a corporation's business meta data is stored in the collective consciousness of its employees. As a result, it is vital for business users to input business meta data into the repository. The need for active and engaged business users ties into the topic of data stewardship.1
Technical users also need direct access to the meta data repository to input their technical meta data. Because much of the technical meta data is already stored in various software tools, entering it is less demanding for technical users than entering business meta data is for business users.
The interface for both of these user groups should be Web-enabled. The Web provides an easy-to-use and intuitive interface with which both groups are familiar. It is critical that this interface be directly linked to the meta data in the repository. I strongly suggest the use of drop-down boxes and pick lists, as these controls are highly familiar to users and limit entries to values that already exist in the repository. You should always use the referential integrity that the database software provides.
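To make the pick-list and referential-integrity advice concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the repository database. The table names (subject_area, business_term), the column names and the helper functions are assumptions made for illustration; they are not taken from the book's meta models.

```python
import sqlite3


def open_repository():
    """Create a tiny in-memory repository (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE subject_area (
            subject_area_id INTEGER PRIMARY KEY,
            name            TEXT NOT NULL UNIQUE
        );
        CREATE TABLE business_term (
            term_id         INTEGER PRIMARY KEY,
            name            TEXT NOT NULL,
            definition      TEXT NOT NULL,
            subject_area_id INTEGER NOT NULL
                REFERENCES subject_area (subject_area_id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO subject_area (name) VALUES (?)",
        [("Customer",), ("Finance",)],
    )
    conn.commit()
    return conn


def pick_list(conn):
    """Return (id, name) pairs that a Web form could show as a pick list."""
    return conn.execute(
        "SELECT subject_area_id, name FROM subject_area ORDER BY name"
    ).fetchall()


def add_business_term(conn, name, definition, subject_area_id):
    """Insert a term; the database rejects an unknown subject area."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO business_term (name, definition, subject_area_id) "
                "VALUES (?, ?, ?)",
                (name, definition, subject_area_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The pick list keeps users from typing a subject area that does not exist, and the foreign key acts as a second line of defense if a value bypasses the form.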
Documents and Spreadsheets
A great deal of meta data is stored in corporate documents (such as Microsoft Word files) and spreadsheets (such as Microsoft Excel files). The requirements of your MME will greatly impact the degree to which you need to bring in meta data from documents or provide pointers to them. Sometimes, these documents and spreadsheets are located in a central area of a network or on an employee's computer. In most organizations, however, documents and spreadsheets tend to be highly volatile and lack standardized formats and business rules. As a result, they are traditionally one of the most unreliable and problematic sources of meta data in the MME. Sometimes, business meta data for these sources can be found in the note or comment fields associated with the document or, in a spreadsheet, with the cell. Technical meta data, such as calculations, dependencies or lookup values, is stored in the application's (Microsoft Excel or Lotus 1-2-3) proprietary data store.
For companies that have implemented a document management system, it is important to extract the meta data from these sources and bring it into the MME's repository. Typically, when a company builds a document or content management system, it also purchases a software product to aid management of meta data on documents, images, audio, geospatial data (geographical topography) and spreadsheets. It is important to have a meta data sourcing layer that can read the meta data in the document management tool, extract it and bring it into the MME's repository. This task is extremely difficult because most document management vendors do not understand that their products are really meta data repositories and, as such, need to be accessible. These tools often persist their meta data in proprietary database software, or their internal database structure is highly obfuscated, meaning that the structure of the meta data is not represented in the meta model but in program code. As a result, it can be difficult to build processes to pull meta data from these sources (Figure 2).
Figure 2: Meta Data Sourcing Layer - Document Management Sources
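As a rough illustration of what one sourcing-layer process for a document management tool might do, the sketch below assumes the tool can export its records as simple key/value rows. The source field names (DOC_NO, TITLE_TXT, KIND, OWNER_NM), the DocumentMetaData structure and the extract function are all hypothetical; a real tool's export format would have to be discovered case by case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentMetaData:
    """Common shape for document meta data in the repository (assumed)."""
    source_system: str
    document_id: str
    title: str
    doc_type: str
    owner: str


# Hypothetical mapping from the tool's export fields to repository fields.
FIELD_MAP = {
    "DOC_NO": "document_id",
    "TITLE_TXT": "title",
    "KIND": "doc_type",
    "OWNER_NM": "owner",
}


def extract(records, source_system):
    """Map exported rows to repository rows; set aside unusable ones."""
    loaded, rejected = [], []
    for record in records:
        # Missing or empty source fields become empty strings.
        mapped = {
            target: str(record.get(source) or "").strip()
            for source, target in FIELD_MAP.items()
        }
        # A row without an identifier or a title cannot be linked later.
        if mapped["document_id"] and mapped["title"]:
            loaded.append(DocumentMetaData(source_system=source_system, **mapped))
        else:
            rejected.append(record)
    return loaded, rejected
```

Keeping the field mapping in one table, rather than scattered through the code, makes it easier to adjust when the tool's structure turns out to differ from what was expected. Rejected rows are returned rather than dropped so that they can be reviewed.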
1. For a detailed discussion of data stewardship, please see David Marco's four-part series in the December 2002 - March 2003 issues of DM Review.