Any discussion of metadata has to begin by defining what it is. The usual definition, that it is “data about data,” is extremely unhelpful. It is not really accurate either, because metadata can be data about applications, or networks, or a whole host of other things that are not strictly data. I define the term as follows: metadata is the data that describes any aspect of an enterprise’s information assets and enables the organization to use and manage these assets.

However, there is general agreement that “core” metadata for data management includes information about the entities and attributes found in data models. Assent is more lukewarm when it comes to including relationship metadata in this mix, even though it is extremely important for certain requirements.

There is generally less enthusiasm among data professionals for metadata about physically implemented data assets, such as databases, tables, columns, records and facts (column values in a record). This logical/physical divide, which has pervaded the mindset of legacy data administration, presents an enormous problem today.

Despite the views of many data administrators, physical data is valued by the enterprise. Data can be reused in any number of business processes and, thus, any number of applications. Indeed, today we see a trend away from developing new applications that cover new business data domains. Instead, there is a strong movement toward integrating existing data assets. Integration requires a detailed understanding of the sources of data, but the way this is being addressed is typically on a project-by-project basis, with the results being discarded after use. Excel spreadsheets, Access databases or even Word documents seem to be the predominant tools used for storing metadata in such projects. This metadata is primarily oriented to tables, columns and their relationships, but inevitably includes business terms and definitions.
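
A minimal sketch of the kind of record such a spreadsheet typically holds, expressed here as a Python data structure; the system, table and column names are invented for illustration and not drawn from any standard:

```python
from dataclasses import dataclass

@dataclass
class ColumnMetadata:
    """One row of a typical project 'data dictionary' spreadsheet."""
    source_system: str    # where the physical asset lives
    table_name: str       # physical table
    column_name: str      # physical column
    business_term: str    # logical attribute the column implements
    definition: str       # agreed business definition
    related_to: str = ""  # free-text relationship notes, if any

# The kind of entry an integration project captures and then discards
cust_name = ColumnMetadata(
    source_system="CRM",
    table_name="CUST",
    column_name="CUST_NM",
    business_term="Customer Name",
    definition="The legal name of a party with whom we do business.",
)
```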

On a separate front, integration requires running extract, transform and load (ETL) tools. These products have become increasingly sophisticated over the years, but they are, of course, primarily oriented to manipulating data. Identifying sources and targets, performing data transformations, running data validation checks and so on again requires knowledge of tables, columns and perhaps relationships. None of this is easy to do without knowing what the tables and columns mean, and that requires an understanding of the entities and attributes to which the physical data objects correspond at the logical level.
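
Much of the physical metadata an ETL tool needs can be harvested from the database catalog itself. A minimal sketch using SQLite's catalog; the table and columns are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CUST (
        CUST_ID INTEGER PRIMARY KEY,
        CUST_NM TEXT NOT NULL,
        CUST_DOB TEXT
    )
""")

# Harvest column-level physical metadata from the catalog.
# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk).
for cid, name, col_type, notnull, default, pk in conn.execute(
        "PRAGMA table_info(CUST)"):
    print(f"{name}: {col_type}, nullable={not notnull}, pk={bool(pk)}")
```

Note what the catalog cannot supply: nothing in its output says what CUST_NM means. That has to come from the logical model.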

Business intelligence tools, even simple reporting packages, are increasingly metadata aware, too. Rather than presenting users with raw table and column names, they have functionality that resolves these to business names and definitions. Again, this is what we will find in a logical data model. The metadata in these tools - their little, standalone data dictionaries - has to be populated from somewhere.
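
A sketch of the resolution step such tools perform, assuming a hand-built mapping; real products populate this dictionary from their own repositories, and the names below are illustrative:

```python
# Hypothetical mapping from physical column names to business names;
# in a real BI tool this is its standalone data dictionary.
BUSINESS_NAMES = {
    "CUST_NM": "Customer Name",
    "CUST_DOB": "Customer Date of Birth",
}

def relabel(row: dict) -> dict:
    """Present a physical result row under business names."""
    return {BUSINESS_NAMES.get(col, col): val for col, val in row.items()}

print(relabel({"CUST_NM": "Acme Ltd", "CUST_DOB": "1999-01-01"}))
# {'Customer Name': 'Acme Ltd', 'Customer Date of Birth': '1999-01-01'}
```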

Business rules engines are yet another consumer of metadata. They need to be taught about the databases to which they are connected. In fact, many software packages now include business rules functionality so that they can be easily configured for client-specific requirements. Once again, the metadata has to come from somewhere, be put into the tool and be kept up to date.
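
A sketch of why keeping that metadata current matters: a rule expressed against a business term can only run once metadata binds the term to a physical column. All names here are assumptions for illustration:

```python
# A validation rule expressed at the logical level.
RULE = ("Customer Name", lambda value: bool(value and value.strip()))

# Metadata binding the logical term to a physical column; if this
# drifts out of date, the rule silently checks the wrong column.
TERM_TO_COLUMN = {"Customer Name": "CUST_NM"}

def apply_rule(row: dict) -> bool:
    term, check = RULE
    return check(row.get(TERM_TO_COLUMN[term]))

print(apply_rule({"CUST_NM": "Acme Ltd"}))  # True
print(apply_rule({"CUST_NM": ""}))          # False
```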

IT is the place in the enterprise where you will find the greatest number of silos, and this is especially true of metadata management. Metadata is the information that makes - or should make - IT processes work. However, very few data professionals see it that way. Instead, they look at metadata as something that has to be served up to the rest of the business. There seems to be an attitude that IT is simply a pool of resources available to work on projects. The idea that IT might be a horizontal business function like finance or HR is rejected out of hand. If it were accepted, then IT would be expected to have its own permanent business processes and supporting infrastructure, just as HR and finance do. These processes would produce and consume metadata, and the supporting infrastructure would manage it.

As long as IT sees itself as a world apart from the business, there will be problems managing metadata. The same metadata will be produced and consumed in myriad silos across the domain of IT. We can expect the metadata in these silos to diverge and to be of unknown and unreliable quality, because no business processes are devoted to its maintenance. Viewing everything from the context of a project is especially counterproductive. At the end of a project, the results of any analysis are essentially thrown away, and no processes exist to keep these artifacts up to date. The individuals who work on a project typically rely on a great deal of verbal communication, so the analysis artifacts they produce need not be complete or of the highest quality. Any metadata produced this way will be suspect. Because projects are self-contained, there is no intention that metadata will pass beyond project boundaries for further reuse. Apparently data is a corporate resource, but metadata is not.

This is a classic master data management problem. The same metadata is produced in a variety of silos, and there is no central hub from which a high-quality version of it can be obtained. Little is known about the quality of this metadata, and little attention is paid to it. Until IT in general, and data professionals in particular, wake up to the fact that metadata is master data, we will never be able to adequately support the management of the enterprise’s information assets.
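
If metadata really is master data, it deserves the same hub pattern master data gets: one registry holding the golden record, outliving any individual project. A minimal sketch of the idea, with invented names:

```python
class MetadataHub:
    """A single registry for column-level metadata (illustrative only)."""

    def __init__(self):
        self._records = {}

    def publish(self, system, table, column, **facts):
        # Last write wins here; a real hub would version and govern changes.
        self._records[(system, table, column)] = facts

    def lookup(self, system, table, column):
        return self._records.get((system, table, column))

hub = MetadataHub()
hub.publish("CRM", "CUST", "CUST_NM",
            business_term="Customer Name",
            definition="The legal name of a party with whom we do business.")
print(hub.lookup("CRM", "CUST", "CUST_NM"))
```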
