JUN 19, 2008 5:23pm ET

Metadata is Master Data

Any discussion of metadata has to begin by defining what it is. The usual definition, that it is “data about data,” is extremely unhelpful. It is not really accurate either, because metadata can be data about applications, or networks, or a whole host of other things that are not strictly data. I define the term as follows: metadata is the data that describes any aspect of an enterprise’s information assets and enables the organization to use and manage these assets.

However, there is general agreement that “core” metadata for data management will include information about entities and attributes found in data models. More lukewarm assent is usually given to relationship metadata being included in this mix, although it is extremely important for certain requirements.

There is generally less enthusiasm among data professionals for metadata about physically implemented data assets, such as databases, tables, columns, records and facts (column values in a record). This logical/physical divide, which has pervaded the mindset of legacy data administration, presents an enormous problem today.

Despite the views of many data administrators, physical data is valued by the enterprise. Data can be reused in any number of business processes and, thus, any number of applications. Indeed, today we see a trend away from developing new applications that cover new business data domains. Instead, there is a strong movement toward integrating existing data assets. Integration requires a detailed understanding of the sources of data, but the way this is being addressed is typically on a project-by-project basis, with the results being discarded after use. Excel spreadsheets, Access databases or even Word documents seem to be the predominant tools used for storing metadata in such projects. This metadata is primarily oriented to tables, columns and their relationships, but inevitably includes business terms and definitions.

On a separate front, integration requires the running of extract, transform and load (ETL) tools. These products have become increasingly sophisticated over the years. They are, of course, primarily oriented to manipulating data. Identifying sources and targets, performing data transformations, running data validation checks and so on again requires knowledge of tables, columns and perhaps relationships. None of this is easy to do without knowledge of what the tables and columns mean, and that requires an understanding of the entities and attributes to which the physical data objects correspond at the logical level.

Business intelligence tools, even simple reporting packages, are increasingly metadata-aware, too. Rather than present users with table and column names, they can resolve these to business names and definitions. Again, this is what we find in a logical data model. The metadata in these tools - their little, standalone data dictionaries - has to be populated from somewhere.
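To make the idea of a tool-level data dictionary concrete, here is a minimal sketch of how a reporting package might resolve physical names to business names. Everything in it is an illustrative assumption rather than any product's actual design: the `ColumnMetadata` structure, the `business_label` function, and the table and column names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    """One dictionary entry linking a physical column to its logical attribute."""
    table: str
    column: str
    business_name: str
    definition: str


# Hypothetical entries; in practice these would be loaded from wherever
# the logical data model is maintained.
DATA_DICTIONARY = {
    ("CUST_MSTR", "CUST_NM"): ColumnMetadata(
        table="CUST_MSTR",
        column="CUST_NM",
        business_name="Customer Name",
        definition="The legal name of the customer.",
    ),
    ("CUST_MSTR", "CUST_OPN_DT"): ColumnMetadata(
        table="CUST_MSTR",
        column="CUST_OPN_DT",
        business_name="Customer Open Date",
        definition="The date the customer relationship began.",
    ),
}


def business_label(table, column, dictionary=DATA_DICTIONARY):
    """Return the business name for a column, or the physical name if unknown."""
    entry = dictionary.get((table.upper(), column.upper()))
    if entry is None:
        # No metadata available: fall back to the raw table.column name.
        return f"{table}.{column}"
    return entry.business_name


print(business_label("cust_mstr", "cust_nm"))
print(business_label("ORDERS", "ORD_DT"))
```

The fallback branch is the interesting part: any column missing from the dictionary surfaces to the user under its raw physical name, which is exactly what happens when the dictionary is not populated and kept current.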

Business rules engines are yet another consumer of metadata. They need to be taught about the databases they are connected to. In fact, many software packages now contain some business rules functionality to enable them to be easily configured and implemented for certain client-specific functionality. Once again, the metadata has to come from somewhere, be put into the tool and be kept up to date.

IT is the place in the enterprise where you will find the greatest number of silos. This is especially true of metadata management. Metadata is the information that makes - or should make - IT processes work. However, very few data professionals see it that way. Instead, they look at metadata as something that has to be served up to the rest of the business. There seems to be an attitude that IT is simply a pool of resources available to work on projects. The idea that IT might be a horizontal business function like finance or HR is denied or rejected. If it were accepted, then IT would be expected to have its own permanent business processes and supporting infrastructure, just like HR and finance. These processes would produce and consume metadata, and the supporting infrastructure would manage the metadata.

As long as IT sees itself as another world distinct from the business, there will be problems managing metadata. The same metadata will be produced and consumed in myriad silos across the domain of IT. We can expect the metadata to diverge in these silos and to be of unknown and unreliable quality because there are no business processes devoted to its maintenance. Viewing everything from the context of a project is especially counterproductive. At the end of the project, the results of any analysis are essentially thrown away. No processes exist to keep these artifacts up to date. The individuals who work on a project typically do so based on a lot of verbal communication, so the analysis artifacts they produce do not have to be of the highest quality and completeness. Any metadata produced in this way will be suspect. The self-contained aspect of projects means there is no intention of metadata passing beyond the boundaries of the project for further reuse. Apparently data is a corporate resource, but metadata is not.
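The divergence described above can be illustrated with a small sketch that compares the definitions different silos hold for the same column. The silo names, the columns, the definitions, and the `find_divergent_definitions` function are all hypothetical, and the comparison is deliberately naive: it ignores only case and whitespace differences.

```python
from collections import defaultdict


def find_divergent_definitions(silos):
    """Report columns whose definitions differ between silos.

    `silos` maps a silo name to a dict of (table, column) -> definition.
    The result maps each divergent (table, column) to a dict of
    normalized definition -> list of silos that hold that definition.
    """
    seen = defaultdict(dict)
    for silo_name, entries in silos.items():
        for key, definition in entries.items():
            # Normalize case and whitespace so trivial differences are ignored.
            normalized = " ".join(definition.lower().split())
            seen[key].setdefault(normalized, []).append(silo_name)
    # Keep only the columns that have more than one distinct definition.
    return {key: dict(variants) for key, variants in seen.items() if len(variants) > 1}


# Hypothetical metadata held in two project-level silos.
SILOS = {
    "etl_mapping_spreadsheet": {
        ("ORDERS", "ORD_DT"): "Date the order was placed",
        ("ORDERS", "ORD_AMT"): "Order total including tax",
    },
    "bi_tool_dictionary": {
        ("ORDERS", "ORD_DT"): "Date the order was shipped",
        ("ORDERS", "ORD_AMT"): "order total  including tax",
    },
}

divergent = find_divergent_definitions(SILOS)
for (table, column), variants in divergent.items():
    print(f"{table}.{column} has {len(variants)} conflicting definitions")
```

A check like this only detects disagreement. Deciding which definition is correct still requires an owner and a maintained process, which is the point of the argument here.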

This is a classic master data management problem. Metadata is produced in a variety of silos, and it is often the same metadata. There is no central hub from which high-quality metadata can be obtained. There is little appreciation for the quality of the metadata or knowledge about it. Until IT in general, and data professionals in particular, wake up to the fact that metadata is master data, we will never be able to adequately support the management of the enterprise’s information assets.
