DM Review welcomes Malcolm Chisholm as our newest columnist. He will share his expertise on master reference data, business rules and metadata management in this MetaThoughts column.

Data management has taken a beating since the beginning of the millennium. What used to be a thriving discipline has been downsized or eliminated at many enterprises. Curiously, there has been a resurgence of it over the past year or two as the need to manage the ever-increasing complexity of the average enterprise's information assets grows. The sudden surge of interest in master data management (MDM) seems to be a symptom of this. Yet is traditional data administration or, perhaps more accurately, are traditional data administrators up to the challenge? Will they take advantage of the new opportunities that seem to be on offer, or are we simply building up to an inevitable future downturn?

The Logical Comfort Zone

I first became actively involved in forums that brought data administrators together in the 1990s. I noticed that many of them were not oriented to programming. My background involved a great deal of programming, and I was anxious to learn how metadata might be leveraged to build solutions. This seemed alien to the majority of data administrators I encountered. They were much more concerned about collecting and organizing metadata about an enterprise's information assets. Unfortunately, this was usually not about the actual assets of the enterprise, but about a "logical" view of the enterprise's data assets.

The idea that we need to model the way that an enterprise sees its data in business terms is an excellent one. If we truly represent data in this way, it can be, or at least should be, very useful. Activities to understand business data from a business perspective yield logical data models. Unfortunately, quite a number of data administrators I met in the early days were not much good at modeling either. They preferred the administrative side of things, such as devising ways to name entities and attributes or structures to store models. However, they felt that this contributed to good modeling.

After the logical data models were developed, data administration's role usually ended. At the very best, the logical models were directly implemented as physical databases. Even so, data administration's view was that whatever got put into these databases was their users' responsibility. For instance, if the users invented new codes in the code tables that inevitably make up 20 to 50 percent of any database, then that was of no interest. Of course, these codes in reality define new subtypes and drive sets of business rules. But they are physical data and therefore off limits to data administration. This ignored the fact that code values define a certain level of design in the database that needs to be understood if the database is to be usable. Over time, of course, more and more codes make databases less and less understandable.

Much more often, logical data models were simply thrown over the fence to programmers. The programmers were no doubt glad for this explanation of things but typically had their own ideas on implementation, and data administration was only too happy not to participate in that activity - which left them free to criticize it. To be fair, most enterprises have never had any processes to move database designs seamlessly from conceptual to physical.

Today the reality is that a huge amount of software is not developed but is purchased. Thus a logical data model is never going to dictate a storage solution. No doubt many managers have thought about this possibility as they have reviewed the budgets for their data administration functions.

Logical Products

Data administrators have confined themselves to the logical level of things, partly because of their own predilections, but partly because things have been structured that way. So just what have data administrators done at the logical level?

First, the data models. Whatever utility they have, business users cannot read them. They typically think that the boxes and lines represent some kind of flow chart. I once attended a two-week course on a financial data model with a business user who never had that illusion dispelled until the final day when he could ask some basic questions. The only people who can use data models are, well, people building databases - the majority of whom are, in a broad sense, data modelers. Also, data models are held in packages such as ERwin, and getting the required licenses purchased and deployed is quite expensive. It has to be justified. That forces managers to ask who really needs access to data models. The answer comes readily to mind.

So if a library of data models has a limited audience, what about a metadata repository? We can extract the metadata from the data models and put it into a metadata repository. Or, much worse, we can get the modelers to send the metadata they capture into the repository at intervals when it has been validated as fit for use. This is worse because it is a recipe for repositories and models getting out of sync. Still, it avoids the difficult technical issue of extracting metadata directly from a data model. This, unfortunately, fits better with the perception that many data administrators have (or had) of themselves as glorified librarians.

What happens after the metadata repository is implemented? Well, I have seen too many of what I call "roach motel repositories." The metadata checks in, but it never checks out. That is, nobody uses the repositories, because they cannot, because they do not know the repositories exist, or because they find them unhelpful. Repositories may be built at great expense and conform to all the supposed best practices and industry theories. However, "build it and they will come" does not work. Sometimes the technical implementation is so insular that you have to go to a data administrator to request an output from the repository. These repositories are not Web enabled, nor do they have good search capabilities. In other places, the existence of the repository is hidden deep in the corporate intranet.

Worse yet is the case of a business user who thinks he will discover everything he needs to know about, say, Account Number and accesses the repository only to discover that it has a suggested logical name of Account Number and is defined as "A unique number that specifies an account," and that's all, folks. What about the fact that Account Number is an intelligent key that is built one way in one part of the enterprise and a different way in another? What databases is it used in? What data types are used to specify it, and are these incompatible across different implementations? Are there intelligent ranges in Account Number, and what subtypes do they represent? When these questions are put to many data administrators, the latter tend to behave like French aristocrats forced to eat at McDonald's. Such questions pertain to the physical level and not the logical level, and that is the last place that most data administrators want to go.
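The physical-level questions above can be answered mechanically once a repository records where a data item is actually implemented. What follows is a minimal Python sketch under stated assumptions, not a description of any real repository product: the record shape, the database and column names and the data types are all invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalColumn:
    """One physical implementation of a logical data item (hypothetical record shape)."""
    logical_name: str
    database: str
    table: str
    column: str
    data_type: str


# Invented sample entries; a real repository would harvest these from database catalogs.
COLUMNS = [
    PhysicalColumn("Account Number", "billing_db", "ACCOUNT", "ACCT_NO", "CHAR(12)"),
    PhysicalColumn("Account Number", "crm_db", "CUST_ACCOUNT", "ACCOUNT_NUM", "INTEGER"),
    PhysicalColumn("Account Number", "ledger_db", "GL_ACCOUNT", "ACCT_NO", "CHAR(12)"),
]


def where_used(logical_name, columns):
    """Return the sorted names of the databases that implement the logical item."""
    return sorted({c.database for c in columns if c.logical_name == logical_name})


def type_conflicts(logical_name, columns):
    """Group implementations by data type.

    Returns an empty dict when all implementations share one type (or none exist);
    otherwise returns {data_type: [database.table.column, ...]} for review.
    """
    by_type = defaultdict(list)
    for c in columns:
        if c.logical_name == logical_name:
            by_type[c.data_type].append(f"{c.database}.{c.table}.{c.column}")
    return dict(by_type) if len(by_type) > 1 else {}
```

Calling `where_used("Account Number", COLUMNS)` answers "What databases is it used in?", and `type_conflicts("Account Number", COLUMNS)` surfaces the implementations whose data types differ. Whether two differing types are truly incompatible still needs human judgment; the sketch only flags candidates.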

Solving What Problems?

The focus on the logical level can sometimes be so extreme that there is no connection between the activities of data administrators and the physical level. Yet it is an inconvenient fact that all data is physically implemented and sits (usually) in records in columns in tables in databases. If a business user has a question about some data, it is almost certain to come from working with physical data values. Users will typically only be able to identify a data item from a report label, a screen prompt or a column name. Users may know the correct business term, but they are unlikely to know the "proper" logical data model name - with the entity at the start, a class word at the end and so on. In my experience, if they are familiar with the correct business term, they probably know more than is captured in any repository or data model. The latter are of more use to users working at the boundaries of their knowledge.

The disconnect between the logical and physical that has been so prevalent in data administration units is aided and abetted by the terrible definition of metadata as "data about data." This utterly meaningless definition has permitted data administrators to give the impression they have a handle on managing the enterprise's information assets, whereas they have in fact been concentrating on an irrelevant, perhaps utopian, view of how things should be that is of limited practical help.

One of my favorite examples in this regard is the search for the single version of the truth. It makes life a lot easier for data administrators if there is only one definition for each entity and attribute. Across an enterprise there should only be one definition for Customer. Reality is otherwise, but that has not dampened the enthusiasm in countless data administration units. They inevitably see the "benefits" of obtaining a single consistent definition of Customer, even though for marketing it will always include prospects, while for accounts receivable it will always include only parties that have been invoiced. Worse yet, the data administrators do not see that the process of this search can actually create interdepartmental conflicts between areas that formerly lived side by side in relative peace. Data administration units that behave in this way will come to be viewed as troublemakers who solve no business problems, but rather absorb the time and energy of otherwise productive staff.

Standards or Services

The problem with the logical/physical divide can be so extreme in some data administration units that they produce little more than a set of inaccessible data models and a repository filled with odd names and overgeneralized definitions that fail to correlate with the enterprise's implemented information assets. Where data administrators venture into further outputs, it is usually in the area of standards and guidelines.

Data administration standards and guidelines typically revolve around the things they are comfortable with. These are logical-level data objects and processes that involve data models or repositories. Where the standards apply to data administration, they stand some chance of success, but there are three problems. The first is that data administrators are usually not in the habit of "eating their own dog food." They will come up with a standard that prohibits the use of intelligent keys in databases and then insist on "naming standards" that cram about 20 independent items of metadata into an entity name. The second problem is that the standards are often not well written and have no authority behind them. Nor do they have success metrics built in. In effect, they are suggestions, not standards. They tell people what to do - in very general terms. They cannot tell them how to do something because they are too general to develop processes around. Nor do they offer any support services. This brings us to the third problem, which is that the outputs of data administration are documents rather than services.

When I ask most data administration units what metadata services they provide to the rest of the organization, I usually get blank looks and then answers that revolve around the library of data models, the repository and the boatload of standards. These are not services. Services involve mechanisms for letting users know what data they have, what it means, where they can get it and what problems it has in ways that matter to the users. This means we are talking about actual data in physically implemented databases. It also means building and deploying true repository functionality that targets real-world use cases. Metadata services are a lot more than a way of accessing an irrelevant set of documentation.
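To make the idea of a service concrete, here is a minimal Python sketch of a lookup that starts from whatever label the user actually has (a report label, a screen prompt or a column name) and returns what the data means, where it lives and what problems it has. It is an illustration under stated assumptions, not a reference design: the `DataItem` shape, the normalization rule and every sample value except the Account Number definition quoted earlier are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """What a user needs to know about one data item (hypothetical shape)."""
    business_term: str
    definition: str
    locations: list                                   # "database.table.column" strings
    known_issues: list = field(default_factory=list)
    aliases: set = field(default_factory=set)         # report labels, prompts, column names


def normalize(label):
    """Lowercase and drop non-alphanumerics so 'Acct No.' and 'ACCT_NO' compare equal."""
    return "".join(ch for ch in label.lower() if ch.isalnum())


def build_index(items):
    """Map every known label, including the business term itself, to its item."""
    index = {}
    for item in items:
        for label in {item.business_term, *item.aliases}:
            index[normalize(label)] = item
    return index


def lookup(index, label):
    """Return the DataItem for a user-supplied label, or None if it is unknown."""
    return index.get(normalize(label))


# Invented sample content for illustration only.
ITEMS = [
    DataItem(
        business_term="Account Number",
        definition="A unique number that specifies an account",
        locations=["billing_db.ACCOUNT.ACCT_NO", "crm_db.CUST_ACCOUNT.ACCOUNT_NUM"],
        known_issues=["Built one way in one part of the enterprise and a different way in another"],
        aliases={"Acct No.", "ACCT_NO", "ACCOUNT_NUM"},
    ),
]
```

A user who only knows the column heading can call `lookup(build_index(ITEMS), "acct_no")` and get back the definition, the physical locations and the known issues in one step, without needing the "proper" logical name.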

Fair or Unfair?

How applicable are all these criticisms to data administration units? They are certainly truer of what happened in the past than of what is happening today. They are also not universally applicable. Many data administrators have descended from the lofty but purely logical heights to roll up their sleeves and get something done at the grimy physical level. However, it is also true that many data administrators have not moved from the logical level. Regrettably, they are supported by a large body of thought that sees the logical level alone as relevant and thinks that data management ills will only be cured once that vision is fully shared. We desperately need to break out of the logical-only mind-set and expand data administration to the physical level. 
