Data management has taken a beating since the beginning of the millennium. What used to be a thriving discipline has been downsized or eliminated at many enterprises. Curiously, there has been a resurgence of it over the past year or two as the need to manage the ever-increasing complexity of the average enterprise's information assets grows. The sudden surge of interest in master data management (MDM) seems to be a symptom of this. Yet is traditional data administration or, perhaps more accurately, are traditional data administrators up to the challenge? Will they take advantage of the new opportunities that seem to be on offer, or are we simply building up to an inevitable future downturn?
The Logical Comfort Zone
I first became actively involved in forums that brought data administrators together in the 1990s, and I noticed that many of them were not oriented toward programming. My own background involved a great deal of programming, and I was eager to learn how metadata might be leveraged to build solutions. This seemed alien to most of the data administrators I encountered. They were far more concerned with collecting and organizing metadata about an enterprise's information assets. Unfortunately, this metadata usually described not the actual assets of the enterprise, but a "logical" view of the enterprise's data.
The idea that we need to model the way an enterprise sees its data in business terms is an excellent one. If we truly represent data in this way, it can be - and should be - very useful. Activities to understand business data from a business perspective yield logical data models. Unfortunately, quite a number of the data administrators I met in the early days were not much good at modeling either. They preferred the administrative side of things, such as devising conventions for naming entities and attributes or structures for storing models. Nevertheless, they felt that this contributed to good modeling.
After the logical data models were developed, data administration's role usually ended. At the very best, the logical models were implemented directly as physical databases. Even so, data administration's view was that whatever got put into these databases was the users' responsibility. For instance, if the users invented new codes in the code tables that inevitably make up 20 to 50 percent of any database, that was of no interest. Of course, these codes in reality define new subtypes and drive sets of business rules. But they are physical data and therefore off limits to data administration. This ignores the fact that code values embody a layer of design in the database that must be understood if the database is to be usable. Over time, of course, more and more codes make databases less and less understandable.
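To make this concrete, here is a minimal sketch of how the values in an ordinary code table end up acting as subtypes that drive business rules. The table, the codes, and the overdraft rule are hypothetical, invented for illustration rather than taken from any particular system:

```python
# A hypothetical code table and rule set, showing how code values become de facto subtypes.
from dataclasses import dataclass

# The "code table" as data administration sees it: just physical reference data.
ACCOUNT_TYPE_CODES = {
    "CHK": "Checking account",
    "SAV": "Savings account",
    "LOC": "Line of credit",   # added later by users, invisible to the logical model
}

@dataclass
class Account:
    account_id: str
    type_code: str
    balance: float

def overdraft_allowed(account: Account) -> bool:
    # Each code value implies different behavior - in effect, a subtype with its own rules.
    if account.type_code == "CHK":
        return True
    if account.type_code == "SAV":
        return False
    if account.type_code == "LOC":
        return True
    # An undocumented code means the rule set silently no longer covers the data.
    raise ValueError(f"Unknown account type code: {account.type_code}")

if __name__ == "__main__":
    print(overdraft_allowed(Account("A-1", "CHK", 250.0)))   # True
    print(overdraft_allowed(Account("A-2", "LOC", -500.0)))  # True - a rule nobody modeled
```

Every new code the users invent adds another branch like these, whether or not anyone outside the user community knows it exists.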
Much more often, logical data models were simply thrown over the fence to programmers. The programmers were no doubt glad for this explanation of things but typically had their own ideas on implementation, and data administration was only too happy not to participate in that activity - which left them free to criticize it. To be fair, most enterprises have never had any processes to move database designs seamlessly from conceptual to physical.
Today the reality is that a huge amount of software is not developed in-house but purchased. Thus, a logical data model is never going to dictate the physical database design. No doubt many managers have had exactly this in mind as they reviewed the budgets for their data administration functions.
Data administrators have confined themselves to the logical level, partly because of their own predilections and partly because their role has been structured that way. So just what have data administrators actually done at the logical level?
First, the data models. Whatever utility they have, business users cannot read them. They typically think that the boxes and lines represent some kind of flowchart. I once attended a two-week course on a financial data model with a business user who never had that illusion dispelled until the final day, when he was at last able to ask some basic questions. The only people who can use data models are, well, people building databases - the majority of whom are, in a broad sense, data modelers. Data models are also held in packages such as ERwin, and purchasing and deploying the required licenses is quite expensive. That expense has to be justified, which forces managers to ask who really needs access to data models. The answer comes readily to mind.
So if a library of data models has a limited audience, what about a metadata repository? We can extract the metadata from the data models and put it into a metadata repository. Or, much worse, we can have the modelers send the metadata they capture to the repository at intervals, once it has been validated as fit for use. This is worse because it is a recipe for repositories and models getting out of sync. Still, it avoids the difficult technical issue of extracting metadata directly from a data model. It also, unfortunately, fits the perception that many data administrators have (or had) of themselves as glorified librarians.
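For what it is worth, pulling metadata straight from a model export need not be daunting. The sketch below assumes a hypothetical CSV export with entity, attribute, and definition columns (real modeling tools each have their own export formats, and the file and table names here are invented) and loads it into a simple SQLite table standing in for the repository:

```python
# A minimal sketch: load entity/attribute metadata from a (hypothetical) model export
# into a repository table, rather than waiting for modelers to submit it by hand.
import csv
import sqlite3

def load_model_export(csv_path: str, db_path: str = "repository.db") -> int:
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS attribute_metadata (
            entity_name    TEXT NOT NULL,
            attribute_name TEXT NOT NULL,
            definition     TEXT,
            source_model   TEXT,
            PRIMARY KEY (entity_name, attribute_name, source_model)
        )
    """)
    with open(csv_path, newline="") as f:
        rows = [
            (r["entity"], r["attribute"], r.get("definition", ""), csv_path)
            for r in csv.DictReader(f)
        ]
    conn.executemany(
        "INSERT OR REPLACE INTO attribute_metadata VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()
    conn.close()
    return len(rows)

if __name__ == "__main__":
    count = load_model_export("financial_model_export.csv")
    print(f"Loaded {count} attributes into the repository")
```

Rerunning the load whenever the model changes keeps the repository and the model in step, which is precisely what the manual hand-off approach fails to do.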
What happens after the metadata repository is implemented? Well, I have seen too many of what I call "roach motel repositories." The metadata checks in, but it never checks out. That is, nobody uses the repositories - either because they cannot, because they do not know the repositories exist, or because they find them unhelpful. Repositories may be built at great expense and conform to all the supposed best practices and industry theories. However, "build it and they will come" does not work. Sometimes the technical implementation is so insular that you have to go to a data administrator just to request an output from the repository. These repositories are not Web enabled, nor do they have good search capabilities. In other places, the existence of the repository is hidden deep within the corporate intranet.
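By way of contrast, even a very modest search facility would at least let the metadata check out as well as in. This sketch reuses the hypothetical attribute_metadata table from the previous example; a real repository would obviously need more than a keyword match, but not much more to be useful:

```python
# A simple keyword search over the hypothetical repository table built above.
import sqlite3

def search_metadata(term: str, db_path: str = "repository.db") -> list[tuple]:
    conn = sqlite3.connect(db_path)
    pattern = f"%{term}%"
    rows = conn.execute(
        """
        SELECT entity_name, attribute_name, definition
        FROM attribute_metadata
        WHERE entity_name LIKE ? OR attribute_name LIKE ? OR definition LIKE ?
        ORDER BY entity_name, attribute_name
        """,
        (pattern, pattern, pattern),
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for entity, attribute, definition in search_metadata("customer"):
        print(f"{entity}.{attribute}: {definition}")
```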