
Meta Data and XML: Will History Repeat Itself?

  • October 16 2003, 1:00am EDT

Last month, we touched on the HTML meta data component called the metatag, and it seems logical to take a look at the XML meta data world this month. However, there is a huge difference between these two technologies. In the HTML world, metatags are optional, while in the XML world they are the required foundational components of the technology itself. Unfortunately, we don’t have to look too far back to get a clear picture of where we are going.

The relational database model got its start in a series of IBM technical reports. Then, in a landmark paper, "A Relational Model of Data for Large Shared Data Banks," Edgar Codd laid out a new way to organize and access data. What Codd called the "relational model" rested on two key points: It provides a means of describing data with its natural structure only – that is, without superimposing any additional structure for machine representation purposes. Accordingly, it provides a basis for a high-level data language which will yield maximal independence between programs on the one hand and machine representation on the other (Codd, 1970).

We as database programmers jumped right in. We didn’t need any stinking models, data dictionaries or any tool to help define data. Inline coding was all the rage back then. I did it; you did it; we all coded applications with very limited reuse or data resource management. Eventually, the databases and applications got too big, and we needed to export the data definitions into copybooks, data definition structures or some other hardware-specific utility. Eventually, the database environment got too complex, so we started using models to present the meaning of data, and the ER diagram was integrated into the SDLC. After that, data stewardship, meta data management, data quality and data resource management programs emerged and integrated themselves into the core data architecture. Ahhhhh, nirvana in the RDBMS had been reached (by someone, I’m sure).

Then comes XML down the technology track. Surely, we are not going to just jump right into this technology and return to those early days of database history. This time, we are going to create meta data tag management applications built on the principles of reuse and provide the active utility of schema and DTD validation on the front end. This time, we are going to integrate data quality before it’s loaded into the XML database. This time, we are going to define standards and data definitions that can be categorized into understandable business views and business functionality. This time, we are going to implement XML not because it’s the latest and greatest, but because it’s the right business decision.

Yeah, right on! Who’s with me? Hello? Hello? Staff? Staff? Darn, where did everyone go? No, I’ll bet your organization is like mine where programmers, designers, integrators and project managers are jumping all over the XML technology with little or no forethought to the XML data resource management. Don’t be surprised when you take a look at one of your XML files and see the problem shown in Figure 1. I have removed all but four statements from the file. Can you spot the errors?

Figure 1: Sample XML File

Within this one XML document, we found six different spellings for service, four different spellings for order, six different field naming standards, four different data formats as well as 22 fields for which we could not determine the definition or use (17 percent). When we had only a few key databases, managing the information wasn’t a very big deal. We could mentally handle the semantic understanding and ambiguities because the scope was relatively narrow. What is going to happen when we have 1,000 of these XML documents sitting around with their umpteen undefined standards? Oh, it won’t happen this time: we’re too smart; we’re too advanced. XML is an open standard; we have DTDs and schemas this time, and we will have tools this time. Really, you think so? Where are the tools, where are the standards, where are all those advanced thinkers everyone keeps talking about? They don’t seem to be the ones actually writing the XML code.
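To make the naming problem concrete, here is a minimal sketch of the kind of audit that surfaces it. It is an assumption-laden illustration, not the tool we used: the sample document, its tag names and the helper functions `naming_style` and `audit_tag_styles` are all hypothetical, and the style categories are rough regular-expression guesses. It only uses Python's standard `xml.etree.ElementTree` module.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document; these tag names are invented for
# illustration and are not taken from Figure 1.
SAMPLE = """
<Orders>
  <SERVICE_ORDER><svc_ordr_id>1</svc_ordr_id></SERVICE_ORDER>
  <ServiceOrder><serviceOrderId>2</serviceOrderId></ServiceOrder>
  <service-order><srvc_ord_dt>2003-10-16</srvc_ord_dt></service-order>
</Orders>
"""

def naming_style(tag):
    """Classify a tag name into a rough naming style (heuristic only)."""
    if re.fullmatch(r"[A-Z][A-Z0-9]*(_[A-Z0-9]+)*", tag):
        return "UPPER_SNAKE"
    if re.fullmatch(r"[a-z][a-z0-9]*(_[a-z0-9]+)*", tag):
        return "lower_snake"
    if re.fullmatch(r"[a-z][a-z0-9]*(-[a-z0-9]+)+", tag):
        return "kebab-case"
    if re.fullmatch(r"[A-Z][a-z0-9]+([A-Z][a-z0-9]*)*", tag):
        return "PascalCase"
    if re.fullmatch(r"[a-z][a-z0-9]*([A-Z][a-z0-9]*)+", tag):
        return "camelCase"
    return "other"

def audit_tag_styles(xml_text):
    """Return {style: sorted list of distinct tag names written in that style}."""
    root = ET.fromstring(xml_text.strip())
    styles = defaultdict(set)
    for element in root.iter():
        # ElementTree writes namespaced tags as "{uri}local"; keep the local part.
        local_name = element.tag.rsplit("}", 1)[-1]
        styles[naming_style(local_name)].add(local_name)
    return {style: sorted(tags) for style, tags in styles.items()}

if __name__ == "__main__":
    for style, tags in sorted(audit_tag_styles(SAMPLE).items()):
        print(f"{style}: {', '.join(tags)}")
```

More than one style in the report for a single document is a quick signal that no naming standard is being enforced. Catching spelling variants such as the abbreviations of "service" would need an additional step, for example comparing tag names against an approved vocabulary.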

Of course, I can’t blame the programming community, which is under siege from outsourcing, overseas sourcing, budget cuts and time constraints that hardly allow time for looking at XML beyond the next three steps. Who among us would stand up in the face of management and say what we need in the XML environment is:

  • Consistent style of tag names,
  • Consistent naming conventions,
  • Consistent tag definition,
  • Managing the XML artifacts for reuse,
  • DTD and schema dynamic validation utilities,
  • Documented code sets,
  • Well-defined business model namespaces,
  • RDF-defined meta data,
  • Front-end topic maps and ontologies.
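
As one illustration of the "consistent tag definition" item, the sketch below checks a document against a small tag dictionary before it is loaded. Everything in it is assumed for the example: the dictionary entries, the sample document and the `undefined_tags` helper are hypothetical, and a real program would keep the dictionary in a managed repository rather than in code.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag dictionary: approved tag name -> business definition.
TAG_DICTIONARY = {
    "ServiceOrder": "A customer request for a service to be performed.",
    "OrderId": "Unique identifier assigned to a service order.",
    "OrderDate": "Date the order was placed, formatted YYYY-MM-DD.",
}

# Hypothetical incoming document with two tags nobody has defined.
INCOMING = """
<ServiceOrder>
  <OrderId>42</OrderId>
  <OrdDt>2003-10-16</OrdDt>
  <SvcCd>X1</SvcCd>
</ServiceOrder>
"""

def undefined_tags(xml_text, dictionary):
    """Return the sorted tag names in the document that have no dictionary entry."""
    root = ET.fromstring(xml_text.strip())
    # ElementTree writes namespaced tags as "{uri}local"; keep the local part.
    found = {element.tag.rsplit("}", 1)[-1] for element in root.iter()}
    return sorted(found - set(dictionary))

if __name__ == "__main__":
    missing = undefined_tags(INCOMING, TAG_DICTIONARY)
    if missing:
        print("Tags without a definition:", ", ".join(missing))
```

Rejecting or flagging a document at this point is cheaper than discovering the undefined fields after they have spread across a thousand files.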

If you are one of these bold souls, the XML fellowship should have a place for you. Be forewarned that you will have an uphill battle with the leadership that thrives on the short-term view and demands short-term results. Eventually, the path of meta data will be paved and acknowledged as a critical component of the enterprise as well as your XML strategy.

A couple of weeks ago, a friend and I were discussing the lack of new and innovative products to hit the market over the past decade. Where are the products that changed our lives, such as television, microwave ovens, compact disks or the personal computer? Of course, things have gotten faster and smaller, but I am talking about truly innovative products that make everyone head to the store to buy, buy, buy. Perhaps XML will be the infrastructure needed to bring the next wave of innovative products to our door. Maybe the products will be digital in nature, and XML and the meta data held within will enable this next wave of innovation. All I do know is that meta data was barely on the radar of most major corporations five years ago. Today, meta data is at the forefront of XML technology, and I for one can’t think of a better place to be.
