XML Standards are an Integral Part of Meta Data

Published
  • August 19 2004, 1:00am EDT

One of the interesting questions that has been asked over the past few years is what role does meta data play in the development of XML? In the past, data architects could ignore the meta data aspects of defining data or, better said, they could let the application tool take care of mundane activities such as meta data. However, with the advent of XML technologies, meta data must be placed on the front end and integrated into every step of the process. In fact, data architects must find ways to exploit meta data and adapt their training to a new world where, perhaps, the definition of data is as important as the data itself. Take a look at Figure 1 and review the five tiers of XML asset creation.


Figure 1: XML Artifacts

Architecture Naming Conventions

Naming standards have been referred to as a "taxonomy" in some of the literature today. In my effort to keep all things as simple as possible, naming standards will reflect how we define the rules of construction. For example, let's review a couple of simple rules.

Language. Wherever possible, tag names will be constructed using American English as the basis for language and spelling.

Term Capitalization. All terms used within the XML tagging structure will begin with a capital letter and all subsequent letters for that term will be in lower case.

Valid: DayOfMonth, HoursWorked, PrincipleBalance
Invalid: DayofMonth, hoursWorked, PrincipleBALANCE

Term Separation. Term separation characters will not be allowed; the capital letter of Term Capitalization will act as a term divider.

Valid: DayOfMonth, HoursWorked, PrincipleBalance
Invalid: Day_Of_Month, Hours Worked, Principle-Balance
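
Rules like these are easy to check mechanically. The following is a minimal Python sketch, not a reference implementation; the names TAG_PATTERN, follows_naming_rules and split_terms are my own, and it assumes every term has at least two letters. Note the limit of a pattern-only check: it accepts DayofMonth because "Dayof" looks like a single well-formed term, so catching that case requires a term glossary.

```python
import re

# One term: a capital letter followed by one or more lower-case letters.
# A tag is one or more such terms with no separator characters in between.
# Assumption: every term has at least two letters.
TAG_PATTERN = re.compile(r"(?:[A-Z][a-z]+)+")


def follows_naming_rules(tag: str) -> bool:
    """Return True if the tag passes the pattern-level naming checks."""
    return TAG_PATTERN.fullmatch(tag) is not None


def split_terms(tag: str) -> list[str]:
    """Split a tag into terms, treating each capital letter as the divider."""
    return re.findall(r"[A-Z][a-z]+", tag)


if __name__ == "__main__":
    for tag in ["DayOfMonth", "hoursWorked", "PrincipleBALANCE",
                "Day_Of_Month", "Hours Worked", "Principle-Balance"]:
        print(tag, follows_naming_rules(tag))
```

A check of this kind could sit in the schema development tool or run as part of a review step, so that the naming document is enforced rather than merely published.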

Most naming convention documents will contain many more of these, but you get the idea. We are stating the basic principles or standards of how XML tags can be created. Naming conventions are simply the foundation of this effort. Knowledge of these principles should extend up and down the organization to everyone concerned with the development, reuse or integration of XML-based applications, including designers, developers, analysts and testers. These rules should be encoded into the XML schema development application (if there is one) or housed in the architecture document repository.

Terms

Terms sound like the simplest of the elements in the five-tier program. We want a collection of terms that can be used in conjunction with the rules to build our element inventory. Great, that sounds simple; where is that Webster's dictionary? Hold on, it is not that simple, since a single term may come in more than one form. For example, what if we had an architecture standard as follows:

Consistent Use of Terms, Abbreviations and Stock Symbols. All terms used within the XML construct will be consistent in the utilization of terms, abbreviations and the special use stock symbol within the document. The stock symbol abbreviation is the shortest construct for the term. Example: Term=Balance, Abbreviation=Bal, Stock Symbol=B.

Valid: PrincipleBalance, PrinBal, PB
Invalid: PrincipleBal, PBalance, PBal

A solid glossary will hold the collection of valid spellings, definitions and other construction information and rules.
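
To make the consistency rule concrete, here is a small Python sketch of a glossary lookup. It is an illustration under assumptions, not a product design: the GLOSSARY structure and the function names are hypothetical, the Balance entry comes from the standard above, and the Principle entry is inferred from the valid examples (PrinBal, PB).

```python
from typing import Optional

# Hypothetical glossary: each full term maps to its abbreviation and its
# stock symbol. Balance is taken from the standard quoted above; Principle
# is inferred from the valid examples PrinBal and PB.
GLOSSARY = {
    "Balance": {"abbreviation": "Bal", "symbol": "B"},
    "Principle": {"abbreviation": "Prin", "symbol": "P"},
}

STYLES = ("term", "abbreviation", "symbol")


def forms_for(style: str) -> list[str]:
    """Return every glossary form written in the given style."""
    if style == "term":
        return list(GLOSSARY)
    return [entry[style] for entry in GLOSSARY.values()]


def can_segment(tag: str, forms: list[str]) -> bool:
    """True if the tag is a concatenation of the given forms (or is empty)."""
    if tag == "":
        return True
    return any(
        tag.startswith(form) and can_segment(tag[len(form):], forms)
        for form in forms
    )


def consistent_style(tag: str) -> Optional[str]:
    """Return the single style the whole tag is written in, or None."""
    if not tag:
        return None
    for style in STYLES:
        if can_segment(tag, forms_for(style)):
            return style
    return None


if __name__ == "__main__":
    for tag in ["PrincipleBalance", "PrinBal", "PB",
                "PrincipleBal", "PBalance", "PBal"]:
        print(tag, consistent_style(tag))
```

The three valid tags resolve to "term", "abbreviation" and "symbol" respectively, while the three invalid tags mix styles and come back as None.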

Elements and Attributes

Elements and attributes are the result of one or more terms being combined in order to define a specific context for data. Examples would include PrincipleBalance and DayOfWeek. Elements will be housed in the data dictionary along with definitions, ownerships, models or other types of meta data. Do you really need a data dictionary, since elements and attributes can be defined inside the schema? Not really; applications are available that can scan across multiple schemas looking for specific elements. However, many researchers are focusing on the ability to model x-dimensional data for XML. Perhaps this logical, conceptual and physical modeling concept will provide the link between these two efforts.
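
As a rough idea of what scanning across schemas involves, the Python sketch below parses a handful of schema documents and reports which ones declare a given element or attribute. The two schema documents and their file names are invented for illustration; only the element names PrincipleBalance, DayOfWeek and HoursWorked come from this column. A real tool would read schemas from a repository rather than from strings.

```python
import xml.etree.ElementTree as ET

# Namespace prefix ElementTree uses for W3C XML Schema nodes.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema documents, keyed by an invented file name.
SCHEMAS = {
    "loan.xsd": """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="PrincipleBalance" type="xs:decimal"/>
  <xs:element name="DayOfWeek" type="xs:string"/>
</xs:schema>""",
    "payroll.xsd": """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="HoursWorked" type="xs:decimal"/>
  <xs:element name="DayOfWeek" type="xs:string"/>
</xs:schema>""",
}


def declared_names(schema_text: str) -> set[str]:
    """Collect the names of all elements and attributes a schema declares."""
    root = ET.fromstring(schema_text)
    names = set()
    for node in root.iter():
        is_declaration = node.tag in (XS + "element", XS + "attribute")
        if is_declaration and "name" in node.attrib:
            names.add(node.attrib["name"])
    return names


def schemas_declaring(name: str, schemas: dict[str, str]) -> list[str]:
    """Return the sorted keys of the schemas that declare the given name."""
    return sorted(
        key for key, text in schemas.items() if name in declared_names(text)
    )


if __name__ == "__main__":
    print(schemas_declaring("DayOfWeek", SCHEMAS))
    print(schemas_declaring("PrincipleBalance", SCHEMAS))
```

A scan like this finds where a name is declared, but it does not supply the definition, ownership or model information that a data dictionary would hold.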

Schema

The XML schema has replaced the document type definition (DTD) as the means of describing XML structures. The DTD is a long-standing part of the SGML standard and is excellent at describing document structure. However, the DTD falls short in providing the same utility to a data-centric environment. The schema provides the ability to describe rules and constraints for base and derived data, extended data types, facets, value limits, enumeration, occurrences and patterns.

Where are you going to house those schemas? OK, there are only about 20 in use within your organization so it's no big deal. What happens when there are 10,000? What about externally defined schemas? Have you taken a look at the number of publicly available vocabularies out there? There are already hundreds spanning multiple businesses and processes. If your organization is going to use external standards or vocabularies, then you are going to need to capture the significant meta data that describes the vocabulary, how it is being used, what business process can benefit from using the standard, etc. One additional element of required meta data is a subject matter expert. Let's say that you decide that you are going to use the OASIS ebXML Business Process standard in an application. It is important not only to capture the specifics of the standard but also to identify someone who understands the standard, its elements and the utility it provides. Otherwise, you will be forcing the development community to start from scratch every time they choose this standard.
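
The meta data described above can be pictured as one record per vocabulary. The Python sketch below is only one possible shape for such a record; the class name, field names, the second sample record and the example.com contact address are all hypothetical placeholders. The point it makes is that a missing subject matter expert is something a repository can detect and report.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VocabularyRecord:
    """Hypothetical repository entry describing one XML vocabulary."""
    name: str
    publisher: str
    how_used: str
    business_processes: list[str] = field(default_factory=list)
    subject_matter_expert: Optional[str] = None  # contact for questions


def missing_experts(records: list[VocabularyRecord]) -> list[str]:
    """Return the names of vocabularies with no subject matter expert."""
    return [r.name for r in records if not r.subject_matter_expert]


# Sample entries; the usage text and the contact address are placeholders.
RECORDS = [
    VocabularyRecord(
        name="OASIS ebXML Business Process",
        publisher="OASIS",
        how_used="Candidate standard for a partner-facing application",
        business_processes=["Order management"],
    ),
    VocabularyRecord(
        name="Internal Loan Vocabulary",
        publisher="Internal",
        how_used="Message payloads between loan systems",
        business_processes=["Loan servicing"],
        subject_matter_expert="data.architecture@example.com",
    ),
]

if __name__ == "__main__":
    print(missing_experts(RECORDS))
```

Here the ebXML entry is reported because nobody has been named as its expert, which is exactly the gap that sends developers back to the beginning each time.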

Usage

Usage provides the basic value of identifying which organizations, applications, programs and message structures are using the schema definition. Here is where your entire repository collection comes into play. Assuming you have been listening over the past few years, you should be able to tie the schema repository to your system, application and interface repository; to your reusable component repository; and to your Web service registry. You may even begin to see the need to supplement your reuse program with another reusable asset, namely the XML schema. You do have a formal reuse program based on the reuse maturity model in your organization, right?

The goal is to have a repository with the critical information online and readily accessible for every stage of the process. There are several products on the market that handle some but not all of the required levels. Which level is the most important? They are all important depending on your role and stage of development.

Recently, I spent some time at an academic conference in Las Vegas. Yes, somewhere between crying over pocket aces in seven-card stud and the elation of the dealer getting a pair of fours to go with my straight, I was able to discuss meta data with both faculty and students. For those of us with an ACM or IEEE subscription, the lack of meta data research is disappointing at best.

The response was a welcome sight in that the 20 or so people who attended seemed to be really interested in where meta data is going and why this should be an important research topic. First, we must address the basic question of why meta data should be an academic research topic and not a professional one. For the most part, meta data research is still stuck in the 1980s in the sense that meta data is just data about data. I proved this, albeit not scientifically, by taking a look at my freshman text on computer science. Oh yes, we all remember those days of card decks, mainframes, COBOL, Assembler, Commodore 64s and meta data, or data about data. Today, if you open a computer science textbook, the world has changed: we see C Sharp, UML, object-oriented design, grid computing, Java and meta data, or data about data. Are we saying that every technology under the computer science umbrella has changed except for meta data? Unfortunately, meta data has lost much of its luster and fascination, which has been replaced with XMI, topic maps, ontologies, etc. The basic value and utility of meta data has not changed, and therein lies the problem. We need to expand the body of knowledge around this subject and stop expecting the solutions to come from the vendor community. Perhaps the XML construction process and repository relationship is a starting point.
