Extensible Markup Language (XML) is a language that allows organizations to communicate internally and externally in a standard way. True? Well, not exactly. XML is more of a syntax than a language. The XML specification defines such things as the use of angle brackets and matching start/end tags. However, unlike HTML, which is a defined language with a fixed set of words such as table, image and font, XML does not come with a specific vocabulary. For this reason, XML is sometimes described as a meta language or a language without words.

XML Dialects

Consider the simple example of a mailing address (see Figure 1). Examples 1 and 2 are both valid XML and represent the same information, but they are not the same. The vocabulary used is different. Example 1 uses ZIP while Example 2 uses postal code. There is different capitalization and use of special characters in the tags. The structure is also different. The type of address (shipping) is reflected in the data (as an attribute value SHIPTO) in Example 1, but is indicated in the tag (shipping-address) in Example 2. Lastly, the format of the data itself is different. Example 1 uses two-letter abbreviations, and Example 2 contains the name of the state.

Figure 1: XML Mailing Address Examples
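
The figure itself is not reproduced here. As a rough stand-in, the sketch below defines two hypothetical documents that match the description above. Only ZIP, SHIPTO and shipping-address come from the text; every other tag name and all of the address data are invented for illustration. The code parses both documents with Python's standard xml.etree.ElementTree module and compares the tag vocabulary each one uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for Examples 1 and 2; names and data are invented.
EXAMPLE_1 = """<ADDRESS TYPE="SHIPTO">
  <STREET>123 Main St</STREET>
  <CITY>Springfield</CITY>
  <STATE>IL</STATE>
  <ZIP>62701</ZIP>
</ADDRESS>"""

EXAMPLE_2 = """<shipping-address>
  <street>123 Main St</street>
  <city>Springfield</city>
  <state>Illinois</state>
  <postal-code>62701</postal-code>
</shipping-address>"""


def vocabulary(xml_text):
    """Return the set of element names used in a document."""
    root = ET.fromstring(xml_text)
    return {element.tag for element in root.iter()}


vocab_1 = vocabulary(EXAMPLE_1)
vocab_2 = vocabulary(EXAMPLE_2)

print(sorted(vocab_1))
print(sorted(vocab_2))
print(vocab_1 & vocab_2)  # element names the two dialects share
```

Because XML names are case-sensitive, STATE and state count as different tags, so these two documents have no element names in common even though they describe the same address.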

Document type definitions (DTDs) or schemas are the dictionary for any XML dialect. They define the tags that can be used, in what order they can be used and any constraints on the data, such as data types and valid value lists. Each of these examples, therefore, would be defined by a different DTD. XML documents can be validated against a particular DTD or schema using available tools.
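
To make the "dictionary" idea concrete, the following sketch hand-codes the kind of rules a DTD or schema would declare for a hypothetical address dialect: the expected root element, the allowed child tags and their order, and a valid value list for one attribute. It is not a DTD or schema validator, and every name in it is an assumption; in practice a validating parser would enforce rules like these from the DTD or schema itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for one dialect; a real DTD or schema would declare these.
ROOT_TAG = "ADDRESS"
CHILD_ORDER = ["STREET", "CITY", "STATE", "ZIP"]
VALID_TYPES = {"SHIPTO", "BILLTO"}


def check(xml_text):
    """Return a list of rule violations (empty if the document conforms)."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != ROOT_TAG:
        problems.append(f"unexpected root element: {root.tag}")
    if root.get("TYPE") not in VALID_TYPES:
        problems.append(f"invalid TYPE: {root.get('TYPE')}")
    children = [child.tag for child in root]
    if children != CHILD_ORDER:
        problems.append(f"unexpected children: {children}")
    return problems


good = (
    '<ADDRESS TYPE="SHIPTO"><STREET>123 Main St</STREET>'
    "<CITY>Springfield</CITY><STATE>IL</STATE><ZIP>62701</ZIP></ADDRESS>"
)
bad = (
    '<ADDRESS TYPE="HOME"><CITY>Springfield</CITY>'
    "<STREET>123 Main St</STREET></ADDRESS>"
)

print(check(good))
print(check(bad))
```

The second document is well-formed XML but breaks two of the dialect's rules: its TYPE value is not in the valid value list, and its children are out of order and incomplete.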

While a human being can read either document and understand it, an application that "speaks" the dialect in Example 1 would not understand Example 2. These differences, which seem small in the simple case of a mailing address, become significant for entire purchase orders, invoices and other business documents.
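
A minimal sketch of that failure, assuming a hypothetical application that reads only the first dialect (the function name, tag names and data are all invented for illustration):

```python
import xml.etree.ElementTree as ET


def read_zip_dialect_1(xml_text):
    """Read the ZIP code the way a hypothetical Example 1 application might."""
    root = ET.fromstring(xml_text)
    if root.tag != "ADDRESS" or root.get("TYPE") != "SHIPTO":
        raise ValueError(f"not a shipping address in this dialect: <{root.tag}>")
    return root.findtext("ZIP")


# Invented documents in the two dialects described earlier.
example_1 = '<ADDRESS TYPE="SHIPTO"><STATE>IL</STATE><ZIP>62701</ZIP></ADDRESS>'
example_2 = (
    "<shipping-address><state>Illinois</state>"
    "<postal-code>62701</postal-code></shipping-address>"
)

print(read_zip_dialect_1(example_1))

try:
    read_zip_dialect_1(example_2)
except ValueError as error:
    print("rejected:", error)
```

The second document parses without error, because it is well-formed XML, yet the application cannot find anything it recognizes in it.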

Dialect Proliferation

At first glance, it may seem that it is simply a matter of two organizations or applications agreeing on a dialect of XML, writing a schema to describe it and using it. In the case where XML is used for a tight integration between two internal applications, this is fine. However, XML is most often used for exchanging information with other organizations as in a B2B exchange. It is unlikely that one XML dialect will meet the needs of all of these organizations for all purposes. It is a more common scenario that companies implementing XML will be confronted with a proliferation of dialects from various sources such as:

  • Standards Organizations and Industry Groups. Many XML dialects are being developed by industry organizations in an attempt to standardize the industry on a specific dialect in order to promote interoperability between companies. Many of these standards are already being used in e-commerce portals or other applications.
  • Trading Partners. Your customers and suppliers may already be using XML and expect to receive XML documents in a specific dialect.
  • Technology Vendors. You will undoubtedly want to include data from your existing systems in your e-commerce transactions. Most databases and enterprise resource planning (ERP) packages have a mechanism for generating XML documents from the data. However, you often have little control over what dialect of XML document these packages generate. Most likely, it will look a lot like either the table layout or the ERP's data model.
  • In-House Projects. You may already be developing XML dialects within your organization. Multiple projects may be creating different (possibly redundant or incompatible) dialects depending on their specific needs.

Business As Usual

The fact that there are multiple ways to describe the same information is nothing new. Ask any two people to define a structure for a mailing address and you will get two different definitions. Within your organization, you probably already have dozens of different structures for a mailing address. It is very difficult to define a standard model that will apply to an entire company, let alone multiple companies.

There are many reasons for this. Business units have different perspectives on information, companies have diverse business models and industries use different terminology. There have been few successful efforts to standardize multiple companies on a single business model in the past. ERP packages that have predefined business models have often forced organizations to either change their business practices or perform heavy customization to the software. Electronic data interchange (EDI) has been the most successful in getting large organizations to agree on a standard, but even EDI required compromises and is often customized.

All of these dialects, therefore, have their place. Rather than trying to fight the proliferation of XML dialects, you should try to manage it successfully using the following recommendations:

  • Use standard dialects when possible. The standard dialects developed by industry groups and standards organizations are a useful place to start when creating XML dialects. Generally, they are carefully developed and documented by XML experts. They may take into account aspects of the design that you have not considered, such as internationalization or future extensibility. Using a standard dialect will decrease the number of dialects in use in your organization, and increase the likelihood that your systems will be interoperable in the future. Even if these dialects do not exactly meet your needs, you can use the extensibility of XML to develop a new dialect that is a superset or subset of the standard. In addition, it is worthwhile to get involved in the organization that is developing these XML dialects to ensure that the dialects will meet your needs and match your business more closely.
  • Leverage your investment in EDI. If you are working for a medium to large company, you are probably already using EDI and have expended significant effort to make your existing applications interface with one of the flavors of the EDI standard. XML is an inexpensive way to extend your EDI processes to a larger number of trading partners who may not currently be using EDI. However, it is only inexpensive if you don't reinvent the wheel. The EDI standards already cover the majority of the data you need to share with your business partners. Take advantage of this wealth of standardized, well-documented data elements. Consider using one of the available tools to translate your EDI to an XML dialect that maps directly to EDI standards, such as XEDI.
  • Use flexible tools to translate between dialects. Translation tools to convert XML documents between dialects are essential. While you can translate XML by writing Extensible Stylesheet Language Transformations (XSLT) style sheets or program code by hand, you are much better off with a flexible tool that makes the translation easier to develop and maintain, and less prone to error. Mapping tools exist that understand the meta data of XML (DTDs and schemas) and allow a relatively nontechnical user to map two dialects based on this meta data. A good translation tool will keep a repository of the meta data and dialect mappings, allowing them to be queried and cross-referenced. This approach allows translation scripts to be modified and regenerated as the dialects inevitably change over time. Additionally, some of these translation tools allow you to manage trading partners in such a way that XML documents can be automatically translated to different dialects depending on their destination company.
  • Practice meta data management. XML documents are information assets, just like databases and logical models. As such, they require proper meta data management. It is important to understand the XML documents that are in use in your organization and be able to assess the impact of a change to one of these documents. Keeping a central repository of XML DTDs and schemas is essential. Centralized schema management allows you to provide documentation of the XML dialects, enforce corporate standards for in-house developed dialects and encourage reuse of schemas. A schema repository also provides examples of well-designed schemas, which is important in a technology where many of the people writing schemas are new to XML. Well-designed schemas are easier to understand, and easier to reuse and extend in the future.
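
The idea of translating between dialects from a stored mapping, rather than from hand-written conversion code, can be sketched in a few lines. The example below is an illustration only, not how any particular mapping tool works: the mapping tables, tag names and address data are all hypothetical, and a real tool would derive the mappings from the DTDs or schemas and keep them in a repository.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping between two address dialects, kept as data so it can
# be changed without touching the translation code.
ROOT_MAP = {"shipping-address": ("ADDRESS", {"TYPE": "SHIPTO"})}
TAG_MAP = {"street": "STREET", "city": "CITY", "state": "STATE", "postal-code": "ZIP"}
STATE_CODES = {"Illinois": "IL"}  # a real table would cover every state


def translate(xml_text):
    """Translate a document from the source dialect to the target dialect."""
    source = ET.fromstring(xml_text)
    root_tag, attributes = ROOT_MAP[source.tag]
    target = ET.Element(root_tag, attributes)
    for child in source:
        element = ET.SubElement(target, TAG_MAP[child.tag])
        text = child.text or ""
        if child.tag == "state":
            # The data format differs too: full state name to abbreviation.
            text = STATE_CODES[text]
        element.text = text
    return ET.tostring(target, encoding="unicode")


incoming = (
    "<shipping-address><street>123 Main St</street><city>Springfield</city>"
    "<state>Illinois</state><postal-code>62701</postal-code></shipping-address>"
)

translated = translate(incoming)
print(translated)
```

When a dialect changes, only the mapping tables need to be updated, which is the maintenance benefit the recommendations above describe.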

What's So Great About XML?

You may be thinking, if I still have to do all of this mapping and translation, why is XML so great? The benefits of having a standard syntax should not be underestimated. Consider the alternative: without XML to interface between multiple applications, you have a hodge-podge of obscure file formats and programming APIs. As XML has become an accepted technology, a mind-boggling number of tools have been developed to parse, validate, compress, query, display, link, translate and edit XML. The best part is that most of these tools are meta-data-aware. Given these benefits, it is hard to imagine that XML was not adopted sooner!
