Improve Your XML Applications

XML is everywhere. It is unrivalled both as a data exchange format for industry standards and a data interchange format for application developers. It’s taken just 10 years for XML to reach ubiquity.


XML could even be considered the new language of business data. Most industries are developing standards for data interchange in XML, and some XML standards, like XBRL, span industries. XML has been adopted rapidly because it has several properties that solve big problems in exchanging information, including:


  • It is platform independent, which means that any two computer systems, regardless of the hardware or software that the system uses, can utilize XML to exchange information.
  • It is self-documenting, allowing people to understand the structure of the data simply by reading the XML.
  • The hierarchical nature of the XML data format can be easily used to represent common data structures like records, trees, lists and more.
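
As a small illustration of the last two points, the sketch below parses a made-up order document with Python's standard xml.etree.ElementTree module. The element and attribute names (order, customer, items, item, sku, qty) are invented for this example and do not come from any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document; all element and attribute names are illustrative.
ORDER_XML = """
<order id="1001">
  <customer>
    <name>Ada Lopez</name>
    <city>Toronto</city>
  </customer>
  <items>
    <item sku="A-1" qty="2">Widget</item>
    <item sku="B-7" qty="1">Gadget</item>
  </items>
</order>
"""

order = ET.fromstring(ORDER_XML)

# A record: the named fields under <customer>.
customer = {child.tag: child.text for child in order.find("customer")}

# A list: the repeated <item> elements.
items = [(item.get("sku"), int(item.get("qty")), item.text) for item in order.iter("item")]


def walk(element, depth=0):
    """A tree: yield every element together with its nesting depth."""
    yield depth, element.tag
    for child in element:
        yield from walk(child, depth + 1)


outline = list(walk(order))
```

The tag names carry the meaning of each value, which is what makes the document readable without separate documentation.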

During the past decade, many organizations developed software applications that store or retrieve XML data. For the most part, these applications used existing infrastructure, including the file system and relational databases, to store the XML. They used workarounds that incurred significant processing overhead and required expensive transformations. This had a significant impact on the complexity and performance of the applications.


Initially, a new breed of XML-only databases emerged to address the needs of these applications. However, XML data is isolated in these databases - it lives in a different repository than other data. In this era of business optimization, organizations are increasingly looking to optimize the use of information assets. Placing XML data in an isolated XML-only database hinders business optimization efforts. The major relational database vendors addressed this issue by building native XML support into the database environments. By doing so, they have shortened development time, lowered maintenance costs and improved application performance when storing and retrieving XML data.


As more business information is cast in XML, the ability to efficiently and effectively handle XML data in a transactional environment is essential.

Transactional environments put a premium on programming ease, high performance, high availability and cost effectiveness. Providing these characteristics to XML-oriented applications requires unique innovation.

But that hasn’t always been the case.


The Old XML Ways


Clumsy and complicated are two words often used to describe the way developers first handled XML data. Many companies were already using relational databases when XML came along. A relational database stores data in a tabular format, with a column for each piece of information and a row for each record. With the development of XML, professionals wanted to figure out how to store this information in a database. The question became, “How can we get XML data into a tabular format for efficiency?” But the problem is that XML data is inherently hierarchical. In other words, it is better represented as a tree than as a table.


Programmers responded by creating two workarounds. One method called shredding takes XML and tries to map it into a tabular format. A second approach called stuffing puts all the XML data into a single large object (LOB) cell in the table. Both approaches work, but there are significant drawbacks, especially as the amount of XML data has grown.
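
The two workarounds can be sketched side by side. The example below is a minimal illustration using Python's built-in sqlite3 and xml.etree.ElementTree modules. The table layout and the order document are assumptions made up for this sketch, and a plain TEXT column stands in for a LOB column.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document; names are illustrative only.
ORDER_XML = (
    '<order id="1001"><customer>Ada Lopez</customer>'
    '<item sku="A-1" qty="2"/><item sku="B-7" qty="1"/></order>'
)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER);
    CREATE TABLE order_docs (order_id INTEGER PRIMARY KEY, doc TEXT);
""")


def shred(xml_text):
    """Shredding: map elements and attributes onto rows in relational tables."""
    root = ET.fromstring(xml_text)
    order_id = int(root.get("id"))
    conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, root.findtext("customer")))
    conn.executemany(
        "INSERT INTO order_items VALUES (?, ?, ?)",
        [(order_id, i.get("sku"), int(i.get("qty"))) for i in root.findall("item")],
    )


def stuff(xml_text):
    """Stuffing: store the whole document as one opaque value in a single cell."""
    order_id = int(ET.fromstring(xml_text).get("id"))
    conn.execute("INSERT INTO order_docs VALUES (?, ?)", (order_id, xml_text))


shred(ORDER_XML)
stuff(ORDER_XML)
```

Even for this tiny document, shredding needs two tables and mapping code, while stuffing keeps the document intact but hides its contents from the database.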


With LOB storage, a database cannot natively work with the information in the cell. A developer has to retrieve the large object and create code to process it. As a result, there is a performance impact every time you need to work with the information in the database.
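
To make that cost concrete, here is a hedged sketch of answering a simple question (the total quantity ordered for one product) when the documents sit in a LOB-style column. The table, the documents and the function name are hypothetical. The point is only that the filtering happens in application code after every document has been fetched and parsed.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_docs (order_id INTEGER PRIMARY KEY, doc TEXT)")
conn.executemany(
    "INSERT INTO order_docs VALUES (?, ?)",
    [
        (1, '<order><item sku="A-1" qty="2"/><item sku="B-7" qty="1"/></order>'),
        (2, '<order><item sku="A-1" qty="5"/></order>'),
    ],
)


def total_qty_for_sku(sku):
    """The database sees only text, so every document is retrieved and parsed here."""
    total = 0
    for (doc,) in conn.execute("SELECT doc FROM order_docs"):
        for item in ET.fromstring(doc).iter("item"):
            if item.get("sku") == sku:
                total += int(item.get("qty"))
    return total
```

Every call scans and parses the whole table, so the work grows with the total volume of stored XML rather than with the size of the answer.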


Many native XML databases, however, bypass these workarounds because they work directly with the XML elements. In other words, the XML is not simply a string of characters or a large object. Each native XML database is implemented differently, and you should clarify the specific capabilities of the native XML repository you plan to work with in this regard.


For a long time, shredding and LOB storage were the only options available, which explains why both of these workarounds are still very common. But decision-makers need to be aware of the ways database vendors have upgraded their products. The major vendors all handle XML data natively, with each doing so in its own way.


The Benefits of Using Native XML


There are many reasons why XML data should be stored in its native form. Using native XML creates a simplified IT environment that’s easier to manage, improves database administrator and application developer productivity, provides significant storage savings and improves application performance. Don’t overlook these benefits.


Native XML Storage Creates a Simplified IT Environment that’s Easier to Manage


Shredding is a popular approach when quickly retrieving individual pieces of information from the database is important. However, this fast query performance comes at a cost. That cost involves both the effort required to map the XML data into a tabular format and the processing overhead associated with shredding when inserting information into the database. This cost is then accentuated if you later want to recreate the original XML data from the shredded fields - a process that is often called re-composition.


Before shredding, you need to design a relational schema for the data. This is often a labor-intensive process. Sometimes it can be automated with off-the-shelf tools, but you should keep in mind that the resulting tables will almost certainly need to be carefully examined and possibly optimized. After designing the relational schema, you then need to set up the environment that actually maps the XML to the relational schema. And finally, you will need to develop and test code for using the data, which is typically quite complex because of the need for unwieldy SQL statements with multiple JOIN clauses.
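
The following sketch shows what that last step can look like for a deliberately tiny case: rebuilding one document from two shredded tables. The schema and names are assumptions for illustration. A real standard with hundreds of tables would need far more joins and far more reassembly code.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER);
    INSERT INTO orders VALUES (1001, 'Ada Lopez');
    INSERT INTO order_items VALUES (1001, 'A-1', 2), (1001, 'B-7', 1);
""")


def recompose(order_id):
    """Re-composition: join the shredded rows and rebuild the XML document."""
    rows = conn.execute(
        """
        SELECT o.customer, i.sku, i.qty
        FROM orders AS o
        JOIN order_items AS i ON i.order_id = o.order_id
        WHERE o.order_id = ?
        ORDER BY i.rowid
        """,
        (order_id,),
    ).fetchall()
    root = ET.Element("order", id=str(order_id))
    if rows:
        ET.SubElement(root, "customer").text = rows[0][0]
    for _customer, sku, qty in rows:
        ET.SubElement(root, "item", sku=sku, qty=str(qty))
    return ET.tostring(root, encoding="unicode")
```

Each additional level of nesting in the XML typically adds another table and another join to a query like this one.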


It sounds like there is significant overhead in shredding XML data into a relational schema, but this is only part of the story. You also need to consider what happens when the XML schema changes. And XML schema changes are an unfortunate reality for many of us. When the schema changes, it can play havoc with your relational schema, your mapping process and the code for your applications that use the data. Dealing with these types of updates is where many organizations realize the greatest gains from adopting native XML storage.


If you want a real-world example of the potential impact of dealing with relational tables for XML data, consider the FpML (Financial products Markup Language) industry standard. With native XML storage, dealing with FpML messages is as straightforward as storing and retrieving the message. However, if you use shredding to store FpML messages, in some implementations, you need to work with more than 475 separate database tables. Maintaining 475 tables is far more complex than managing just one.


Native XML Storage Improves IT Productivity


One of the great advantages of working with a native XML repository is that you don’t need to do anything to the information before storing it and you don’t need to do anything to the information when retrieving it. You simply store the XML directly in the repository and retrieve the XML from the same location. This simplified way of working with XML data reduces the amount of time needed for many common tasks associated with storing, maintaining or retrieving XML data.


Native XML Improves Information Integration


Integrating information among various business units and locations can be frustrating. If a company can build a flexible, automated and scalable platform for information reporting, the process becomes much easier to manage. Storing the XML data natively makes such a platform more powerful.


XML can be used to create flexible data analysis and reporting systems that are capable of managing data from diverse facilities. By storing the report data from a company’s different units in native XML format, the setup accommodates various schemas and report formats while making the data accessible to the entire organization. This can improve communication between business and IT staffs. Native XML storage makes it easy to add, update or even delete reported items. Building a better information reporting system gives a company greater business insight and agility. It can also reduce cost and labor for implementing application changes and improve responses to changes in regulatory and management reporting requirements.


Don’t Overlook XML Schema Flexibility


One additional area to consider when evaluating XML pertains to XML schemas, which define the structure of XML data. For example, an XML schema describes the XML elements and attributes that can appear in XML data and their data types. It also describes where they can appear and how often. You can consider an XML schema to be a set of rules for the XML elements and attributes that appear in your XML data. A schema is often used to define an agreed-upon vocabulary of XML tags for a specific application scenario, such as financial trading, medical records or insurance claims.


Often when working with XML data, you will want to ensure that the XML data adheres to the rules of an XML schema. This is frequently referred to as checking that XML data is valid or validating the XML data. Validating XML data is a good idea in many environments because it catches structural problems before they reach the applications that work with the data.
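
The idea of validation can be sketched with a hand-rolled rule check. This is not XML Schema (XSD) validation: the rule table below is a simplified, hypothetical stand-in that only checks which child elements appear and how often, and the claim vocabulary is invented. A production system would rely on a real schema validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules standing in for an XML schema: for each parent element,
# the allowed child tags with (min, max) occurrence counts; None means unbounded.
CLAIM_RULES = {
    "claim": {"policy": (1, 1), "amount": (1, 1), "note": (0, None)},
}


def validate(xml_text, rules):
    """Return a list of rule violations; an empty list means the document is valid."""
    errors = []
    root = ET.fromstring(xml_text)
    for element in root.iter():
        allowed = rules.get(element.tag)
        if allowed is None:
            continue  # no rules defined for this element's children
        counts = {}
        for child in element:
            counts[child.tag] = counts.get(child.tag, 0) + 1
        for tag in counts:
            if tag not in allowed:
                errors.append(f"<{element.tag}>: unexpected <{tag}>")
        for tag, (low, high) in allowed.items():
            n = counts.get(tag, 0)
            if n < low or (high is not None and n > high):
                errors.append(f"<{element.tag}>: <{tag}> occurs {n} times")
    return errors
```

For example, validate("<claim><policy>P1</policy></claim>", CLAIM_RULES) reports that the required amount element is missing, while a claim with both policy and amount returns an empty list.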


When working with a native XML database management system, there are a couple of aspects of XML schemas that you especially need to keep in mind: schema flexibility and schema evolution. Schema flexibility refers to the ability to cater to a wide range of XML schema needs. Schema evolution refers to the ability to handle new versions of your XML schemas. To explain the need for these features, let's look at a couple of scenarios.

Consider the situation encountered by tax authorities who store information from tax forms in XML format. There is, at the very least, the possibility of minor changes to the tax forms each year. This means that there are also minor changes to the XML schema each year. In such situations, you do not want to simply start using the new schema for all records. Instead, you want to validate records against the schema that was in existence when that record was created. In other words, you need the flexibility of being able to have the cells in a database column validate against different schemas (or possibly not validate against a schema). This is called schema flexibility.
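
A minimal sketch of that idea follows, assuming each stored record carries the version of the schema it was created under. The years, field names and rule format are all hypothetical, and the "schema" is reduced to a set of required child elements to keep the example short.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-year rules: the child elements a <return> must contain.
REQUIRED_FIELDS_BY_YEAR = {
    2008: {"taxpayer", "income"},
    2009: {"taxpayer", "income", "credits"},  # the later form adds a field
}

# Each record is stored with the schema version it was created under.
# None means the record is stored without validation.
records = [
    (2008, "<return><taxpayer>T1</taxpayer><income>100</income></return>"),
    (2009, "<return><taxpayer>T2</taxpayer><income>90</income><credits>5</credits></return>"),
    (None, "<return><draft/></return>"),
]


def is_valid(schema_year, xml_text):
    """Validate a record against the rules of its own schema version."""
    if schema_year is None:
        return True  # unvalidated records are accepted as-is
    present = {child.tag for child in ET.fromstring(xml_text)}
    return REQUIRED_FIELDS_BY_YEAR[schema_year] <= present


results = [is_valid(year, doc) for year, doc in records]
```

All three records pass against their own versions, but the 2008 record would fail if it were checked against the 2009 rules, which is why applying only the newest schema to every row is not an option.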


Now consider a situation where an organization stores messages that adhere to one of the major XML standards, like HL7 or FpML. Industry standards are evolving, with new versions of those standards made available over time. Moving to a new version of an XML standard usually means also moving to a new - and hopefully compatible - XML schema.


If you migrate to a new version of an XML standard, you will want a database management system that supports the ability to move to this new XML schema without having to change or manually revalidate all of your existing XML documents. For instance, if you want to have your data adhere to the new XML standard, then you want your database management system to ensure that existing data (that was originally validated against an XML schema for an older version of the standard) adheres to the new XML schema. This is called compatible schema evolution. (Note: Incompatible schema evolution is also possible.)
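
One way to reason about compatibility is sketched below under a strongly simplified model: a schema version is just a set of required and optional element names, and a document is valid when it contains every required element and nothing outside the allowed set. Under that assumption, a new version is compatible when it requires nothing new and still allows everything the old one allowed. The class, the function and the element names are hypothetical and are not taken from FpML or HL7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaVersion:
    required: frozenset
    optional: frozenset

    @property
    def allowed(self):
        return self.required | self.optional


def is_compatible(old, new):
    """True if every document valid under `old` is also valid under `new`."""
    return new.required <= old.required and old.allowed <= new.allowed


v1 = SchemaVersion(frozenset({"trade", "party"}), frozenset())
# v2 adds an optional element: existing documents remain valid.
v2 = SchemaVersion(frozenset({"trade", "party"}), frozenset({"collateral"}))
# v3 makes the new element mandatory: existing documents would fail.
v3 = SchemaVersion(frozenset({"trade", "party", "collateral"}), frozenset())
```

Here is_compatible(v1, v2) holds and is_compatible(v1, v3) does not, which mirrors the difference between compatible and incompatible schema evolution.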


All of this is possible because of native XML storage, which preserves everything that developers love about XML. There are many reasons for a company to pursue a new XML storage strategy and few to keep using the existing workarounds. With the volume of XML data growing every day, companies need to act now to lower costs and optimize their business processes.
