Over the past 10 years, data warehousing has proven to be a highly valuable technology that the vast majority of corporations have leveraged to gain a competitive edge in the marketplace. As we enter the next decade, extensible markup language (XML) is poised to accomplish much the same. The one unanswered question is how these two essential technologies will function together.

Virtually all Web sites have been built with hypertext markup language (HTML), which describes how data will be formatted but provides no information about what the data means. Consequently, this unstructured Web-site data is very difficult to bring into a data warehouse system. XML provides a remedy to this situation by assigning data tags to this Web-site information. To understand how these data tags function, let's use XML to describe the information about a textbook:

Building and Managing the Meta Data Repository

David Marco

John Wiley & Sons
New York
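
As a minimal sketch, the textbook details above could be tagged and read back as follows. The element names (`book`, `title`, `author`, `publisher`, `city`) are illustrative assumptions, not a published schema, and the helper `describe_book` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagging of the textbook details; the element names are assumptions.
BOOK_XML = """
<book>
  <title>Building and Managing the Meta Data Repository</title>
  <author>David Marco</author>
  <publisher>John Wiley &amp; Sons</publisher>
  <city>New York</city>
</book>
"""


def describe_book(xml_text):
    """Return a dict mapping each data tag under the root to its text value."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


if __name__ == "__main__":
    for tag, value in describe_book(BOOK_XML).items():
        print(f"{tag}: {value}")
```

Each value now carries a tag that says what it is, which is exactly the context that HTML formatting alone does not supply.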

By adding context to the content on a Web site, XML enables corporations to bring unstructured Web-site data into their data warehouses. This is critical for many companies' analysts, who need this information to make better decisions. Let's walk through an example using a healthcare company. Many doctors who research drugs publish their results on their Web sites. Often the decision-makers in these healthcare organizations want to know about the latest developments in this drug research in order to make better patient-care decisions. To see how XML simplifies this challenge, we will examine Figure 1.

Figure 1: XML Bringing Data into the Data Warehouse

Figure 1 illustrates data being read from a physician's Web site and brought into an XML transformation process (see Figure 1, bullet 1). This transformation process (bullet 3) matches the Web-site data to the corresponding XML schema (data tag layout). Remember that one of the key challenges for XML is to standardize the names and meanings of the data tags. As an industry, IT has had limited success in defining global standards, and I don't expect XML to change this trend. Therefore, we will have to juggle multiple XML schemas in our corporations. Next, the XML transformation process converts the tagged Web-site data into record format by removing the XML data tags, which is important because these tags increase processing overhead. These records are sent to the extraction, transformation and load (ETL) process of the data warehouse (bullet 4). The ETL process will clean, integrate and load this data into the data warehouse and its corresponding data marts (bullet 5). Keep in mind that several ETL tool vendors are looking to expand their current toolsets to include XML transformation functionality, so this XML transformation process (bullet 3) could be completely merged into the ETL process.
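
The transformation step described above (match the tagged data to a schema, then strip the tags to produce a record) might be sketched as follows. This is only an illustration under stated assumptions: the field names in `RESEARCH_SCHEMA`, the made-up `SAMPLE` document, and the pipe-delimited record layout are all hypothetical, and a real implementation would validate against a full XML schema rather than a simple list of tag names.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema (data tag layout) for a drug-research posting.
RESEARCH_SCHEMA = ["physician", "drug", "finding", "published"]

# Made-up sample document, used only to exercise the sketch.
SAMPLE = """
<study>
  <physician>Dr. A. Example</physician>
  <drug>ExampleDrug</drug>
  <finding>Reduced symptoms in trial group</finding>
  <published>2000-06-01</published>
</study>
"""


def xml_to_record(xml_text, schema, delimiter="|"):
    """Match a tagged document to a schema and return one delimited record."""
    root = ET.fromstring(xml_text)
    values = []
    for field in schema:
        elem = root.find(field)
        if elem is None or elem.text is None:
            raise ValueError(f"document does not match schema: missing <{field}>")
        values.append(elem.text.strip())
    # The tags are dropped here; only the values travel on to the ETL process.
    return delimiter.join(values)


if __name__ == "__main__":
    print(xml_to_record(SAMPLE, RESEARCH_SCHEMA))
```

The record handed to the ETL process is positional, so the schema's field order has to be agreed with whatever loads the record downstream.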

Often when we think of the Internet we think about business-to-customer (B2C) transactions; however, the potential for business-to-business (B2B) commerce on the Internet is far greater than that of B2C. Many companies are in the business of selling information. XML plays a major role in this effort, as it allows B2B transactions to be brought directly into a data warehouse. Figure 1, bullet 2 shows how the B2B trading partner sends information into the XML transformation process. As before, not all B2B trading partners will use the standard XML schemas, so multiple XML schemas will need to be maintained. This process (bullet 3) uses the XML schemas stored in the XML database and moves these converted transactions into the ETL process of the data warehouse (bullet 4). The ETL process then integrates this information into the data warehouse and its data marts (bullet 5).
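
One way to picture maintaining multiple schemas is a lookup that maps each trading partner's tag names onto a common record layout before the data reaches the ETL process. The sketch below assumes two hypothetical partners with invented tag names; an in-memory dictionary stands in for the XML database of schemas mentioned above.

```python
import xml.etree.ElementTree as ET

# Common record layout expected by the ETL process (hypothetical field names).
COMMON_FIELDS = ["order_id", "product", "amount"]

# Hypothetical per-partner schemas: common field name -> that partner's tag name.
PARTNER_SCHEMAS = {
    "partner_a": {"order_id": "OrderNumber", "product": "Item", "amount": "Total"},
    "partner_b": {"order_id": "id", "product": "sku", "amount": "price"},
}


def to_common_record(partner, xml_text):
    """Convert one partner's tagged transaction into the common record layout."""
    schema = PARTNER_SCHEMAS[partner]  # raises KeyError for an unknown partner
    root = ET.fromstring(xml_text)
    record = {}
    for field in COMMON_FIELDS:
        # findtext returns the default when the partner's tag is absent.
        record[field] = root.findtext(schema[field], default="").strip()
    return record


if __name__ == "__main__":
    doc_a = "<Order><OrderNumber>1001</OrderNumber><Item>Widget</Item><Total>19.95</Total></Order>"
    doc_b = "<txn><id>1001</id><sku>Widget</sku><price>19.95</price></txn>"
    print(to_common_record("partner_a", doc_a))
    print(to_common_record("partner_b", doc_b))
```

Both documents yield the same record, so the ETL process sees a single layout regardless of which partner's schema produced it.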

As we can see, XML is a critical technology, and it is coming to a data warehouse near you!
