Over the past 10 years, data warehousing has proven to be a highly valuable technology, one that the vast majority of corporations have used to gain a competitive edge in the marketplace. As we enter the next decade, extensible markup language (XML) is poised to do much the same. The one unanswered question is how these two essential technologies will function together.
Virtually all Web sites have been built with hypertext markup language (HTML), which describes how data should be formatted but says nothing about what the data means. Consequently, this unstructured Web-site data is very difficult to bring into a data warehouse system. XML remedies this situation by wrapping Web-site information in data tags that describe it. To see how these data tags work, consider how XML would describe the information about a textbook: each item, such as the title, author, publisher or year of publication, is enclosed in a tag that names it, so a program can tell a title from an author without guessing from the page layout.
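As a minimal sketch, the Python snippet below uses the standard library's `xml.etree.ElementTree` module to tag a textbook record. The tag names (`textbook`, `title`, `authors`, `author`, `publisher`, `year`) and the sample values are hypothetical illustrations, not an industry-standard schema.

```python
import xml.etree.ElementTree as ET


def textbook_to_xml(title, authors, publisher, year, isbn):
    """Wrap each piece of textbook data in a tag that names it.

    The tag layout used here is a hypothetical example, not a standard schema.
    """
    book = ET.Element("textbook", attrib={"isbn": isbn})
    ET.SubElement(book, "title").text = title
    authors_el = ET.SubElement(book, "authors")
    for name in authors:
        ET.SubElement(authors_el, "author").text = name
    ET.SubElement(book, "publisher").text = publisher
    ET.SubElement(book, "year").text = str(year)
    # Serialize the element tree to a Python string rather than bytes.
    return ET.tostring(book, encoding="unicode")


# Hypothetical sample values, for illustration only.
xml_text = textbook_to_xml(
    title="Introduction to Data Warehousing",
    authors=["Jane Doe", "John Roe"],
    publisher="Example Press",
    year=2000,
    isbn="0-000-00000-0",
)
print(xml_text)
```

Because every value carries a label, a program reading this output can pull out the publisher or the year by name, which is exactly the information HTML formatting tags do not provide.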
Figure 1: XML Bringing Data into the Data Warehouse
Figure 1 illustrates data being read from a physician's Web site and brought into an XML transformation process (see Figure 1, bullet 1). This transformation process (bullet 3) matches the Web-site data to the corresponding XML schema (the layout of the data tags). Remember that one of the key challenges for XML is standardizing the names and meanings of the data tags. As an industry, IT has had limited success in defining global standards, and I don't expect XML to change this trend. Therefore, we will have to juggle multiple XML schemas in our corporations.
Next, the XML transformation process converts the tagged Web-site data into record format by removing the XML data tags, which is important because these tags increase processing overhead. These records are sent to the extraction, transformation and load (ETL) process of the data warehouse (bullet 4). The ETL process cleans and integrates this data and loads it into the data warehouse and its corresponding data marts (bullet 5). Keep in mind that several ETL tool vendors are looking to expand their current toolsets to include XML transformation functionality, so the separate XML transformation process (bullet 3) could eventually be merged completely into the ETL process.
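The matching and tag-stripping steps of the XML transformation process (bullet 3) can be sketched in Python as follows. This is an illustration under stated assumptions, not any vendor's implementation: the schema registry `SCHEMAS`, its two physician layouts and the sample pages are all hypothetical, and each "schema" is simplified to a list of element paths rather than a full XML Schema document.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema registry: each root tag maps to the element paths that
# hold the fields of one common record layout (name, specialty, city).
# Two different tag layouts for the same information illustrate having to
# juggle multiple XML schemas.
SCHEMAS = {
    "physician": ("name", "specialty", "address/city"),
    "doctor": ("fullName", "practiceArea", "location/town"),
}


def to_record(xml_text):
    """Match tagged data to its schema, then strip the tags into a flat record."""
    root = ET.fromstring(xml_text)
    paths = SCHEMAS.get(root.tag)
    if paths is None:
        raise ValueError(f"no schema registered for <{root.tag}>")
    # findtext returns the element's text, or the default if the path is missing.
    return tuple(root.findtext(p, default="").strip() for p in paths)


# Hypothetical tagged data from two physician Web sites using different schemas.
page_a = ("<physician><name>Dr. A. Smith</name><specialty>Cardiology</specialty>"
          "<address><city>Boston</city></address></physician>")
page_b = ("<doctor><fullName>Dr. B. Jones</fullName><practiceArea>Pediatrics</practiceArea>"
          "<location><town>Denver</town></location></doctor>")

records = [to_record(page_a), to_record(page_b)]
for rec in records:
    # Tag-free, pipe-delimited records, ready to hand to the ETL process.
    print("|".join(rec))
```

Both pages end up in the same three-field record layout with no tags left, so the downstream ETL process never needs to know which schema a page used.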
Often when we think of the Internet, we think about business-to-consumer (B2C) transactions; however, the potential for business-to-business (B2B) commerce on the Internet is far greater than that of B2C. Many companies are in the business of selling information, and XML plays a major role here because it allows B2B transactions to be brought directly into a data warehouse. Figure 1, bullet 2 shows how the B2B trading partner sends information into the XML transformation process. As before, not all B2B trading partners will use the standard XML schemas, so multiple XML schemas will need to be maintained. This process (bullet 3) uses the XML schemas stored in the XML database and moves the converted transactions into the ETL process of the data warehouse (bullet 4). The ETL process then integrates this information into the data warehouse and its data marts (bullet 5).
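A rough sketch of the ETL side of this flow (bullets 4 and 5) follows, again in Python. It assumes the XML transformation process has already stripped the tags into records; the partner names, fields and sample values are hypothetical, an in-memory SQLite database stands in for the data warehouse, and a summary table plays the role of a data mart.

```python
import sqlite3

# Hypothetical records already stripped of XML tags by the transformation step.
# The partners format the same fields differently, which ETL must reconcile.
incoming = [
    ("partner_a", "ACME-001", " 1,250.00 ", "2000-03-15"),
    ("partner_b", "acme-001", "$980.50", "2000-03-16"),
    ("partner_b", "GLOBEX-7", "300", "2000-03-16"),
]


def clean(record):
    """Clean one record: trim text, upper-case customer IDs, parse amounts."""
    partner, customer, amount, date = record
    amount = float(amount.strip().replace("$", "").replace(",", ""))
    return partner, customer.strip().upper(), amount, date.strip()


warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (partner TEXT, customer TEXT, amount REAL, sale_date TEXT)"
)
# Load: insert the cleaned rows into the warehouse table.
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", map(clean, incoming))

# A simple data mart: total sales per customer, derived from the warehouse table.
warehouse.execute(
    "CREATE TABLE customer_mart AS "
    "SELECT customer, SUM(amount) AS total FROM sales GROUP BY customer"
)
for customer, total in warehouse.execute(
    "SELECT customer, total FROM customer_mart ORDER BY customer"
):
    print(customer, total)
```

Integration here is only the upper-casing that lets two partners' spellings of the same customer ID (`ACME-001` and `acme-001`) land in one data mart row; a production ETL tool would reconcile far more than this.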
As we can see, XML is a critical technology, and it is coming to a data warehouse near you!