The consensus among industry and financial analysts is that the data warehousing market is mature and that the products and methodologies required to design and implement data warehouses are well understood. The original data warehouses were envisioned as allowing management to use informed hindsight to make better business judgments. However, this vision of the warehouse didn't anticipate that e-business would increase the speed at which the enterprise needs to react. While decisions about how to stock a retail store to better serve the buying patterns of a particular community might affect unsold inventory and profitability, it is unlikely that the inability to make those decisions in a timely manner would drive away customers, as geographic proximity is presumably one of the factors that keeps the customer base "loyal." However, a two- or four-week delay in servicing the needs of an Internet customer is likely to result in the loss of that relationship.

The promise of e-business has been significantly tarnished by its very public failures. However, the fact remains that supporting customers via the Internet has become a requirement for companies, with the bar for service being set by their most effective competitors. Successful use of the Internet as a sales channel depends on the design and range of Internet applications that interface to it – and, equally important, the customer's ability to conduct and fulfill transactions through that interface. One could argue that these issues have little to do with traditional data warehousing applications, and yet they are similar to the extent that they typically require the creation and maintenance of data caches distinct from those that are used by back-office applications.

In the following sections, we will examine the types of data and data caches required to conduct e-business and how they relate to the data warehouses used in decision support. We will also explore the technical and organizational challenges a company must face to meet these data integration needs.

The Data Demands of E-Business

Supply chain. There are two major applications of the Internet – supply-chain management, and sales and marketing. From a data-integration point of view, supply-chain applications are somewhat easier to implement because they deal with information that is already well understood by operational systems: bills of materials and inventory. Most manufacturing companies already have product databases that describe the products they manufacture in terms of their components and subcomponents. Material requirements planning (MRP) applications help them balance the level of available inventory with what is required to fill pending orders. To avail itself of supply-chain applications from vendors such as i2 or Manugistics, a company simply needs to cache this information. Of course, that task may require as much effort as building many decision-support warehouses.

Consider the case of an electronics component manufacturer with 40 sites worldwide. Odds are that many of these sites will have different MRP applications, and even those that share the same application are likely to represent the data differently. For this reason, and because it is probably unwise to try to directly interface in real time to systems distributed so widely across time and space, such a company is likely to build a data warehouse to serve as the interface to its supply-chain solution.
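
To make the consolidation problem concrete, the following is a minimal sketch, not a description of any vendor's product. It assumes two hypothetical sites whose MRP extracts name their fields differently and count inventory in different units; the field names (`part_no`, `qty`, `PartID`, `cases`, `case_size`) and the `InventoryRecord` type are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryRecord:
    """Common representation used inside the warehouse (hypothetical)."""
    site: str
    part_number: str
    quantity_on_hand: int  # always expressed in individual units


def from_site_a(row: dict) -> InventoryRecord:
    # Assumed layout for site A: quantity is already in individual units.
    return InventoryRecord("A", row["part_no"].strip().upper(), int(row["qty"]))


def from_site_b(row: dict) -> InventoryRecord:
    # Assumed layout for site B: quantity is reported in cases.
    units = int(row["cases"]) * int(row["case_size"])
    return InventoryRecord("B", row["PartID"].strip().upper(), units)


def consolidate(records) -> dict:
    """Total the units on hand per part number across all sites."""
    totals = {}
    for rec in records:
        totals[rec.part_number] = totals.get(rec.part_number, 0) + rec.quantity_on_hand
    return totals
```

Each site gets its own small transformation into the shared representation, so the supply-chain application only ever sees one format. With 40 sites, writing and maintaining these transformations is where most of the effort goes.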

Customer transactions. Using the Internet as a channel for the generation of sales is a significantly more complex problem. The simplest way is to use the Web as a means of disseminating information about a company's products and services, allowing customers to request that they be contacted about the offerings in which they may be interested. It is significantly more challenging to allow customers to place orders on the Internet because a company's success will depend upon how accurately it interacts with the customer regarding product availability and how satisfying it can make the purchasing experience on the Web.

Many of the B2C companies such as or failed to appreciate the importance of accurate inventory information when they didn't own that information. As a result, customers became extremely dissatisfied when they did not receive products that they were led to believe they had purchased. In fact, was actually sued when it was unable to deliver toys ordered for Christmas. B2C companies used two means of addressing these problems: building physical warehouses to store popular items – an approach that significantly increased the cost burden of maintaining a "virtual storefront"; and building "offline inventory" systems that allowed them to have a much more accurate view of the availability of items from the vendors they represented.

Like the data caches used to support supply-chain applications, offline inventory systems involved building traditional data warehouses. This approach satisfied the need to consolidate and integrate data from a variety of sources to provide the e-customer with a consistent view of the data. However, offline inventory systems are different from decision-support warehouses in several ways:

  • They typically have a larger number of source systems because they may represent dozens of vendors.
  • They must support potentially tens of thousands of simultaneous queries, although most queries are relatively simple compared to the various ad hoc queries that might be made of decision-support warehouses.
  • The latency of the data stored in offline inventory systems is relatively short because their goal is to reflect what is true at the present moment. This contrasts with the longer data latency required by decision-support warehouses to provide historical perspectives.
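
The short-latency requirement in the last point can be illustrated with a small sketch. It is an assumption-laden illustration rather than a real product design: the `OfflineInventory` class, the `StockSnapshot` record and the `max_age` threshold are all hypothetical names. The idea is that the cache keeps only the newest snapshot per vendor and item, and refuses to answer from data older than an agreed limit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class StockSnapshot:
    vendor: str
    sku: str
    available: int
    as_of: datetime  # when the vendor reported this figure


class OfflineInventory:
    """Hypothetical cache of vendor availability with a freshness limit."""

    def __init__(self, max_age: timedelta):
        self.max_age = max_age
        self._latest = {}  # (vendor, sku) -> newest StockSnapshot seen

    def load(self, snap: StockSnapshot) -> None:
        # Keep only the most recent snapshot for each vendor/item pair.
        key = (snap.vendor, snap.sku)
        current = self._latest.get(key)
        if current is None or snap.as_of > current.as_of:
            self._latest[key] = snap

    def available(self, sku: str, now: datetime) -> Optional[int]:
        # Sum availability across vendors, ignoring stale snapshots.
        fresh = [
            s.available
            for s in self._latest.values()
            if s.sku == sku and now - s.as_of <= self.max_age
        ]
        return sum(fresh) if fresh else None
```

Returning `None` when no fresh data exists lets the storefront say "availability unknown" instead of promising an item it cannot confirm, which is the failure the offline inventory systems were built to avoid.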

Depending upon the nature of their business, brick-and-mortar companies may be able to use middleware products such as those offered by IBM or TIBCO to provide real-time access to the information required to support customer transactions, thus eliminating the need to create an offline inventory system. For example, banks offer a limited set of well-defined services that require access to limited back-office data, such as customer account or mortgage rates. As a result, they could use message-oriented middleware to enable the desired interaction between the e-customer and the company. On the other hand, a national chain of car dealers that wants to offer customers the ability to purchase vehicles online faces much the same problem as that of the virtual e-tailer described earlier.

Customer retention. While the first step in attracting online customers is the ability to successfully transact business over the Internet, the key to keeping them – just as in businesses that deal in a person-to-person mode (whether physically or telephonically) – is creating a sense of relationship. In the Internet world, this may mean using information from prior transactions to customize the customer's visit. For instance, someone who regularly buys books on medieval history may be alerted upon logon when some new offering in that category is available. While there are a variety of products that help companies achieve this goal, one requirement is that this information be cached in a warehouse used specifically for this type of customization.

Internet marketing data for the decision-support warehouse. Finally, companies will want to understand the Web behavior of their customers, just as they want to understand how the purchasing habits within various branch locations correlate with the demographics of the neighborhood. Information about what customers actually purchase or return can be gleaned from the transactions reflected in updates to the back-end systems. However, it is probably equally valuable to know what the customer looked at and didn't purchase; how often he or she visited a particular page or clicked on a particular item; and where a customer came from – perhaps he or she entered via a banner or maybe from some other advertisement. If 10,000 customers visit a Web page that offers car insurance and only 80 actually apply for a policy, that's a clue that there is something seriously wrong with either the application or the product being offered.
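
The car insurance figures work out to a conversion rate of under 1 percent. The arithmetic is trivial, but it is the kind of measure a decision-support warehouse would track per page; the function name below is our own.

```python
def conversion_rate(visitors: int, conversions: int) -> float:
    """Fraction of visitors who completed the desired action."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return conversions / visitors


# Figures from the car insurance example: 80 applications from 10,000 visitors.
rate = conversion_rate(10_000, 80)
print(f"{rate:.1%}")  # 0.8%
```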

There are essentially two options available for obtaining this information – purchase Web analytic applications such as those offered by Accrue and NetGenesis, or extract this information from the clickstream logs available from the third-party Web servers or the commerce servers that host Web sites. Vendors of Web analytic applications build standalone data marts regarding visitors' online behavior, but often require companies to use the vendors' own Web development tools in implementing the Web site. As a result, many companies prefer to process server logs directly so they can integrate clickstream information with information from their internal applications to get a consolidated view of their business. In either case, it is likely that the company will want to integrate this information into its decision-support warehouses.
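
As a rough sketch of what processing server logs directly involves, the example below counts page views and referrers. It assumes the logs follow the widely used Combined Log Format; actual Web and commerce servers may be configured differently, and the function names are our own.

```python
import re
from collections import Counter

# Assumed input: Combined Log Format. The referrer and user-agent fields are
# treated as optional so plain Common Log Format lines also parse.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line: str):
    """Return the fields of one log line as a dict, or None if it is malformed."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def summarize(lines):
    """Count successful page views per path and visits per referrer."""
    views = Counter()
    referrers = Counter()
    for line in lines:
        hit = parse_line(line)
        if hit is None or hit["status"] != "200":
            continue  # skip malformed lines and unsuccessful requests
        views[hit["path"]] += 1
        ref = hit["referrer"]
        if ref and ref != "-":  # "-" means no referrer was recorded
            referrers[ref] += 1
    return views, referrers
```

The resulting counts are still just clickstream facts. Joining them to customer and product data from internal applications is the integration step that yields the consolidated view.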

Recognizing a Common Problem

What e-business has revealed is that the technical underpinnings of data warehousing are critical to more than decision support. Almost every aspect of the IT applications supporting e-business requires data caches – data warehouses. The question then becomes: How can the 10-year-old, maturing data warehouse methodologies and products be retargeted to meet these new demands?

The fact that this situation has not been publicly acknowledged is not surprising. Vendors and engineers have a tendency to avoid addressing problems for which there is no elegant solution. As a result, products that attempt to address the complexities inherent in tackling distributed, heterogeneous computing environments typically rely on standards or proprietary APIs that leave the details of data integration to someone else. Historically, standards have been a less than ideal solution for two reasons. First, they typically ignore the problem of semantics. For example, an XML schema may have a field defined as "salary." However, unless the time frame and denomination of that data value are also specified (e.g., expressed as a monthly value in euros), applications that hope to use this standard must still have an implicit understanding of what is expected. Second, while vendors in a "hot" area such as Web services may code against these standards, the only way legacy environments can be integrated is through generic gateway products that do not address the transformation logic required to interface between the native representation of data values and that expected by the standard. The usual fallback is hand-coding custom interfaces, a long-standing but error-prone and costly approach.
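
The "salary" example can be sketched in code. This is one possible way to make the semantics explicit, not something prescribed by any standard: the `Salary` type, the period names and the exchange-rate table are all hypothetical, and the rates must be supplied by the caller.

```python
from dataclasses import dataclass
from decimal import Decimal

# Assumed set of pay periods; a real system would likely need more.
PERIODS_PER_YEAR = {"monthly": 12, "annual": 1}


@dataclass(frozen=True)
class Salary:
    amount: Decimal
    currency: str  # e.g. "EUR"
    period: str    # a key of PERIODS_PER_YEAR


def to_annual(salary: Salary, target_currency: str, rates: dict) -> Decimal:
    """Express a salary as an annual amount in the target currency.

    `rates` maps (from_currency, to_currency) to a multiplier supplied by
    the caller; no real exchange rates are built in.
    """
    annual = salary.amount * PERIODS_PER_YEAR[salary.period]
    if salary.currency == target_currency:
        return annual
    return annual * rates[(salary.currency, target_currency)]
```

Because the period and currency travel with the value, a consuming application cannot silently mistake a monthly euro figure for an annual dollar one. A bare "salary" field offers no such protection.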

Data Integration as an Enterprise Discipline

The other alternative is to recognize that in today's distributed, heterogeneous world, the traditional data warehouse is just another application, and a potentially critical one at that. Like any other application, it depends on a persistent data store that is regularly populated from reliable sources. The challenge in supporting this application has been in the design, population and maintenance of this data store – a task complicated by the fact that it requires extracting, transforming and consolidating data from most of the back-office applications used to run the enterprise. Most e-business applications share this characteristic.

When the economic climate is strong, the inefficiencies resulting from duplication of effort in IT are often overlooked. When times get tight, one of two things happens: Either organizations stop investing in IT, or new priorities emerge. In the current downturn, one of the most important emerging priorities is data integration in the IT infrastructure. Once regarded merely as enablers, ETL products have become key to IT success. Giga estimates the size of the current data integration market at $620 million, and META Group estimates that the worldwide market for data integration will grow from $900 million in 2001 to $1.3 billion in 2004.

As an application that heavily relies on data integration, the data warehouse can serve as a great teacher in helping companies understand how to arrive at the right mix of product, methodology and organization to address data integration on the enterprise level. Key to the success of these efforts is the identification of where shortcuts were taken and a candid assessment of any resulting inefficiencies in the maintenance cycle. Otherwise, companies may lose an excellent opportunity to develop a cost-effective means of addressing the complexities that underlie their IT strategies.
