If there's any knowledge to be gleaned from the free fall Internet stocks took last spring, it's that Web sites had better not talk too loudly about "potential" in the canyons of Wall Street. For the 12 months preceding the plummet, stock valuations of Web companies had risen to dizzying heights on the premise that one day – some day – these companies would capitalize on the possibilities inherent in the Internet channel. The theory went that, compared to traditional retail stores, the relatively low overhead of Web sites would allow them to deliver goods at a lower cost and still maintain a comfortable profit margin. As long as there was strong traffic, the faithful said, things would be fine. Just watch: eventually the masses would embrace the Internet as a viable purchasing channel and profits would soar. All it was going to take was a little time.

Well, when the bell rang to start trading on March 28, time ran out. The general consensus among investors was that, with the U.S. online population running to a whopping 110 million individuals, the Internet had grown beyond its adolescence. It was high time to measure these companies by the same standards as traditional businesses. Voluminous traffic figures and snappy functionality, the elements by which successful sites were once measured, no longer cut it. It all boiled down to the most elemental yardstick of all: money. And the mandate was clear: Web sites in the business of turning profits had better start doing it. Not next year. Not next month. Now.

All of which explains why some of the large e-tailers have been giving off the scent of serious fear. Where once Internet executives could justify low profits by pointing outward to the immaturity of the medium, investors now are forcing those executives to look within. And what's becoming apparent is that shortfalls in revenue are more aptly attributed to a Web site's own inability to convert browsers into buyers. To compete in the Web space now, sites are discovering that they need to maximize every interaction with every customer by converting browsers in rapid order from first-time visitors to first-time buyers to loyal patrons. Failure to deepen, broaden and enrich individual relationships right from the first meeting raises the possibility that someone else will.

Changing Face of Web Competition

So it's more than a little ironic that high-tech portals and e-commerce sites are looking to personalization, the ace selling tactic of small-time, small-town merchants, to improve customer profitability. And why not? It was the ability to know and adapt to customers as individuals that built or broke the local business. It was this knowledge that allowed store owners to cross-sell, up-sell and develop deep customer loyalties. Of course, the interactions were not labeled quite so starkly. There was nuance and subtlety to the exchanges.

For example, trace the following hypothetical – but plausible – exchange between the owner of a garden store and a first-time customer. When a fresh face walks through the door, the store owner steps from behind the wall of greenery and introduces herself. The two engage in conversation about the weather and the fleeting growing season, while the store owner asks polite, but probing, questions. She's gauging the customer's interests and level of experience with gardening so she can respond in a manner appropriate to the individual. As it turns out, the customer grew up on a farm and, following a successful career in the big city, has retired to a spot just east of this country town, in part to rekindle a childhood passion for dirt. Since the store owner has lived in the area her entire life, she knows the local growing conditions very well. This knowledge, combined with the information recently gleaned from the customer, allows the store owner to suggest not only the right strain of potatoes for the local conditions, but also a self-powered rototiller to ease the labor of preparing seed beds. As the two move to the counter, the store owner casually mentions a sale on lime. Why? Because the soil to the east of town has large deposits of iron, which tend to make the soil fairly acidic.

Would you like to bring a little of this to your Web site? Well, banal as this example may seem, embedded in it are the three pillars necessary for a powerful Web personalization system: integration of disparate data, scalability to accommodate rising data volumes and timely, sophisticated analysis. For instance, the store owner reconciled two different data sources – one from the customer and one from the world at large – by recognizing what the customer's new home east of town implied about the soil there. The store owner also was able to absorb the information from this customer, a seemingly trivial addition that only gains significance in the context of the store's other 5,000 customers. Lastly, and perhaps most importantly, the store owner analyzed the information quickly and effectively. She didn't wait a week to sift through the information and analyze its implications. She didn't ask the customer to come back. She responded immediately to a buyer in the market, and in so doing she was able not only to deliver the right potatoes but also to up-sell an expensive rototiller and cross-sell a sack of lime.

Unlocking the Potential

This intimacy of relationship is just as feasible on the Web, where personalization acts upon the models and profiles obtained through analysis and dynamically serves customized content (links, products, recommendations, ads, services, pricing offers, etc.) to individual visitors. Developing and managing this relationship presents a formidable challenge, given the diverse sources of customer information and the staggering volume of both offline and online data, but the rewards are great. Each click on a site indicates a choice or preference about a vendor and its products. Vendors can now collect detailed consumer data at the most basic level of individual mouse clicks. They can then perform sophisticated analysis of large collections of these individual choices, preferably augmented by transaction and demographic data. Together, these capabilities let vendors know and serve their customers like never before. The Internet offers vendors what brick-and-mortar stores can only dream of: in effect, the ability to attach a homing device to every store visitor. With it, a vendor can track each visitor's path through the aisles, determine the displays and promotions that appeal to them, and monitor how they add and remove items from their shopping carts.
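Acting on a profile can be made concrete with a minimal Python sketch. It assumes a hypothetical content inventory in which each item carries a single category tag. It also assumes that a visitor profile is nothing more than the share of the visitor's clicks falling in each category. Real systems would draw on far richer models, and the names `CONTENT`, `build_profile` and `select_content` are illustrative only.

```python
from collections import Counter

# Hypothetical content inventory: each item is tagged with one category.
CONTENT = {
    "potato-seed-guide": "vegetables",
    "rototiller-promo": "equipment",
    "lime-sale-banner": "soil",
    "rose-pruning-tips": "flowers",
}


def build_profile(clicked_categories):
    """Turn the categories a visitor clicked into affinity weights summing to 1."""
    counts = Counter(clicked_categories)
    total = sum(counts.values())
    if total == 0:
        return {}  # no clicks yet: an empty profile
    return {category: n / total for category, n in counts.items()}


def select_content(profile, k=2):
    """Return the k items whose category the visitor favors most.

    sorted() is stable even with reverse=True, so ties keep inventory order.
    """
    ranked = sorted(
        CONTENT,
        key=lambda item: profile.get(CONTENT[item], 0.0),
        reverse=True,
    )
    return ranked[:k]


profile = build_profile(["vegetables", "vegetables", "soil", "equipment"])
print(select_content(profile))
```

A visitor who has mostly browsed vegetables is served the potato guide first. An item in a category the visitor never touched scores zero and drops to the bottom.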

While few are unaware of the potential contained in clickstream data, to date that potential has gone unrealized, due largely to the shortcomings of the tools available for Web analysis. Most of the Web reporting tools available today provide only primitive mechanisms for reporting user activity. Basically, they lump similar clicks into bins and construct histograms. While these tools enable vendors to determine the number of requests for certain files, the date and time visitors came to the site and the URLs of browsers, they typically provide little analysis of the relationships between the files accessed. This analysis is essential to fully leveraging the data gathered in daily transactions. Some of the more sophisticated reporting tools available today are capable of fairly in-depth analysis. Unfortunately, they are not fast enough to support real-time Web personalization, which leaves sites in the unenviable position of having to wait before responding to active buyers.
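A few lines of code show the gap between binning clicks and analyzing relationships. The sketch below assumes a hypothetical, already-parsed clickstream of (visitor, URL) pairs. It first computes the per-URL histogram that a basic reporting tool would produce. It then computes one simple relationship measure: how many visitors viewed each pair of pages.

```python
from collections import Counter
from itertools import combinations

# Hypothetical clickstream: (visitor_id, requested_url) pairs.
CLICKS = [
    ("v1", "/potatoes"), ("v1", "/rototillers"), ("v1", "/lime"),
    ("v2", "/potatoes"), ("v2", "/lime"),
    ("v3", "/roses"), ("v3", "/potatoes"),
]


def hit_histogram(clicks):
    """What a basic reporting tool produces: request counts per URL."""
    return Counter(url for _, url in clicks)


def co_visit_pairs(clicks):
    """Relationship analysis: how many visitors viewed both pages of each pair."""
    pages_by_visitor = {}
    for visitor, url in clicks:
        pages_by_visitor.setdefault(visitor, set()).add(url)
    pairs = Counter()
    for pages in pages_by_visitor.values():
        # Sorting gives each unordered pair one canonical key.
        for a, b in combinations(sorted(pages), 2):
            pairs[(a, b)] += 1
    return pairs


print(hit_histogram(CLICKS).most_common(2))
print(co_visit_pairs(CLICKS).most_common(1))
```

The histogram says only that `/potatoes` is popular. The pair counts add that visitors who look at potatoes also tend to look at lime, which is exactly the kind of relationship the histogram throws away.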

As for the real-time "personalization" tools themselves, their limitation is the shallowness of their analysis. They typically limit themselves to collaborative filtering techniques and other primitive, product-centric correlations. As a result, sites can only view their customers in terms of rough categorizations, not with the nuance and subtlety necessary to fully understand and anticipate their needs. This shortcoming does not stem from a failure to recognize the importance of rich analysis. It is a consequence of technical barriers: namely, the difficulty of performing sophisticated data mining on huge volumes of data in a timely fashion. To complete processing in the time available, most collaborative filtering and similar tools actually discard the granular clickstream data that contains vital information about user choices and preferences. Additionally, a wide gap is rapidly developing between the functions delivered by first-generation Web reporting and Web "personalization" tools and the needs of the major Internet players.
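By comparison, a product-centric correlation of the kind described here can be as simple as item-to-item cosine similarity over purchase baskets. This sketch assumes hypothetical basket data. It shows one common collaborative-filtering variant, not the method of any particular product. It sees only which products were bought, so every click that did not end in a purchase has already been discarded.

```python
import math

# Hypothetical purchase baskets: visitor -> set of products bought.
BASKETS = {
    "v1": {"potatoes", "lime", "rototiller"},
    "v2": {"potatoes", "lime"},
    "v3": {"roses", "fertilizer"},
    "v4": {"potatoes", "fertilizer"},
}


def item_similarity(baskets, a, b):
    """Cosine similarity between two products over binary 'who bought it' vectors."""
    buyers_a = {v for v, items in baskets.items() if a in items}
    buyers_b = {v for v, items in baskets.items() if b in items}
    if not buyers_a or not buyers_b:
        return 0.0
    return len(buyers_a & buyers_b) / math.sqrt(len(buyers_a) * len(buyers_b))


def recommend(baskets, product, k=2):
    """Products most similar to `product`; ties are broken alphabetically."""
    catalog = set().union(*baskets.values()) - {product}
    ranked = sorted(catalog, key=lambda p: (-item_similarity(baskets, product, p), p))
    return ranked[:k]


print(recommend(BASKETS, "potatoes"))
```

This kind of output is coarse. Everyone who buys potatoes gets the same suggestions, whoever they are and however they arrived.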

The Trifecta of Web Personalization

To implement true Web personalization, Web sites will need to look beyond the current crop of Web reporting tools. Specifically, they will need to employ a system that applies sophisticated, scalable data mining techniques to large volumes of data – both online clickstream (Web usage) data and demographic and transaction data archived offline – and delivers the results of that analysis in real time. The keys to such a system are:

Integration – To fully develop customer profiles, traditional brick-and-mortar data warehouses are now adding clickstream data, while Internet data warehouses are integrating transaction and demographic data. As a result of this online and offline merger, the available customer data is becoming increasingly rich. Integrating the disparate sources of this data and mining the aggregate volumes in a timely, orderly fashion is becoming increasingly challenging, but it is absolutely necessary.

To build a complete model of the user, Web usage mining must correlate clickstream data with offline data. That offline data includes demographic data, archived information on past business interactions with a particular visitor (or the visitor's classified group), call-center transactions and purchase history. In practical terms, Web usage mining must handle the integration of offline data with:

  • E-business analytic tools
  • Various e-business RDBMSs
  • E-business catalog of products and services
  • E-business customer service/support departments
  • Best-of-breed applications
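At its simplest, integration means joining online and offline records on a shared key. The sketch below assumes, hypothetically, that every source already carries the same customer ID. In practice, resolving anonymous visitors to known customers is often the hardest part of the job. The `CustomerProfile` fields are illustrative, not a standard schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CustomerProfile:
    customer_id: str
    pages_viewed: List[str] = field(default_factory=list)  # online clickstream
    region: Optional[str] = None                           # offline demographics
    purchases: List[float] = field(default_factory=list)   # offline order history


def integrate(clicks, demographics, orders) -> Dict[str, CustomerProfile]:
    """Merge online and offline sources keyed on a (hypothetical) shared customer ID."""
    profiles: Dict[str, CustomerProfile] = {}

    def get(cid: str) -> CustomerProfile:
        if cid not in profiles:
            profiles[cid] = CustomerProfile(cid)
        return profiles[cid]

    for cid, url in clicks:
        get(cid).pages_viewed.append(url)
    for cid, region in demographics.items():
        get(cid).region = region
    for cid, amount in orders:
        get(cid).purchases.append(amount)
    return profiles


# Hypothetical sample sources.
ONLINE_CLICKS = [("c1", "/potatoes"), ("c1", "/rototillers"), ("c2", "/roses")]
DEMOGRAPHICS = {"c1": "east of town", "c3": "downtown"}
ORDERS = [("c1", 42.5), ("c3", 10.0)]

for p in integrate(ONLINE_CLICKS, DEMOGRAPHICS, ORDERS).values():
    print(p)
```

A profile with no region, or one with purchases but no clicks, shows the fragmented view that remains when one of the sources is missing.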

Opting to do without this capability results in a fragmented view of the customers that inhibits the ability to cross-sell, up-sell, enhance customer loyalty, convert visitors to customers and other tasks identified as critical to prevailing in the aggressive e-marketplace.

Another major benefit of technology that facilitates integration is that it provides the flexibility to customize applications for competitive edge. The canned reports provided by existing Web reporting tools do not enable major portals, e-commerce sites, content providers and other major Internet players to develop insights that differentiate them from the competition.

Scalability – High-quality modeling, the kind suitable for true personalization, requires large data samples, which yield lower estimation errors and lower variance. That need, coupled with the pressing demand for prompt analysis, makes scalable processing essential. Why? Most application processes and Web reporting and analytical tools are designed to run on only one processor, regardless of the number of processors available to work in concert on any one problem. The volume of data quickly overwhelms the power of any single processor.
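The link between sample size and estimation error follows from a standard statistical result. For n independent observations with variance σ², the sample mean satisfies the first two relations below. For a conversion rate p estimated from n independent sessions, the standard error takes the binomial form on the right:

```latex
\operatorname{Var}(\bar{x}) = \frac{\sigma^{2}}{n}, \qquad
\operatorname{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}, \qquad
\operatorname{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}
```

Take some hypothetical figures as an illustration. A 2 percent conversion rate measured over 10,000 sessions has a standard error of sqrt(0.02 × 0.98 / 10,000) = 0.0014, or 0.14 percentage points. Over 1,000,000 sessions, it falls to 0.00014. Cutting the error tenfold takes a hundredfold increase in data, which is why data volume and processing power must rise together.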

What is required is an environment that enables programmers to develop Web usage mining systems that run in parallel to handle the enormous volumes of data in the time frame required. This environment must remove the complexities of parallel programming from the development process, easing the construction of in-house systems that offer a unique competitive edge. In addition, this environment must manage several tasks:

  • the parallel execution of these complex systems against massive data volumes
  • the partitioning and re-partitioning of data across myriad application processes
  • the replication of application logic
  • parallel communication with the RDBMS
  • parallel, scalable analysis

These capabilities, which have proved indispensable in the data warehousing arena, are no less critical to the survival of e-businesses.
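The partitioning idea can be sketched with Python's standard `concurrent.futures` module. This is an illustration under stated assumptions, not a description of any particular parallel environment. Clicks are hash-partitioned by visitor, so each visitor's entire clickstream lands in one partition. Each partition is then analyzed in a separate process, and the partial results are merged. Counting distinct pages per visitor merges correctly only because no visitor is split across partitions.

```python
import zlib
from concurrent.futures import ProcessPoolExecutor


def partition(clicks, n_parts):
    """Hash-partition (visitor, url) clicks so each visitor lands in one partition."""
    parts = [[] for _ in range(n_parts)]
    for visitor, url in clicks:
        # zlib.crc32 yields the same value in every process, unlike str hash(),
        # which is randomized per interpreter run.
        parts[zlib.crc32(visitor.encode("utf-8")) % n_parts].append((visitor, url))
    return parts


def distinct_pages_per_visitor(part):
    """Per-partition work: how many different pages each visitor requested."""
    pages = {}
    for visitor, url in part:
        pages.setdefault(visitor, set()).add(url)
    return {visitor: len(urls) for visitor, urls in pages.items()}


def merge(partials):
    """Combine partition results; safe because no visitor spans two partitions."""
    merged = {}
    for partial in partials:
        merged.update(partial)
    return merged


def parallel_distinct_pages(clicks, n_parts=4):
    """Run the per-partition work in separate processes, then merge."""
    with ProcessPoolExecutor(max_workers=n_parts) as pool:
        return merge(pool.map(distinct_pages_per_visitor, partition(clicks, n_parts)))


# Hypothetical demo data: 10,000 clicks from 50 visitors over 7 pages.
DEMO_CLICKS = [(f"v{i % 50}", f"/page{i % 7}") for i in range(10_000)]

if __name__ == "__main__":
    # The __main__ guard is required on platforms that spawn worker processes.
    result = parallel_distinct_pages(DEMO_CLICKS)
    print(len(result), "visitors analyzed")
```

If the data were split arbitrarily instead of by visitor, one visitor's pages could be counted in two partitions. `merge` would then keep only one partition's partial count. This is why partitioning and re-partitioning matter as much as raw processor count.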

Sophisticated Web Usage Analysis – Even as the size of the Internet grows exponentially, the cost of storing data has fallen precipitously. Today, businesses save data they previously might have discarded. As doing so became increasingly cost-effective, e-merchants started to save all the clickstreams from their sites at the granularity of individual clicks. As a result of this growth in archived Web traffic data, today's e-merchants sit on top of a gold mine: the accumulated data deposited by their online visitors. This gold mine, however, remains largely unexploited because, until now, no technology has offered both in-depth analysis of Web usage data and the ability to handle massive volumes of clickstream and offline data in a timely fashion.
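At the granularity of individual clicks, each record is typically one line in a Web server log. The sketch below parses lines in the Common Log Format written by Apache and other servers. Its field handling is a minimal assumption: it ignores extended formats that append referrer and user-agent fields, and it treats a `-` size as zero bytes.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Parse one CLF line into a dict, or return None if it does not match."""
    m = CLF.match(line)
    if m is None:
        return None  # malformed or non-CLF line
    d = m.groupdict()
    return {
        "host": d["host"],
        "user": d["user"],
        "time": datetime.strptime(d["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": d["method"],
        "path": d["path"],
        "status": int(d["status"]),
        "bytes": 0 if d["size"] == "-" else int(d["size"]),
    }


print(parse_line(
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
))
```

Lines that do not match, such as truncated writes, return `None`. A bulk loader can then count and skip them rather than fail partway through a large archive.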

Web usage mining strives to make sense of this information by analyzing the clickstream information collected in Web server logs, referrer logs, user registration forms and purchase data. Web usage mining helps e-marketers determine the lifetime value of customers, cross-market products, up-sell, rate the effectiveness of promotions and convert visitors to customers by tailoring Web content to individual visitors.
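Raw clicks usually have to be grouped into visits before they can say much about converting visitors to customers. A widely used heuristic starts a new session whenever a visitor has been idle for more than 30 minutes; it is assumed here rather than taken from the article. This sketch applies that rule. It then computes the share of sessions that reach a hypothetical order-confirmation page, `/checkout/complete`.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # common heuristic; an assumption here


def sessionize(events, timeout=SESSION_TIMEOUT):
    """Split each visitor's time-ordered clicks into sessions at idle gaps > timeout.

    `events` holds (visitor_id, timestamp, path) tuples in any order.
    Returns a list of (visitor_id, [paths...]) sessions.
    """
    sessions = []
    current = {}    # visitor -> path list of the session being built
    last_seen = {}  # visitor -> timestamp of that visitor's previous click
    for visitor, ts, path in sorted(events, key=lambda e: (e[0], e[1])):
        if visitor not in current or ts - last_seen[visitor] > timeout:
            current[visitor] = []
            sessions.append((visitor, current[visitor]))
        current[visitor].append(path)
        last_seen[visitor] = ts
    return sessions


def conversion_rate(sessions, goal="/checkout/complete"):
    """Share of sessions that reached the (hypothetical) goal page."""
    if not sessions:
        return 0.0
    converted = sum(1 for _, paths in sessions if goal in paths)
    return converted / len(sessions)


t0 = datetime(2000, 3, 28, 9, 0)
EVENTS = [
    ("v1", t0, "/potatoes"),
    ("v1", t0 + timedelta(minutes=5), "/checkout/complete"),
    ("v1", t0 + timedelta(hours=2), "/lime"),  # long gap: a new session
    ("v2", t0, "/roses"),
]
print(conversion_rate(sessionize(EVENTS)))
```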

Indivisible Triad

Perhaps most importantly, it is not enough for Web personalization systems to have only one or two of these key features, because the functionality of each depends on the other two. For example, integrating the large amount of data archived in data warehouses and data marts would be exceedingly cumbersome if the personalization system didn't scale easily to accommodate rising data volumes. The same dependency exists between sophisticated analysis and scalability. The most sophisticated analytic models are often the most time-consuming to train, a characteristic that has limited their applicability. However, parallel training makes it possible to use these powerful nonlinear models in situations in which they would otherwise be infeasible. Analytic modeling is not only CPU-hungry but, in practice, must also be repeated using multiple combinations of methods and algorithms, which makes parallel processing all the more necessary. Lastly, there is the strong connection between integration and analytic depth. In many important Web reporting and personalization situations, even a comprehensive panoply of advanced data mining methods does not suffice. These methods must be supplemented by more traditional approaches such as SQL queries and multidimensional analysis. The difficulty is that these traditional approaches demand access to offline data, usually from large data warehouses.
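A small roll-up illustrates how SQL can supplement data mining. This sketch uses an in-memory SQLite database as a stand-in for the warehouse. That is an assumption, and the tables `customers` and `orders` are hypothetical. The query joins a segment of customer IDs, such as might come out of clickstream mining, to offline order data grouped by region.

```python
import sqlite3


def segment_revenue_by_region(conn, segment_ids):
    """Roll up offline order revenue by region for a clickstream-derived segment."""
    conn.execute("CREATE TEMP TABLE segment (customer_id TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO segment VALUES (?)", [(c,) for c in segment_ids])
    rows = conn.execute(
        """
        SELECT c.region, COUNT(DISTINCT o.customer_id), SUM(o.amount)
        FROM orders o
        JOIN customers c ON c.customer_id = o.customer_id
        JOIN segment s ON s.customer_id = o.customer_id
        GROUP BY c.region
        ORDER BY c.region
        """
    ).fetchall()
    conn.execute("DROP TABLE segment")  # leave the connection clean for reuse
    return rows


def demo_warehouse():
    """Build a tiny hypothetical warehouse in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id TEXT PRIMARY KEY, region TEXT);
        CREATE TABLE orders (customer_id TEXT, amount REAL);
        INSERT INTO customers VALUES ('c1', 'east'), ('c2', 'east'), ('c3', 'west');
        INSERT INTO orders VALUES ('c1', 40.0), ('c1', 10.0), ('c2', 25.0), ('c3', 99.0);
        """
    )
    return conn


# Suppose mining flagged c1 and c3 as, say, rototiller browsers.
print(segment_revenue_by_region(demo_warehouse(), ["c1", "c3"]))
```

The mining step decides who belongs in the segment, and the warehouse query supplies what those customers are actually worth. Neither step alone yields both answers.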

Will large Web sites ever be able to understand and predict the needs of their consumers? Perhaps. In many cases, the essential foundations for such customer relationships – the warehouses storing the accumulated mountains of clickstream and transaction data – are already in place. What's needed now is a system that can make use of that data in a manner relevant for Web use. To do that, a system will need to be able to integrate disparate sources of data, scale to meet rising data volumes and analyze the aggregate volumes with speed and sophistication. Until that happens, everything else is untapped potential.

All Information Management content is archived after seven days.
