Most of you probably recognize the phrase, "If you build it ... they will come," from the 1989 hit movie "Field of Dreams," in which Iowa farmer Ray Kinsella plows under his cornfield and turns it into a baseball diamond so that the ghost of Shoeless Joe Jackson can once again play ball. The "voice" that speaks the phrase throughout the movie is proven right at the end, as shown by the long line of spectators' cars pulling up to the property.

If we build it ... how will we know that they've come, and how can we use the data from their visit to improve sales next month or calculate the return on investment of our latest marketing campaign? More than a decade later, this is a typical question voiced by many e-business sales and marketing executives who want to decipher the Web traffic or clickstream data being captured by their Web servers and use it to better understand and improve the online experience of their customers. But determining how many visitors have come to your Web site is not as simple as counting the cars as they pull up to your field of dreams. Effectively using the data is even more complex and will most likely require some level of data warehousing and business intelligence.

E-business is simply doing business electronically, which in most cases today means doing business on the Internet. The two main types of e-business are business-to-consumer (B2C) and business-to-business (B2B). A third, less discussed type is consumer-to-consumer (C2C). Establishing a successful e-business involves many technical and process challenges. Once the supply chain infrastructure and e-commerce-enabled Web sites are properly designed and built, a whole new set of challenges surfaces for the e-business executive who wants to understand and benefit from the wealth of data collected by the Web servers that run the Web site. We will explore some of these challenges using a fictitious business-to-consumer company called GreatVideos.com, whose sales and marketing manager happens to be named Ray Cansellit.

Management Challenges

Ray Cansellit knew that the ability to sell videos over the Internet to virtually anyone anywhere could significantly increase sales revenue without the tremendous costs associated with opening physical stores and hiring the necessary staff. However, Ray also knew that simply building an e-commerce Web site didn't mean that customers would come: he would have to advertise the Web site through various Internet marketing strategies, such as banner ads and affiliate agreements with partner Web sites, and monitor the success of those strategies. He would also have to continually modify the content on the Web site to ensure the best possible visitor experience, which would increase the chances of a visitor becoming a customer. To do any of this, Ray would need a lot of information about end users' experiences and behaviors on the GreatVideos.com Web site. Ray had overheard the company's Web master, Shoeless Bob, talking about the large volume of data being collected by the Web server, but he needed to know what data was available to him and how he could use it to answer the questions that would make him successful in his role.

His list of questions included: How many visitors come to our Web site each day? Each week? Each month? At what times of the day do we get the most traffic? How long does the average visitor stay on our site? What Web sites refer the most traffic to GreatVideos.com? How often do visitors come to our Web site, fill a shopping cart with videos but leave without buying them? What was the return on investment of the April banner ad marketing campaign in terms of new customers and increase in sales revenue?
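Several of Ray's questions reduce to simple ratios once the log data has been grouped into visits. As a minimal sketch of the shopping-cart question, assuming each visit is already available as a list of requested URL paths and that the hypothetical paths /cart/add and /checkout/complete mark adding a video to the cart and completing a purchase, the abandonment rate is the share of cart-using visits that never reach checkout:

```python
from typing import Iterable, List

# Hypothetical URL paths marking the cart and purchase steps.
CART_PATH = "/cart/add"
PURCHASE_PATH = "/checkout/complete"


def cart_abandonment_rate(visits: Iterable[List[str]]) -> float:
    """Share of visits that added a video to the cart but never completed checkout."""
    cart_visits = 0
    abandoned = 0
    for paths in visits:
        if CART_PATH in paths:
            cart_visits += 1
            if PURCHASE_PATH not in paths:
                abandoned += 1
    return abandoned / cart_visits if cart_visits else 0.0


visits = [
    ["/", "/videos/42.html", CART_PATH, PURCHASE_PATH],  # bought
    ["/", "/videos/7.html", CART_PATH],                  # abandoned the cart
    ["/", "/videos/7.html"],                             # never used the cart
    ["/search", CART_PATH, CART_PATH],                   # abandoned the cart
]
print(cart_abandonment_rate(visits))  # 2 of 3 cart visits abandoned
```

A real site would identify these steps from its own page structure; the paths here are placeholders.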

Ray went to visit Web master Shoeless Bob to learn more about the data being collected at the Web server. The first question Ray had for Bob was whether he could get an accurate count of daily, weekly and monthly hits to the company's home page to analyze and report on traffic. When Bob explained that counting hits would be of little value to Ray because one visitor coming to the GreatVideos.com home page actually generates eleven hits to the Web server, Ray realized that he had better use this first meeting for a little education on basic Web statistics terminology:

Hits - the individual requests that a server answers in order to generate a single Web page. All of the various images on the page, any embedded media files and the page document itself each represent a separate hit to the Web server. Counting hits to gauge traffic levels can greatly overstate the true number of visitors.

Page View - a single screen of content viewed by a user in a browser window. A single page view includes the graphics and documents served to one or many frames on a page displayed by the browser. Counting page views rather than hits typically provides a more realistic measure of traffic, although it still counts pages rather than people.
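The gap between hits and page views can be illustrated with a short sketch. It assumes, as a simplification, that page documents are requests ending in .html, .htm or a trailing slash and that everything else (images, media files) is an embedded resource; real sites would need their own rules. It reproduces Bob's example of a home page that generates eleven hits:

```python
from typing import List, Tuple

# Simplifying assumption: page documents end in .html/.htm or a trailing "/";
# everything else (images, style sheets, media) is an embedded resource.
PAGE_SUFFIXES = (".html", ".htm", "/")


def count_hits_and_page_views(paths: List[str]) -> Tuple[int, int]:
    """Return (hits, page_views) for a list of requested paths."""
    hits = len(paths)  # every request the server answers is a hit
    # Ignore any query string before checking the suffix.
    page_views = sum(1 for p in paths if p.split("?", 1)[0].endswith(PAGE_SUFFIXES))
    return hits, page_views


# One visit to a home page that embeds ten images: eleven hits, one page view.
home_page_load = ["/"] + [f"/images/img{i}.gif" for i in range(10)]
print(count_hits_and_page_views(home_page_load))  # (11, 1)
```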

Impression - a single display (serving) of an ad on a Web page. If a Web page has two ads on it, then each view of that page accounts for two impressions. Although it is impossible to tell whether the user actually sees or reads the ad, simply opening the page containing the ads generates the impressions.

Visit - a session at a Web site, which begins when the user first enters the site and ends when they leave. Measuring the number of visits can be tricky, however. A user might enter your site and begin clicking around, go to lunch leaving their browser on your site, and then continue clicking around your site when they return an hour later. Is this one visit or two? Most would argue that it is one visit, but according to the Internet Advertising Bureau (IAB), it would be counted as two visits, because more than 30 minutes of inactivity elapsed between clicks. Another challenge arises when a user enters your site, clicks on a link that takes them away from your site for a few minutes, and then returns. This may be counted as two visits, even though it might make more sense to count it as one.
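The 30-minute inactivity rule can be sketched as follows for the click timestamps of a single visitor. Treating a gap strictly longer than 30 minutes as the start of a new visit is an assumption about where the boundary falls, and the dates are illustrative only:

```python
from datetime import datetime, timedelta
from typing import List

# Inactivity threshold from the rule described above.
SESSION_TIMEOUT = timedelta(minutes=30)


def count_visits(click_times: List[datetime]) -> int:
    """Count visits for one visitor: a gap longer than 30 minutes starts a new visit."""
    if not click_times:
        return 0
    times = sorted(click_times)
    visits = 1
    for previous, current in zip(times, times[1:]):
        if current - previous > SESSION_TIMEOUT:
            visits += 1
    return visits


# Clicks before lunch, then again an hour later: counted as two visits.
clicks = [
    datetime(2001, 4, 2, 11, 50),
    datetime(2001, 4, 2, 11, 55),
    datetime(2001, 4, 2, 12, 55),
]
print(count_visits(clicks))  # 2
```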

Visitor - a unique individual who visits your Web site. This is one of the hardest statistics to collect and report reliably. Until computers take retina scans of the individual at the keyboard, there is no guarantee that the visitor at your site is who you think they are, even if they have logged in with an ID and password, since those credentials might be shared among multiple people. The primary problem in identifying unique visitors is that, unless there is a login process or cookies are used, Web logs record only the Internet Protocol (IP) address of the computer making each request. IP addresses are unreliable identifiers because corporate firewalls and Internet service providers (ISPs) may let many users share a single IP address, which undercounts visitors, or may assign the same user a different IP address every time they connect to the Web, which overcounts them.
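One common heuristic, sketched here as an assumption rather than a standard, is to count a visitor by cookie ID when one is present and otherwise fall back to the combination of IP address and browser (user agent) string. The sample data is made up and shows why the result remains an estimate: two people behind one firewall are told apart only by their cookies, while cookieless requests from the same address collapse into a single visitor:

```python
from typing import Iterable, Optional, Set, Tuple

# (IP address, user agent string, cookie ID or None if cookies are disabled)
Request = Tuple[str, str, Optional[str]]


def estimate_unique_visitors(requests: Iterable[Request]) -> int:
    """Heuristic: identify by cookie ID when present, else by IP address plus user agent."""
    seen: Set[Tuple[str, ...]] = set()
    for ip, user_agent, cookie_id in requests:
        if cookie_id:
            seen.add(("cookie", cookie_id))
        else:
            seen.add(("ip", ip, user_agent))
    return len(seen)


requests = [
    ("192.0.2.1", "Mozilla/4.0", "abc"),  # two people behind one firewall address,
    ("192.0.2.1", "Mozilla/4.0", "def"),  # told apart only by their cookies
    ("192.0.2.1", "Mozilla/4.0", None),   # cookies disabled: falls back to IP + agent
    ("192.0.2.1", "Mozilla/4.0", None),   # may be a different person, but looks identical
]
print(estimate_unique_visitors(requests))  # 3
```

The same person can also be counted twice if their cookie is missing on some requests, so the figure is best treated as an estimate rather than a head count.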

Cookie - information stored on an end-user's hard disk by a Web site so that it can remember something about the user each time the user revisits the Web site.
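A minimal sketch of how a site might use a cookie to tell new visitors from returning ones, using Python's standard http.cookies module. The cookie name visitor_id and the one-year lifetime are hypothetical choices, not something the Web server provides automatically:

```python
import uuid
from http.cookies import SimpleCookie
from typing import Optional, Tuple

COOKIE_NAME = "visitor_id"  # hypothetical cookie name
ONE_YEAR_SECONDS = 60 * 60 * 24 * 365


def identify_visitor(cookie_header: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (visitor ID, Set-Cookie value to send, or None for a returning visitor)."""
    if cookie_header:
        incoming = SimpleCookie()
        incoming.load(cookie_header)  # parse the browser's Cookie request header
        if COOKIE_NAME in incoming:
            return incoming[COOKIE_NAME].value, None  # returning visitor
    # New visitor: mint a random ID and ask the browser to store it.
    visitor_id = uuid.uuid4().hex
    outgoing = SimpleCookie()
    outgoing[COOKIE_NAME] = visitor_id
    outgoing[COOKIE_NAME]["path"] = "/"
    outgoing[COOKIE_NAME]["max-age"] = ONE_YEAR_SECONDS
    return visitor_id, outgoing[COOKIE_NAME].OutputString()


new_id, set_cookie = identify_visitor(None)
print(set_cookie)  # e.g. visitor_id=<32 hex chars>; Max-Age=31536000; Path=/
print(identify_visitor(f"visitor_id={new_id}") == (new_id, None))  # True
```

The returned ID is the value that would later appear in the cookie field of the Web server log.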

Ray scheduled a second meeting with Bob to find out exactly what data was stored in the Web server log files. From this round of discussions, Ray discovered that only a handful of basic data elements are stored in a standard common log format (CLF) log file each time the server answers a request (that is, for every hit). These items include:

  • User Host - The IP address (or machine name, if available) of the computer connecting to the Web site.
  • Identification or Login - The user's remote login identity, if available.
  • Authuser - The user's login name if accessing a site that requires user authentication.
  • Date and Time Stamp - The date and time that the request was made.
  • Request - The request line sent by the user's browser: the method, the path name of the requested resource and the protocol version. This is usually of the form "GET /path/filename HTTP/1.0."
  • Status - The HTTP status code (or error code) returned for the request.
  • Bytes - The size, in bytes, of the document returned to the user.
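The seven CLF fields can be pulled out of each log line with a regular expression. This is a minimal sketch assuming well-formed lines; the sample line is invented for illustration:

```python
import re
from datetime import datetime
from typing import Any, Dict, Optional

# host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf(line: str) -> Optional[Dict[str, Any]]:
    """Split one Common Log Format line into its seven fields, or return None."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields: Dict[str, Any] = match.groupdict()
    # Timestamps look like 02/Apr/2001:11:50:02 -0500.
    fields["timestamp"] = datetime.strptime(fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # A "-" byte count means no document body was sent.
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields


line = '192.0.2.10 - - [02/Apr/2001:11:50:02 -0500] "GET /videos/index.html HTTP/1.0" 200 5120'
record = parse_clf(line)
print(record["host"], record["request"], record["status"])
# 192.0.2.10 GET /videos/index.html HTTP/1.0 200
```

Because match() anchors only at the start of the line, the same pattern also accepts lines that carry extra fields after the byte count.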

While this data could provide some basic information, such as the number of page views by time period, it wouldn't help with questions about which other sites were referring traffic to GreatVideos.com. When Ray mentioned this to Bob, he learned that in addition to these seven basic data elements, the Web server could be configured to include several additional fields by using an extended common log format (ECLF). Additional fields that could be captured include:

  • Referrer - The URL of the page the user came from: the referring site if the user came via a link or banner ad, or the search engine's results page (which usually contains the search keywords) if the user came via an Internet search.
  • User Agent - The name and version number of the browser used, along with the user's machine type.
  • Cookie - The returning or new visitor tag, which aids in the identification of users.
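With the referrer field available, the question of which sites send the most traffic becomes a counting exercise. The sketch below assumes the Apache-style combined format (the CLF fields followed by quoted referrer and user agent fields; the cookie field is left out because its position depends on how the server is configured). The host names and the q search parameter are hypothetical:

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import parse_qs, urlparse

# CLF fields followed by "referrer" "user agent" (Apache-style combined format).
COMBINED_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?:\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def top_referring_sites(lines: Iterable[str], own_host: str, n: int = 3) -> List[Tuple[str, int]]:
    """Count requests by referring host, skipping empty and internal referrers."""
    counts: Counter = Counter()
    for line in lines:
        match = COMBINED_PATTERN.match(line)
        if match is None:
            continue
        referrer = match.group("referrer")
        if referrer in ("", "-"):
            continue  # typed-in URL or bookmark: no referrer was sent
        host = urlparse(referrer).netloc.lower()
        if host and host != own_host:
            counts[host] += 1
    return counts.most_common(n)


def search_keywords(referrer: str, param: str = "q") -> List[str]:
    """Pull search terms from a results-page referrer; the parameter name varies by engine."""
    values = parse_qs(urlparse(referrer).query).get(param, [])
    return values[0].split() if values else []


log = [
    '192.0.2.1 - - [02/Apr/2001:09:00:00 -0500] "GET / HTTP/1.0" 200 4000 '
    '"http://www.example.com/movies.html" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"',
    '192.0.2.2 - - [02/Apr/2001:09:01:00 -0500] "GET / HTTP/1.0" 200 4000 '
    '"http://search.example.net/search?q=baseball+videos" "Mozilla/4.7 [en] (WinNT; I)"',
    '192.0.2.1 - - [02/Apr/2001:09:02:00 -0500] "GET /videos/index.html HTTP/1.0" 200 5120 '
    '"http://www.greatvideos.com/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"',
    '192.0.2.3 - - [02/Apr/2001:09:03:00 -0500] "GET / HTTP/1.0" 200 4000 '
    '"http://www.example.com/sports.html" "Mozilla/4.0 (compatible; MSIE 5.0; Mac_PowerPC)"',
]
print(top_referring_sites(log, own_host="www.greatvideos.com"))
# [('www.example.com', 2), ('search.example.net', 1)]
print(search_keywords("http://search.example.net/search?q=baseball+videos"))
# ['baseball', 'videos']
```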

Ray now knew that he could get data about which Web sites were referring traffic to GreatVideos.com, but how was he going to access this data for reporting purposes when it was stored in log files? And how was he going to answer the more complex questions about the ROI of marketing campaigns, which would require combining the Web log data with data from the accounting and sales force automation systems in use at GreatVideos.com?

It became apparent to Ray that reporting and analysis of the clickstream data captured by the Web server were still a few steps away. He learned from Bob that the data in the log files would have to be processed with an extraction, transformation and load (ETL) tool and moved into a data warehouse, a database built around one or more dimensional structures. The data warehouse would be designed with data modeling techniques to include the fact and dimension tables needed to hold the relevant data and to aggregate certain data by different dimensions. Ray also learned that it was in the data warehouse that data from the databases supporting the accounting and sales force automation systems could be combined with the clickstream data to satisfy some of his more complex reporting requirements.
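To make the fact and dimension tables concrete, here is a toy star schema built with Python's built-in sqlite3 module. The table names, columns and rows are hypothetical; a production warehouse would be loaded by the ETL process from parsed logs and would hold many more dimensions. The query shows the kind of aggregation by dimension that a business intelligence tool might generate on a user's behalf:

```python
import sqlite3

# Hypothetical star schema: a page-view fact table keyed to three dimension tables.
SCHEMA = """
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE dim_page (page_key INTEGER PRIMARY KEY, path TEXT);
CREATE TABLE dim_referrer (referrer_key INTEGER PRIMARY KEY, site TEXT, campaign TEXT);
CREATE TABLE fact_page_view (
    date_key INTEGER REFERENCES dim_date (date_key),
    page_key INTEGER REFERENCES dim_page (page_key),
    referrer_key INTEGER REFERENCES dim_referrer (referrer_key),
    visitor_id TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Toy rows standing in for what an ETL job would load from the parsed Web logs.
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20010402, "2001-04-02", "2001-04"), (20010403, "2001-04-03", "2001-04")])
conn.executemany("INSERT INTO dim_page VALUES (?, ?)",
                 [(1, "/"), (2, "/videos/index.html")])
conn.executemany("INSERT INTO dim_referrer VALUES (?, ?, ?)",
                 [(1, "www.example.com", "April banner ad"), (2, "(none)", "(none)")])
conn.executemany("INSERT INTO fact_page_view VALUES (?, ?, ?, ?)", [
    (20010402, 1, 1, "v1"), (20010402, 2, 1, "v1"),
    (20010402, 1, 2, "v2"), (20010403, 1, 1, "v3"),
])

# Roll page views and distinct visitors up by month and campaign.
rows = conn.execute("""
    SELECT d.month, r.campaign,
           COUNT(*) AS page_views,
           COUNT(DISTINCT f.visitor_id) AS visitors
    FROM fact_page_view AS f
    JOIN dim_date AS d ON f.date_key = d.date_key
    JOIN dim_referrer AS r ON f.referrer_key = r.referrer_key
    GROUP BY d.month, r.campaign
    ORDER BY d.month, r.campaign
""").fetchall()

for row in rows:
    print(row)
# ('2001-04', '(none)', 1, 1)
# ('2001-04', 'April banner ad', 3, 2)
```

The dim_page table is not used by this query but could be joined the same way to report on individual pages.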

This sounded very confusing and technically complex to Ray, and he was about ready to give up the quest for his "field of Web dreams" reporting environment, when Bob explained the purpose of business intelligence tools. Once the back-end technical environment was properly designed and constructed based on Ray's reporting requirements, a business intelligence tool could be implemented, which would allow Ray to access the data to perform his own ad hoc queries and build his own reports without needing to understand the underlying database structure. With a big smile on his face, Ray thanked Bob for all of his help and headed back to his office to prepare a business case for senior management explaining the need for a data warehouse reporting solution.

There are many challenges involved in designing and building a great e-business/e-commerce Web site. A challenge commonly faced by e-business managers is simply understanding the wealth of visitor and customer behavior data that can be collected by the Web site and made available through data warehousing and business intelligence technologies. The first steps to overcoming this challenge include: a) knowing what metrics drive your business, which in turn will determine your information access and reporting requirements; b) understanding the basics of Web statistics terminology and what data is available to you through the Web server logs; and c) understanding the basics of data warehousing and business intelligence concepts so you can discuss appropriate information access and reporting strategies with senior management, your IT department and your external IT consulting firm.

All Information Management content is archived after seven days.
