REVIEWER: C.J. Venkataraman, senior software architect at Corbis Corporation.
BACKGROUND: Corbis Corporation provides digital images to consumers and creative professionals, using the Web to do business with a growing number of online shoppers as well as the world's most popular publications. Every day our Web site receives nearly half a million visits, during which customers browse extensive online art galleries, download pictures, order framed prints or license specific images for repeated use.
PLATFORMS: Our platform consists of dual proxy Compaq ProLiant servers with 1.5GB of memory and Pentium III 600 processors running Windows 2000 and the Microsoft SQL Server database engine. A key function of this architecture is to facilitate Web log processing.
PROBLEM SOLVED: We have 26 Web log servers at Corbis, and we copy each server's log files onto one large server. Each log file grows to at least 200MB before it is merged with the other Web logs. We merge them all, filter the resulting file to keep only the customer and Web site information that we want for analysis, and then compress it for storage. Once the data is stored and analyzed, we can see more clearly how many visits our Web site receives, what our customers are doing on the site and which top domains our visitors are coming from. The merged file can easily exceed 5GB daily, which considerably slowed the filtering and compression process: end to end, it was taking close to five hours a day to complete. After receiving and implementing a demonstration version of SyncSort, we found that it brought the daily routine down to about one hour, saving us about four hours per day on average. The speed at which SyncSort merged and sorted our log files was impressive.
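To make the merge, filter and compress sequence concrete, here is a minimal Python sketch of the same idea. It is an illustration only, not how SyncSort works and not the process Corbis ran. It assumes each log file is already sorted and that every record starts with a sortable timestamp, and it assumes directive lines begin with "#". The function names and the filter rule (dropping image requests) are hypothetical.

```python
import gzip
import heapq
from typing import Callable, Iterable, Iterator


def read_records(path: str) -> Iterator[str]:
    """Yield log records without trailing newlines, skipping '#' directive lines."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if line and not line.startswith("#"):
                yield line


def keep_page_requests(record: str) -> bool:
    """Hypothetical filter: drop requests for image files."""
    return not record.lower().endswith((".gif", ".jpg"))


def merge_filter_compress(
    log_paths: Iterable[str],
    out_path: str,
    keep: Callable[[str], bool],
) -> int:
    """Merge pre-sorted logs, keep selected records, write one gzip file.

    Returns the number of records written.
    """
    streams = [read_records(p) for p in log_paths]
    written = 0
    with gzip.open(out_path, "wt", encoding="utf-8") as out:
        # heapq.merge reads the inputs lazily, so a multi-gigabyte merge
        # never has to fit in memory. It assumes each input is sorted.
        for record in heapq.merge(*streams):
            if keep(record):
                out.write(record + "\n")
                written += 1
    return written
```

A call such as `merge_filter_compress(paths, "merged.log.gz", keep_page_requests)` would produce one compressed flat file from all the per-server logs. Filtering during the merge, rather than after it, avoids writing the full unfiltered file to disk.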
PRODUCT FUNCTIONALITY: Performance was our main goal with SyncSort, and we are now seeing time savings of almost 80 percent on the same amount of data as before. We've been using it for several months, and we process all our daily Web logs through it. We have approximately 30 Web log files every day. We process them all and use SyncSort to create one flat file that we can load into our databases. In addition to the strides made in Web log processing, we are developing ways to refine the collected data for maximum advantage. We have many data quality projects that will be starting soon, and we're looking forward to using SyncSort to run data quality checks in our warehouses.
STRENGTHS: We get excellent performance from SyncSort, and it's easy to use and customize. I was surprised how it could copy, merge and sort so many huge log files with additional filtering added in and still process so quickly.
WEAKNESSES: We had one problem where SyncSort used to hang during one of our processes. We explained the problem to Syncsort, and they had a fix for us. They then followed up on a regular basis.
SELECTION CRITERIA: I turned to a business associate from Microsoft's iDSS group, who suggested that we try SyncSort. When they were building an integrated data warehouse, iDSS incorporated SyncSort for several steps in their Web log processing and were able to turn a billion records of raw Web data into 500MB of clean data to upload into their warehouse. Hearing about this, we were intrigued. We received and implemented a demonstration version of SyncSort. In fact, we did not explore other products after testing SyncSort because it met the key criteria we set.
DELIVERABLES: Using SyncSort, we are able to identify the crucial information in our Web logs, isolate it and prepare it for warehousing and analysis within a short time frame. In addition to using these Web logs for customer analysis, we are able to evaluate our partnerships with Yahoo! and AltaVista. Our clickstream data provides insight into who has been sending us the most Web traffic. Managers and analysts at our company are also able to use the Web log information to research how our customers think.
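As an illustration of the kind of referral analysis described here, the following Python sketch counts which domains send the most traffic. It is not the analysis Corbis performs. It assumes the referrer URL has already been extracted from each log record, and the function name and sample hostnames are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit


def top_referrer_domains(referrers: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent referring hostnames with their hit counts."""
    counts: Counter = Counter()
    for url in referrers:
        # hostname is None for empty or placeholder values such as "-"
        host = urlsplit(url).hostname
        if host:
            counts[host] += 1
    return counts.most_common(n)


# Hypothetical sample data, not taken from real logs.
sample = [
    "http://search.example.com/results?q=prints",
    "http://search.example.com/results?q=photos",
    "http://portal.example.org/links",
    "-",
]
print(top_referrer_domains(sample, n=2))
```

Grouping by hostname rather than by full URL keeps the many different search result pages from one partner together in a single count.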
VENDOR SUPPORT: When we were in the evaluation period, we had a lot of questions that Syncsort's support team helped us answer. They followed up on a regular basis, promptly and helpfully. The technical support made our decision to incorporate SyncSort easier.
DOCUMENTATION: The documentation was very good and met all our needs.