CATEGORY: Data Acquisition, Transformation and Replication

REVIEWED BY: Dave Shuman, executive vice president, Management Information Systems, for

BACKGROUND: is the leading Web-based subscription provider, offering subscriptions to more than 100,000 magazines, newsletters and newspapers from over 400 publishers, including every major U.S. magazine publisher. Major investors include venture capital firm Madison Dearborn Capital Partners, as well as strategic partners, Time Inc. and Hachette Filipacchi Magazines, Inc. Regularly ranked by MediaMetrix as one of the Top 50 shopping sites on the Web, has also been ranked as the 13th largest e-commerce site on the Web and a "Top 100 Web Site" by PC Data.

PLATFORMS: Data Junction is running on Windows NT 4.0 and Windows 2000.

PROBLEM SOLVED: With thousands of marketing partners and relationships with numerous vendors, we needed a data integration solution that would support our B2B and business intelligence requirements. We needed to manage business-critical data in a wide variety of structures, from ASCII text reports to semi-structured ASCII, Sybase, Oracle, Microsoft SQL Server, Web logs and XML. Data Junction and Content Extractor (Cambio) have enabled us to do that and have become an integral part of our e-business infrastructure.

In our OrderLoader system, we needed to swiftly implement new partners and integrate the order data from those partners into a SQL relational database. Prior to using Data Junction, we wrote Perl scripts, which required much longer development and quality assurance cycles. Data Junction has reduced the time required to implement new partners from over a week to less than one business day. Data Junction also helped us address the challenge of partners whose file formats and data attributes change often by allowing us to easily modify scripts to handle variable input. Data integrity was also a significant problem. With Data Junction, our business analysts (not programmers!) have been able to rapidly build partner-specific processes that salvage about 97 percent of data failures.

We also needed a fast, flexible integration tool for populating our data warehouse. Data Junction enables us to extract information from our Sybase and Microsoft SQL Server production systems, cleanse and validate the data, and then map it to our Microsoft SQL Server 2000 data mart and OLAP system.

Finally, the tool we were using to extract data from Web log files for analysis was inefficient. Its filter and extraction process failed to provide useful data for analytics. Data Junction's Content Extractor solved that problem easily and has reduced the time it takes to prepare our log files for analysis from nearly 12 hours (with poor results) to under an hour (with valuable data).
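To make the variable-input problem concrete, here is a minimal Python sketch of the general idea of mapping partner order files with differing column names onto one canonical record and setting aside rows that fail validation. It is not Data Junction's scripting language, which this review does not show; the field names, aliases and sample data are all hypothetical.

```python
import csv
import io

# Hypothetical canonical fields and the column names partners might use for them.
FIELD_ALIASES = {
    "order_id": ("order_id", "orderid", "order no"),
    "title": ("title", "magazine", "publication"),
    "price": ("price", "amount", "total"),
}


def normalize_row(row):
    """Map one partner row onto the canonical fields; return (record, errors)."""
    # Lower-case the headers and trim values so aliases match regardless of case.
    lowered = {k.strip().lower(): (v or "").strip() for k, v in row.items() if k}
    record, errors = {}, []
    for field, aliases in FIELD_ALIASES.items():
        # Take the first alias that is present and non-empty.
        value = next((lowered[a] for a in aliases if lowered.get(a)), None)
        if value is None:
            errors.append(f"missing {field}")
        record[field] = value
    if record["price"] is not None:
        try:
            record["price"] = float(record["price"].replace("$", "").replace(",", ""))
        except ValueError:
            errors.append("bad price")
    return record, errors


def load_orders(text):
    """Split a partner CSV into clean records and rejected (record, errors) pairs."""
    good, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        record, errors = normalize_row(row)
        if errors:
            rejected.append((record, errors))
        else:
            good.append(record)
    return good, rejected
```

In a sketch like this, supporting a new partner mostly means adding aliases rather than writing new parsing code, and the rejected rows keep their error list so they can be reviewed and salvaged.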

PRODUCT FUNCTIONALITY: Data Junction supports hundreds of data formats, which has enabled us to work with every type of data we have been presented with. Data Junction's Content Extractor has proven to be extremely efficient and enables us to swiftly pre-parse 10 to 11 million lines of freeform log records per day and strip out the information needed for analysis. We plan to expand our use of Data Junction for our catalog export project, which involves exporting subsets of catalog information in XML to our marketing partners.
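As an illustration of what pre-parsing log records involves, the following Python sketch pulls a few fields out of each line and skips lines that do not match. It assumes the logs follow the Common Log Format, which the review does not state, and it does not represent how Content Extractor works internally; the pattern and field names are illustrative only.

```python
import re

# Assumed layout: Common Log Format (host, identity, user, [time], "request", status, size).
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def extract(lines):
    """Yield a dict of fields for each line that matches; skip the rest."""
    for line in lines:
        match = CLF.match(line)
        if match:
            record = match.groupdict()
            record["status"] = int(record["status"])
            yield record
```

Because `extract` is a generator, it processes one line at a time, so a file of several million lines does not need to be held in memory.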

STRENGTHS: Data Junction's biggest strength is its ease and speed of implementation compared with other solutions. For data extraction and transformations, it is incredibly fast. With Data Junction, our business analysts have been able to handle tasks that previously required writing code, freeing programmers for other critical work.

WEAKNESSES: Data Junction's documentation could be improved, especially the printed manuals.

SELECTION CRITERIA: Our criteria for choosing a data integration solution included flexibility, cost of the product and availability of training. Data Junction scored high in all of these areas.

DELIVERABLES: Data Junction populates and updates our data warehouse, integrates commerce partner transactions with our order management system, provides feedback data loops to our partners and parses various unstructured text files. We have realized significant savings in implementation and development time and cost in all of these areas.

VENDOR SUPPORT: We've presented Data Junction's tech support staff with some pretty difficult SQL challenges as well as a variety of Content Extractor challenges. We have been pleased with the speed and the content of their solutions.

DOCUMENTATION: The online help and tutorials could be improved, but on the whole they are relatively helpful and easy to understand, even for nontechnical staff.
