
SyncSort for Windows Provides Critical Inline Processing Solutions for BIOSIS

  • Robert Puphal, Michael Stein, Thomas Nelson
  • January 01 2001, 1:00am EST

CATEGORY: Data Acquisition, Transformation and Replication

REVIEWER: This product was reviewed by the following individuals in the systems development department of BIOSIS: Robert Puphal, director; Michael Stein, publication consultant; and Thomas Nelson, systems consultant.

BACKGROUND: BIOSIS is the world's largest publisher/indexer of biological information in Web, CD-ROM and print-based media. With the most extensive collection of abstracts and bibliographic references to biological and medical literature, BIOSIS processes approximately 560,000 items a year from primary research and review journals, books, monographs and conference proceedings.

PLATFORMS: SyncSort NT Server version 1.0.9 runs on a Dell Pentium II 400 MHz with three Quantum Ultra Wide SCSI II Disk Drives. SyncSort NT Workstation version 2.0.0 enables development and testing of new applications on similar Dell PC workstations.

PROBLEM SOLVED: Our Biological Abstracts contain more than 13 million citations, so indexes are key to organizing and retrieving this information. Originally, we indexed on keywords. In 1998 we moved to relational indexing, which follows a logical, consistent set of rules that categorize information using a hierarchical chain, making relevant information easier to find and access. However, we faced several major hurdles in this transition. We needed to process large amounts of data in a timely manner, sort unusual characters correctly and handle files larger than two gigabytes, which at first was impossible in the NT environment. SyncSort for Windows solved all of these challenges. We currently use SyncSort applications in 88 separate product generation procedures.

PRODUCT FUNCTIONALITY: We discovered that SyncSort for Windows could reduce processing time dramatically. We replaced a week-long, multistep procedure that processed approximately three gigabytes of raw data with SyncSort, which completed the same job in a single step in two hours and 12 minutes. Creating the index posed a different challenge: the profusion of chemical names containing special characters and inconsistent capitalization, which a standard collating sequence does not necessarily sort correctly. SyncSort's collating sequence feature allowed us to define our own sequence for a four-key sort on fields of up to 250 characters. SyncSort did a beautiful job of organizing the data and handling the exceptions, saving hours of programming, processing and overtime costs. SyncSort also allowed us to overcome the two-gigabyte limit imposed by the NT publishing applications. We input our three gigabytes of raw data and used the feature that divides one input file into multiple output files to generate two smaller files that our publishing program could handle. Batch processes also use SyncSort to copy UNIX-formatted files and convert them to DOS format as part of the BIOSIS product generation job flow.
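The review does not show SyncSort's own control syntax, so what follows is only a Python analogy of two techniques described above: a user-defined collating sequence applied across a four-key sort, and dividing one sorted output into several size-limited files. The collating order, the record layout and the names `COLLATING_ORDER`, `sort_records` and `split_by_size` are hypothetical illustrations, not SyncSort features and not BIOSIS's actual rules.

```python
from typing import List, Tuple

# Hypothetical collating sequence: characters earlier in the string sort first.
# Letters are compared in lower case, so "Alpha" and "alpha" weigh the same.
# Characters not in the string (punctuation such as ',', '-', '(' or Greek
# letters) are skipped, so "1,2-Dichloroethane" collates as "12dichloroethane".
COLLATING_ORDER = "0123456789abcdefghijklmnopqrstuvwxyz"
WEIGHTS = {ch: i for i, ch in enumerate(COLLATING_ORDER)}
MAX_FIELD = 250  # the review mentions fields of up to 250 characters
NUM_KEYS = 4     # four-key sort

Record = List[str]


def collate_key(field: str) -> Tuple[int, ...]:
    """Map one field to a tuple of weights under the custom sequence."""
    return tuple(WEIGHTS[ch] for ch in field[:MAX_FIELD].lower() if ch in WEIGHTS)


def record_key(record: Record) -> Tuple[Tuple[int, ...], ...]:
    """Compare the first four fields in order; later keys only break ties."""
    return tuple(collate_key(f) for f in record[:NUM_KEYS])


def sort_records(records: List[Record]) -> List[Record]:
    return sorted(records, key=record_key)


def split_by_size(lines: List[str], max_bytes: int) -> List[List[str]]:
    """Divide output lines into consecutive parts of at most max_bytes each.

    A single line longer than max_bytes still gets a part of its own.
    """
    parts: List[List[str]] = [[]]
    size = 0
    for line in lines:
        n = len(line.encode("utf-8"))
        if parts[-1] and size + n > max_bytes:
            parts.append([])  # start the next output "file"
            size = 0
        parts[-1].append(line)
        size += n
    return parts


if __name__ == "__main__":
    records = [
        ["beta-Carotene", "CAROTENOID", "x", "1"],
        ["Alpha-Tocopherol", "VITAMIN", "x", "1"],
        ["1,2-Dichloroethane", "SOLVENT", "x", "1"],
        ["alpha-tocopherol", "Vitamin", "x", "0"],
    ]
    ordered = sort_records(records)
    for r in ordered:
        print(r[0], r[3])
    lines = ["\t".join(r) + "\n" for r in ordered]
    print([len(p) for p in split_by_size(lines, 60)])
```

Python's `sorted` is stable, so records whose four keys all compare equal keep their input order. A real sort of three gigabytes would stream records from disk rather than hold them in memory as this sketch does.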

STRENGTHS: SyncSort for Windows is a powerful, efficient and detailed multiplatform solution that plays a vital role in generating the BIOSIS product line. The GUI editing tool presents the full set of options available to the user, which makes SyncSort easy to learn and to incorporate into a job stream. It can process enormous amounts of data very quickly, saving considerable time and money.

WEAKNESSES: The GUI tool, while extremely useful, would be improved by a way to record documentation within a SyncSort instance, such as the developer, description, status, project and date.

SELECTION CRITERIA: We had long used SyncSort on the mainframe and were very happy with it. When we migrated from the mainframe to Windows NT for this project, Syncsort had just introduced a Windows version. Although we reviewed other PC-based software, SyncSort for Windows was the only product available that could meet all of our production requirements.

DELIVERABLES: SyncSort for Windows is an integral part of the entire BIOSIS publishing product line, and it is widely used in the production of two other BIOSIS electronic product lines. SyncSort allows us to create our Biological Abstracts and International Life Sciences Index more quickly, efficiently, accurately and easily, and at lower cost.

VENDOR SUPPORT: Technical support, especially at startup, was superior. The Syncsort technical support staff is knowledgeable and accessible. They also answered our questions about problems and limitations of the NT operating system itself, which are beyond the scope of a SyncSort for Windows process.

DOCUMENTATION: The documentation, help files and examples correctly describe the capabilities of the SyncSort NT system, and the documentation is accurate and comprehensive.
