REVIEWER: Bruce Buchanan, senior vice president of National Data Conversion (NDCI).

BACKGROUND: NDCI is a leading solutions provider and data conversion service bureau. NDCI specializes in converting data between "incompatible" computer systems, media types, file structures and applications. With more than 20,000 successful conversions and thousands of satisfied customers, NDCI is a trendsetter in the conversion industry and at the forefront of technological advances.

PLATFORM: 10-node 1 GHz Linux Beowulf Cluster.

PROBLEM SOLVED: A client asked us to investigate the accuracy of their current service provision in data quality and cleansing. A reasonable sample of their data was assessed at 1 million records. The client also wanted to establish whether it was viable to bring a long-outsourced task back in house. From the start of the project, it was clear that we needed a product that could scale significantly better than many others and was relatively cost-effective both to maintain and to operate in house, while losing as little quality in the results as possible.

PRODUCT FUNCTIONALITY: Datactics Data Trawler has many features that made the product easy to handle. The inbuilt parallelization worked as specified. A test run on data that had already been cleaned indicated that the previous service provider had missed records deemed to be duplicates, amounting to approximately eight percent of the data set. Standardization by the service provider had also failed to correct significant numbers of other data issues. In the future, we intend to explore the product's potential for matching beyond names and addresses.

STRENGTHS: Data Trawler's strengths include the potential for unlimited scale, support for a multiplicity of platforms and the ability to customize complex parsing templates.

WEAKNESSES: Working with fuzzy matching and advanced parsers can lead to some unintended results. Documentation is a little weak on techniques for working with these results, but we understand that problem is being addressed.
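
As a rough illustration of how fuzzy matching can produce unintended results, the following is a minimal Python sketch. It does not use Data Trawler or reflect its matching engine; it relies only on the standard library's difflib, and the function names (normalize, similarity, find_duplicates), the sample records and the thresholds are all hypothetical. The point it shows is general: lowering the similarity threshold catches more true duplicates but also starts flagging records that are genuinely different.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, drop periods and commas, and collapse runs of whitespace."""
    cleaned = text.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 similarity ratio between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicates(records, threshold):
    """Compare every pair of records and return (i, j, score) for each pair
    whose similarity is at or above the threshold.

    This is a naive all-pairs comparison, fine for a handful of records
    but not for a million-record data set.
    """
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            score = similarity(records[i], records[j])
            if score >= threshold:
                pairs.append((i, j, round(score, 2)))
    return pairs


if __name__ == "__main__":
    # Hypothetical sample records, invented for illustration only.
    sample = [
        "John Smith, 12 High St.",
        "john smith 12 high st",
        "Joan Smith, 12 High St.",
        "Mary Jones, 4 Mill Lane",
    ]
    for threshold in (0.98, 0.80):
        print(threshold, find_duplicates(sample, threshold))
```

Running the script at a strict and a loose threshold makes the trade-off visible: any pair reported only at the looser setting is a candidate for manual review rather than automatic merging.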

SELECTION CRITERIA: We selected Data Trawler for its flexibility and power, and because it runs on a hardware infrastructure that is inexpensive and easy to maintain and expand.

DELIVERABLES: The product generates cleansed data according to the transformation rules specified and creates files for export, table displays and reports. It has ODBC connectivity that can be used to apply accepted data changes. A match viewer and report generator are also available for the matching and clustering results.

VENDOR SUPPORT: The Datactics team provided support as necessary for the initial installation of their data-matching engine. Simple explanations of the installation and of the GUI's operation eased the potential complexity of the product. The flexibility of Data Trawler could be daunting, but this well-targeted support made it manageable.

DOCUMENTATION: At the time we used Data Trawler for this review, documentation was limited. The vendor provided support in this area, and we understand that our future installations will come with more comprehensive documentation.
