PLATFORMS: The edit lists, scoring scheme, frequency tables, etc., are generated on a Pentium 133 PC running OS/2 3.0 with C++ 2.1 for OS/2. The Data Clustering Engine runs on an RS/6000 53H with AIX 4.1. We are now upgrading to an RS/6000 J50.

BACKGROUND: IBM Brazil is a computer company that builds, markets and supports IBM products.

PROBLEM SOLVED: IBM Brazil used the Data Clustering Engine to de-duplicate company contacts in our marketing system and to match external lists against the database.

PRODUCT FUNCTIONALITY: We download our database to the RS/6000 in ASCII format. Then we append any external files, each tagged with the proper key code. The Data Clustering Engine generates a cluster key for every record. We use this key on another system for the merging/purging process. The resulting matching rate exceeded our expectations.
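The flow described above can be sketched in Python for illustration only. The pipe-delimited record layout, the source codes "DB" and "EXT", the noise-word list, and all function names are assumptions made for this sketch. In particular, naive_cluster_key is a crude stand-in based on simple name normalization; it is not the Data Clustering Engine's algorithm, which is not described here.

```python
import re
from collections import defaultdict

# Hypothetical noise words dropped before keying; not taken from the product.
NOISE_WORDS = {"LTDA", "LTD", "INC", "SA"}


def tag_records(lines, source_code):
    """Append a source key code to each non-empty ASCII record."""
    return [f"{line.rstrip()}|{source_code}" for line in lines if line.strip()]


def naive_cluster_key(name):
    """Stand-in key: drop dots, blank other punctuation, uppercase, remove noise words."""
    cleaned = re.sub(r"[^A-Za-z0-9 ]", " ", name.replace(".", ""))
    words = cleaned.upper().split()
    return "".join(w for w in words if w not in NOISE_WORDS)


def cluster(records):
    """Group tagged records by the key computed from the first field (the name)."""
    clusters = defaultdict(list)
    for rec in records:
        name = rec.split("|")[0]
        clusters[naive_cluster_key(name)].append(rec)
    return dict(clusters)


if __name__ == "__main__":
    database = ["Acme Industrial Ltda.|Sao Paulo", "Banco Azul S.A.|Rio de Janeiro"]
    external = ["ACME INDUSTRIAL LTDA|Sao Paulo"]

    # Combine the downloaded database with the external list, each with its key code.
    combined = tag_records(database, "DB") + tag_records(external, "EXT")

    # Records sharing a key are candidates for the downstream merge/purge step.
    for key, members in cluster(combined).items():
        print(key, members)
```

A downstream merge/purge system would then treat any key that holds more than one record as a set of candidate duplicates, with the appended source code showing which file each record came from.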

STRENGTHS: Because you work directly with the edit lists and score schemes, you can customize the product to your needs. A full set of functions is available, and they can be combined to give the needed functionality. Once the product is tuned, using it is just a matter of downloading the data and running the product.

WEAKNESSES: The Data Clustering Engine does not access relational databases such as DB2. The software does not perform scrubbing.

SELECTION CRITERIA: For our purposes, the Data Clustering Engine was the most flexible multi-language matching product that we found in the marketplace.

DELIVERABLES: The product generates an ASCII file, fully customized to your needs. You can output all the input files plus the clustering key and all kinds of scoring rates from the Data Clustering Engine to the report file. The Data Clustering Engine is intelligent software that matches and groups records using names, addresses and other identification data. It matches data from any country in any language or character set, regardless of error and variation in the data, without the need to clean or scrub and with no risk of data corruption. Whatever the data quality, the software allows diverse data records to be grouped into "clusters" of persons, households, organizations or any other relationship hidden in the data. The uses for "clustering" range from de-duplication of poor-quality files to the complex investigation of multi-level links and relationships between internal databases and external files.

VENDOR SUPPORT: We received good support during the implementation process, and SSA provided the support we needed in a timely manner. Since we received local training, no further support has been required. The product has run by itself with almost no maintenance.

DOCUMENTATION: The documentation is complete and easy to use. After the implementation, the manuals provided all the help we needed.
