JUN 1, 1998 1:00am ET

Data Clustering Engine Exceeds Expectations for IBM Brazil

PLATFORMS: The edit lists, scoring scheme, frequency tables, etc., are generated on a Pentium 133 PC running OS/2 3.0 with C++ 2.1 for OS/2. The Data Clustering Engine runs on an RS/6000 53H with AIX 4.1. We are now upgrading to an RS/6000 J50.

BACKGROUND: IBM Brazil is a computer company that builds, markets and supports IBM products.

PROBLEM SOLVED: IBM Brazil used the Data Clustering Engine to de-duplicate company contacts on our marketing system and to match external lists against the database.

PRODUCT FUNCTIONALITY: We download our database to the RS/6000 in ASCII format. Then we append any external files, each tagged with the proper key code. The Data Clustering Engine generates a cluster key for every record. We use this key on another system for the merging/purging process. The resulting matching rate exceeded our expectations.
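The workflow above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Data Clustering Engine's actual key algorithm is not described in this review, so `cluster_key` below is a toy stand-in based on simple name normalization, and the function names, the `|` separator, the key codes `DB` and `X1`, and the sample records are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical list of company suffixes to ignore when building a key.
NOISE_WORDS = {"LTDA", "LTD", "SA", "INC", "CO", "CORP"}

def cluster_key(name: str) -> str:
    """Toy stand-in for a cluster key (NOT the vendor's algorithm):
    uppercase, drop punctuation and noise words, sort the tokens."""
    text = name.upper().replace(".", "")        # "S.A." -> "SA"
    text = re.sub(r"[^A-Z0-9 ]", " ", text)     # other punctuation -> space
    tokens = sorted(t for t in text.split() if t not in NOISE_WORDS)
    return "".join(tokens)

def tag_source(lines, key_code):
    """Prefix each ASCII record with the key code of its source file."""
    return [f"{key_code}|{line}" for line in lines]

def merge_purge(records):
    """Group tagged records by cluster key, as a later merge/purge step might."""
    clusters = defaultdict(list)
    for rec in records:
        source, name = rec.split("|", 1)
        clusters[cluster_key(name)].append((source, name))
    return dict(clusters)

# Hypothetical sample data: the downloaded database plus one external list.
database = ["Acme Industria Ltda.", "Banco Exemplo S.A."]
external = ["ACME INDUSTRIA", "Exemplo, Banco SA", "Nova Empresa"]

records = tag_source(database, "DB") + tag_source(external, "X1")
clusters = merge_purge(records)

for key, members in clusters.items():
    print(key, members)
```

Records that share a key land in the same cluster, and the source tag shows whether each member came from the database or from an external list, which is the information a merge/purge step needs to decide what to keep.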

STRENGTHS: By working with edit lists and score schemes, you can customize the product directly to your needs. A full set of functions is available, and they can be combined to give the needed functionality. Once the product is tuned, it is just a matter of downloading data and running the product.

WEAKNESSES: The Data Clustering Engine does not access relational databases such as DB2. The software does not perform scrubbing.

SELECTION CRITERIA: The Data Clustering Engine is the most flexible, multi-language product for matching that we found in the marketplace for our purposes.

DELIVERABLES: The product generates an ASCII file, fully customized to your needs. You can output all the input files plus the clustering key and all kinds of scoring rates from the Data Clustering Engine to the report file. The Data Clustering Engine is intelligent software that matches and groups records using names, addresses and other identification data. Regardless of the error and variation in the data (without the need to clean or scrub, with no risk of data corruption), the software matches data from any country in any language or character set. Whatever the data quality, this software allows diverse data records to be grouped into "clusters" of persons, households, organizations or any relationship hidden in the data. The uses for "clustering" range from de-duplication of poor-quality files to the complex investigation of multi-level links and relationships between internal databases and external files.
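To make the idea of grouping records into "clusters" through multi-level links concrete, here is a minimal sketch. It is not how the Data Clustering Engine works internally: it links records only on exact (case-insensitive) shared values and uses a standard union-find structure, whereas the product is described as tolerating error and variation. The function names, field names and sample records are hypothetical.

```python
from collections import defaultdict

def find(parent, i):
    """Return the root of record i, compressing the path along the way."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def link_records(records, fields):
    """Cluster records that share a value in any of the given fields.
    Links are transitive: A-B via address and B-C via name puts A, B, C together."""
    parent = list(range(len(records)))
    first_seen = {}  # (field, value) -> index of the first record holding it
    for idx, rec in enumerate(records):
        for field in fields:
            value = rec.get(field, "").strip().upper()
            if not value:
                continue  # an empty field never links records
            key = (field, value)
            if key in first_seen:
                parent[find(parent, idx)] = find(parent, first_seen[key])
            else:
                first_seen[key] = idx
    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(parent, idx)].append(idx)
    return sorted(groups.values())

# Hypothetical sample records.
records = [
    {"name": "J Silva", "address": "Rua A 10", "phone": "555-0101"},
    {"name": "Joao Silva", "address": "Rua A 10", "phone": ""},
    {"name": "Joao Silva", "address": "Av B 200", "phone": "555-0199"},
    {"name": "M Souza", "address": "Rua C 5", "phone": "555-0155"},
]

clusters = link_records(records, ["name", "address", "phone"])
print(clusters)
```

In this sample the first and third records share no field directly, yet they end up in one cluster because the second record links to one by address and to the other by name. That transitive effect is the kind of "hidden relationship" the paragraph above refers to.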

VENDOR SUPPORT: We received good support during the implementation process. SSA provided us with the necessary support in a timely manner. Since we received local training, no further support has been required. The product has run by itself with almost no maintenance.

DOCUMENTATION: The documentation is complete and easy to use. After the implementation, the manuals provided all the help we needed.

Rogerio Luis Loggetto is a senior database strategist for IBM Brazil.
