REVIEWER: Hank Weiland, director of systems development for INFONXX.

BACKGROUND: INFONXX provides the people, technology and services behind wireless enhanced information services. INFONXX offers the most advanced information services including national directory assistance with call completion, movie listings and show times, restaurant listings and reviews, real-time directions and additional enhanced services for major wireless carriers. INFONXX is the first in the industry to announce a wireless white page service, called MobileSource. INFONXX manages a database with over 150M listings and handles over 200M listing requests a year. To meet the demand of our growing business, INFONXX employs over 2,500 people in six regional locations.

PLATFORMS: INFONXX runs its data applications on Intel-class servers running the Windows NT 4.0 Server operating system. We use Microsoft SQL Server, and the databases are replicated in five U.S. locations.

PROBLEM SOLVED: Our business depends on providing people with complete and accurate information. To obtain the highest level of data quality possible, we use several different sources of data to create a large database of hundreds of millions of records. In addition, we run a daily update process in which we add and delete a large number of records from these various sources, typically involving hundreds of thousands of records. In the process of compiling this data, we need to identify and eliminate duplicate records across these various databases in order to decrease the cost of operation and eliminate possible confusion for our customer service representatives who use the information. We found that developing in-house algorithms was too costly and time-consuming. Therefore, we started searching for a quick, easy-to-use solution that could meet our 45-day implementation deadline.

PRODUCT FUNCTIONALITY: After looking at the various products on the market, we chose dfPower Studio by DataFlux to identify possible duplicate records in our company's SQL Server databases. The dfPower Match PowerPack enabled us to develop a systematic process for eliminating these duplicates. dfPower Match first groups the records it identifies as duplicates, based on our matching criteria, into duplicate sets and then outputs those sets to a text file. Initially, we were concerned about the quality of the matching algorithms and the implementation time. However, the product exceeded our expectations, and we were able to meet our deadline of 45 days.

STRENGTHS: dfPower Match has numerous strengths specific to our project. It runs in batch mode, allowing match definitions to be created and used without end-user interaction. Duplicate sets can be output to a text file, which enabled us to develop our own utility that processes this file and determines what action should be taken on each identified duplicate listing. dfPower Match also allows users to define the criteria for matching duplicate records, including both exact character matches and fuzzy logic. In addition, custom reporting can output fields that were not used in the matching process.
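To illustrate what combining exact character matches with fuzzy logic can look like, here is a minimal sketch. It is not DataFlux's algorithm, which this review does not describe. The field names (`name`, `phone`), the 0.85 threshold, the sample listings and the use of Python's standard `difflib` similarity ratio are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def is_duplicate(a, b, threshold=0.85):
    """Exact match on phone, fuzzy match on name (hypothetical criteria)."""
    if a["phone"] != b["phone"]:
        return False
    ratio = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    return ratio >= threshold


def group_duplicates(records, threshold=0.85):
    """Greedy grouping: a record joins the first set whose first member it matches."""
    sets = []
    for rec in records:
        for members in sets:
            if is_duplicate(members[0], rec, threshold):
                members.append(rec)
                break
        else:
            sets.append([rec])
    # Only sets with more than one member are duplicate sets.
    return [members for members in sets if len(members) > 1]


# Made-up sample listings, not real data.
listings = [
    {"id": 1, "name": "Joe's Pizza", "phone": "5550100"},
    {"id": 2, "name": "Joes Pizza", "phone": "5550100"},
    {"id": 3, "name": "Main Street Deli", "phone": "5550100"},
    {"id": 4, "name": "Joe's Pizza", "phone": "5550199"},
]
duplicate_sets = group_duplicates(listings)
```

In this sample, listings 1 and 2 end up in one duplicate set. Listing 3 shares the phone number but fails the fuzzy name comparison, and listing 4 has the same name but fails the exact phone comparison.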

WEAKNESSES: I encountered only two issues with the utility. When running in batch mode, the GUI is not visible, making it difficult to track the program's progress in analyzing database records. When setting up matching criteria, the program does not let the user directly modify the SQL select criteria. However, the software can read from SQL views, which resolves this issue.

SELECTION CRITERIA: The major reason for selecting this product was its fuzzy logic-driven ability to identify duplicate records that are similar but not identical. In addition, we were pleased with how easy it was to learn dfPower Studio and the dfPower Match PowerPack, and with how quickly we were able to implement and use the products, which gave us an advantage in completing the project on time.

DELIVERABLES: The program outputs an ASCII text file where duplicate records are grouped together in sets. The software allows a user to select what data fields should be output to this file.
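A utility that consumes such a file might look roughly like the sketch below. The actual layout of the dfPower Match output file is not described here, so the tab-delimited format (set number, listing ID, name, source), the sample rows and the "keep the first listing, delete the rest" rule are all hypothetical stand-ins for whatever fields and business rules a real project would use.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample of a duplicate-set file: set number, listing ID, name, source.
SAMPLE_OUTPUT = (
    "1\t1001\tJoe's Pizza\tsource_a\n"
    "1\t2040\tJoes Pizza\tsource_b\n"
    "2\t3317\tMain Street Deli\tsource_a\n"
    "2\t3318\tMain St. Deli\tsource_c\n"
    "2\t4502\tMain Street Deli\tsource_b\n"
)


def read_duplicate_sets(handle):
    """Group the rows of the (assumed) tab-delimited file by their set number."""
    sets = defaultdict(list)
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for set_id, listing_id, name, source in reader:
        sets[set_id].append({"listing_id": listing_id, "name": name, "source": source})
    return sets


def plan_actions(sets):
    """Assumed rule: keep the first listing in each set and delete the others."""
    actions = []
    for members in sets.values():
        keeper, *extras = members
        actions.append(("keep", keeper["listing_id"]))
        actions.extend(("delete", m["listing_id"]) for m in extras)
    return actions


actions = plan_actions(read_duplicate_sets(io.StringIO(SAMPLE_OUTPUT)))
```

Reading from a file on disk instead of the in-memory sample only requires passing an open file handle to `read_duplicate_sets`.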

VENDOR SUPPORT: DataFlux did a good job of supporting the installation and configuration of its software before implementation. Because the software has been easy to install and use, we have not needed post-implementation support.

DOCUMENTATION: The documentation was fairly easy to understand, although the batch-mode features could have been explained more clearly. The sections regarding setup and configuration of the match-definition files were very straightforward.
