Last month's column began a discussion of how to select a customer matching system. This month's column continues that discussion by explaining how to make a decision based on test results.

Run the tests. In most cases, the tests will actually be run by the vendor. This is faster and easier than installing the software in house. You will still need to provide instructions regarding matching rules and household definition. You will also want to get some idea of the effort involved in setting up the system. It may not be practical to watch the vendor's staff set up your particular job because the work is performed in small steps by different people over several days or weeks; however, it should be possible to walk through the operation and view each task performed on active data. This will give some idea of the system features and staff skills involved. It should also be possible to get statistics on the computer resources and staff time consumed working on your job.

Compare the results. Each system will have its own standard reports. Data conversion, standardization and parsing will generate statistics on missing data elements, address corrections, postal coding and similar items. Individual records are sometimes coded to show the exact changes that were applied. This makes it easy to find records that had specific types of changes and verify their accuracy. The matching portion of the system will show the number of records input, the number of unique individuals identified and (usually) the number of unique households. Most systems also classify the matches, either by certainty level or by the reason they were considered to match. The systems should also provide listings of the records that were matched, again typically grouped by category. Visual inspection is very useful for string- and pattern-based matching, but less helpful when reference-based systems bring together records that are superficially unrelated.

While the most obvious statistic to compare across systems is the number of matches found, it is important to realize that matches may be incorrect. Therefore, a higher match rate is not necessarily a better result. In fact, there are three statistics to balance: correct matches, incorrect matches and missed matches. Unfortunately, the "truth" is usually not known for all matches on a file, with the important exception of test cases inserted for this very reason. The primary method of comparing systems is to look for situations where one system has identified a match and another system hasn't, and determine which system is correct. This misses situations where all systems made the same error; however, it does allow a meaningful comparison of the different systems to each other.
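The balance among those three statistics can be sketched in a few lines of Python. This is a hypothetical illustration, not any vendor's actual report format: it assumes matches are represented as pairs of record IDs and that a set of planted test cases supplies the known truth.

```python
# Sketch: scoring one system's matches against known test cases.
# The record IDs and pair structures below are illustrative assumptions.

def score_matches(system_pairs, true_pairs):
    """Compare a system's matched record-ID pairs against pairs whose
    true status is known (e.g., test cases inserted into the file)."""
    system_pairs = set(system_pairs)
    true_pairs = set(true_pairs)
    correct = system_pairs & true_pairs    # real matches the system found
    incorrect = system_pairs - true_pairs  # false matches
    missed = true_pairs - system_pairs     # real matches the system missed
    return len(correct), len(incorrect), len(missed)

# Pairs are stored as frozensets so (A, B) and (B, A) compare equal.
truth = {frozenset(p) for p in [("r1", "r2"), ("r3", "r4")]}
system = {frozenset(p) for p in [("r1", "r2"), ("r5", "r6")]}
print(score_matches(system, truth))  # (1, 1, 1)
```

A higher raw match count raises `correct` and `incorrect` together, which is exactly why the three figures have to be weighed against each other rather than compared on match rate alone.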

Identifying the disagreements between systems requires getting a file from each vendor with the original data plus individual and household IDs assigned to link records that match. Because each record will also contain its original unique ID, the files can be joined to allow comparison. The comparison report takes a bit of work to create, although some matching vendors have written programs to do it automatically.
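The comparison report described above can be approximated with a short script. The field names and in-memory dictionaries here are assumptions for illustration; in practice the vendor files would be joined on the original unique record ID, and each vendor's individual IDs are arbitrary labels that only have meaning within that vendor's own file.

```python
# Sketch: finding record pairs where two vendors disagree about a match.
# vendor_a and vendor_b map each original record ID to the individual ID
# that vendor assigned; all values shown are hypothetical.

from itertools import combinations

vendor_a = {"r1": "A1", "r2": "A1", "r3": "A2"}
vendor_b = {"r1": "B7", "r2": "B8", "r3": "B8"}

def disagreements(a, b):
    """Yield record pairs that one system links and the other does not."""
    for r1, r2 in combinations(sorted(a), 2):
        linked_a = a[r1] == a[r2]
        linked_b = b[r1] == b[r2]
        if linked_a != linked_b:
            yield (r1, r2, "A only" if linked_a else "B only")

print(list(disagreements(vendor_a, vendor_b)))
# [('r1', 'r2', 'A only'), ('r2', 'r3', 'B only')]
```

Each pair this report surfaces is a candidate for the manual review described below; pairs where both systems agree (whether matched or unmatched) never appear, which is why errors shared by all systems go undetected.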

Except when the correct answer is known because of test cases or pre-researched linkages, judging which system is correct about any given match is a challenge of its own. Users rely mostly on visual comparisons, particularly where string- and pattern-based systems are involved. In some situations, users actively research the questionable matches via telephone calls or other validation methods.

Once the relative accuracy of the different systems has been established, there is still a business analysis to be done. This weighs the costs of the different systems against the values of found, missed and false matches. These values depend on the business situation: a false match has little cost when sending a clothing catalog, but could cause a lawsuit when financial accounts are concerned. Such priorities should be discussed with vendors in advance because most systems can be tuned to adjust the balance between false hits and misses.
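That weighing exercise reduces to a simple expected-value calculation. The function and every dollar figure below are hypothetical placeholders chosen only to show the mechanics, not benchmarks from any real program.

```python
# Sketch: weighing a system's match results by business value.
# All per-unit values and counts are illustrative assumptions.

def net_value(correct, false_hits, misses,
              value_per_match, cost_per_false_hit, cost_per_miss):
    """Net business value of one system's matching results."""
    return (correct * value_per_match
            - false_hits * cost_per_false_hit
            - misses * cost_per_miss)

# Catalog mailer: a false match wastes a little postage, nothing more.
catalog = net_value(900, 50, 100, value_per_match=2.0,
                    cost_per_false_hit=0.5, cost_per_miss=2.0)

# Financial accounts: the same false-hit count carries a severe penalty.
finance = net_value(900, 50, 100, value_per_match=2.0,
                    cost_per_false_hit=500.0, cost_per_miss=2.0)

print(catalog, finance)  # 1575.0 -23400.0
```

Identical accuracy numbers can yield a positive result in one business and a sharply negative one in another, which is why the tuning between false hits and misses belongs in the vendor discussion from the start.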

While accuracy and business value will be the primary factors in selecting a matching system, they are not the only factors. Some buyers reject reference-based systems because they require off-site processing and ongoing service relationships. Some focus on processing speed, computer resource consumption or the staff effort required. Some care deeply about the quality of reports, the ability to review and override questionable matches, or control over matching rules and reference tables. Some need to handle international data or perform complex transformations. Nearly every decision is affected by salesmanship, customer service and vendor background.

Systems differ significantly along all these dimensions. Unfortunately, too many buyers focus on these other issues and neglect to test the performance of the software itself. Given the major differences in accuracy between the different products, this can be a big mistake.
