The purpose of name and address matching software is to identify sets of records that refer to the same person. The simplest matching systems do this by comparing the records directly to each other, which is certainly the most obvious approach. However, as matching software evolved, developers found that external data can help the process considerably. Even basic merge/purge systems rely on tables of names, business terms, cities and other information for parsing and standardization. Address standardization in particular relies not simply on tables of common terms and spellings, but on files that list all known valid addresses. In the U.S. and many other countries, such files are prepared by the national postal service. In other cases, they must be gathered or updated through other means.

The main advantage of a fixed reference table is accuracy. It provides a way to determine whether two similar records really refer to the same entity: if the closest match for both is the same reference record, they can be assumed to be the same. Of course, there are limits to this approach. The reference table itself may be missing a valid entry, or the input record may be so badly mangled that no reasonably close match is found. Consequently, most systems allow input records without a near match in the reference table to retain a separate identity. Sometimes these records are added to the reference table itself, with a special code to indicate their origin. That way, if a similar record appears again, the system will at least recognize it as matching the previous record.
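The reference-table logic described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two-row `REFERENCE` table, the `THRESHOLD` cutoff, the helper names `normalize`, `closest_reference` and `resolve`, and the "U" prefix used as the origin code. String similarity from the standard library's `difflib` stands in for whatever comparison logic a real matching product would use.

```python
from difflib import SequenceMatcher

# Hypothetical reference table: reference ID -> standardized address.
REFERENCE = {
    "R1": "123 MAIN ST SPRINGFIELD IL",
    "R2": "45 OAK AVE SPRINGFIELD IL",
}

# Assumed similarity cutoff; a real system would tune this.
THRESHOLD = 0.85


def normalize(text):
    """Uppercase, drop periods and commas, and collapse whitespace."""
    cleaned = text.upper().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def closest_reference(record, reference, threshold=THRESHOLD):
    """Return the ID of the closest reference entry, or None if no
    entry scores at or above the threshold."""
    cleaned = normalize(record)
    best_id, best_score = None, 0.0
    for ref_id, ref_text in reference.items():
        score = SequenceMatcher(None, cleaned, ref_text).ratio()
        if score > best_score:
            best_id, best_score = ref_id, score
    return best_id if best_score >= threshold else None


def resolve(record, reference):
    """Map an input record to a reference ID.

    Records with no near match keep a separate identity: they are added
    to the table under an ID whose "U" prefix marks them as derived from
    input rather than from the original reference file.
    """
    ref_id = closest_reference(record, reference)
    if ref_id is None:
        count = sum(1 for key in reference if key.startswith("U"))
        ref_id = f"U{count + 1}"
        reference[ref_id] = normalize(record)
    return ref_id
```

Two input records are then treated as the same entity when `resolve` returns the same ID for both. A record with no near match receives a new "U" entry, so a later variant of the same record can match that entry instead of creating yet another identity.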
