The purpose of name and address matching software is to identify sets of records that refer to the same person. The simplest matching systems do this by comparing the records directly to each other, which is the most obvious approach. However, as matching software evolved, developers found that external data can help the process considerably. Even basic merge/purge systems rely on tables of names, business terms, cities and other information for parsing and standardization. Address standardization in particular relies not simply on tables of common terms and spellings, but on files that list all known valid addresses. In the U.S. and many other countries, such files are prepared by the local postal service. Where no such file is available, the data must be gathered or updated through other means.

The main advantage of a fixed reference table is accuracy. It provides a way to determine whether two similar records really refer to the same entity: If the closest match for both is the same reference record, they can be assumed to be the same. Of course, there are limits to this approach, as the reference table itself may be missing a valid entry or the input record may be so badly mangled that no reasonably close match is found. Consequently, most systems allow input records without a near match on the reference table to retain a separate identity. Sometimes these records are added to the reference table itself with a special code to indicate their origin. That way, if a similar record appears again, the system will at least recognize it as matching the previous record.
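The logic described above can be sketched in a few lines of Python. This is only an illustration, not any vendor's method: the table contents, the 0.8 similarity threshold, the "X" prefix for locally added entries, and the function names are all assumptions, and the standard library's `difflib.SequenceMatcher` stands in for whatever similarity measure a real product would use.

```python
from difflib import SequenceMatcher

# Hypothetical reference table: ID -> standardized address (illustrative data only).
REFERENCE = {
    "R1": "123 MAIN ST SPRINGFIELD",
    "R2": "125 MAIN ST SPRINGFIELD",
    "R3": "9 ELM AVE SPRINGFIELD",
}


def closest_reference(record, reference, threshold=0.8):
    """Return the ID of the most similar reference entry, or None if none is close enough."""
    best_id, best_score = None, 0.0
    for ref_id, ref_text in reference.items():
        score = SequenceMatcher(None, record.upper(), ref_text).ratio()
        if score > best_score:
            best_id, best_score = ref_id, score
    return best_id if best_score >= threshold else None


def same_entity(a, b, reference):
    """Two inputs are treated as the same entity if both resolve to the same reference ID."""
    ref_a = closest_reference(a, reference)
    ref_b = closest_reference(b, reference)
    return ref_a is not None and ref_a == ref_b


def resolve(record, reference):
    """Return an ID for the record; an unmatched record is added under an assumed 'X' code."""
    ref_id = closest_reference(record, reference)
    if ref_id is None:
        added = sum(1 for key in reference if key.startswith("X"))
        ref_id = f"X{added + 1}"
        reference[ref_id] = record.upper()
    return ref_id
```

In this sketch, an input with no near match receives its own "X" entry, so a later record that resembles it resolves to that same entry rather than being treated as new.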

Reference tables can also yield significant processing economies, particularly if the same table is shared across multiple installations. It is more efficient to build a comprehensive address table once and share the copies than for each firm to assemble an address table on its own. Similarly, it is more efficient for a service bureau to run the records of many clients against the same reference table than to load a separate reference table for each client. This is true even if the client-specific reference tables, which would presumably be limited to that client's customers and prospects, were each smaller than the single common reference table. Running against a common reference table also lets the service bureau keep that table loaded constantly rather than loading and unloading the individual client tables on a regular basis. This means each client's records can be processed more often, nightly or perhaps even in real time. In addition, the common reference table itself could be updated continuously with new and corrected data so each client would get the benefit of the most current information.

However, there is a fly in the reference table ointment. Processing records against an address reference table alone will not identify duplicates among individuals. Finding those duplicates requires comparing names as well as addresses. If name-level matching is needed, then a name-level reference table is needed as well. Even merge/purge and pattern-based matching systems that use address reference tables must still load the client's own customer and prospect tables for name matching. Consequently, the full advantages of reference-based matching are not available to these systems.
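A small Python illustration of this limitation follows. The records and the `address_id` field are made up for the example; the point is only that grouping on an address ID puts everyone at the address together, while grouping on name plus address separates individuals and exposes the true duplicate.

```python
from collections import defaultdict

# Hypothetical records already standardized against an address reference table.
records = [
    {"name": "JOHN DOE", "address_id": "A100"},
    {"name": "JANE DOE", "address_id": "A100"},
    {"name": "JOHN DOE", "address_id": "A100"},
]


def group_by(rows, key_fn):
    """Group rows by the value key_fn returns for each row."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(row)
    return dict(groups)


# Address-only matching: all three records fall into one group.
by_address = group_by(records, lambda r: r["address_id"])

# Name plus address: two individuals, one of whom appears twice.
by_person = group_by(records, lambda r: (r["name"], r["address_id"]))
```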

Over the past few years, a handful of vendors including Acxiom, Experian and Donnelley Marketing/InfoUSA have introduced name-level reference table matching. The challenge in developing these systems is to build the reference table itself: after all, this involves nothing less than a database with every individual in the country. No government agency provides such a file in the U.S. Thus each vendor needed to assemble its own database from a variety of sources. These include public records such as telephone directories, voter registrations and real estate listings, as well as private sources such as catalog merchants and financial institutions. While this is a costly and complicated process, it is certainly possible with today's technology.

The basic process is that each vendor runs the records from its various sources through a conventional matching process. Records identified as belonging to a unique individual are assigned a fixed ID. The reference table thus consists of all significant variations among input records: where several versions exist for the same individual, there will be several reference table records with the same ID. When clients submit their own files, these are matched against the master table and the system returns the original record plus the matching standard ID. The reference table itself never leaves the custody of the vendor, and clients see only the information they provide plus the ID the vendor has assigned. This contrasts with address reference tables, which are frequently installed on in-house systems.
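As a rough sketch of that flow, assume each source record is a tuple of first name, last name, street address and ZIP code. The `crude_key` function below is a deliberately simple stand-in for the vendor's conventional matching step, and the ID format, data and function names are invented for illustration.

```python
# Hypothetical source records: (first name, last name, street address, ZIP).
SOURCES = [
    ("Robert", "Smith", "12 Oak St", "62701"),
    ("Rob", "Smith", "12 Oak Street", "62701"),
    ("Mary", "Jones", "4 Elm Ave", "62702"),
]


def normalize(rec):
    """Uppercase the record, drop periods and commas, and collapse whitespace."""
    text = " ".join(rec).upper().replace(".", "").replace(",", "")
    return " ".join(text.split())


def crude_key(rec):
    """Stand-in for conventional matching: last name, first initial, house number, ZIP.

    Assumes a non-empty first name and an address that starts with a house number.
    """
    first, last, address, zip_code = rec
    return (last.upper(), first[0].upper(), address.split()[0], zip_code)


def build_reference_table(source_records):
    """Map every distinct record variant to the fixed ID of its individual."""
    ids_by_key = {}
    table = {}
    for rec in source_records:
        key = crude_key(rec)
        if key not in ids_by_key:
            ids_by_key[key] = f"P{len(ids_by_key) + 1:06d}"
        table[normalize(rec)] = ids_by_key[key]
    return table


def match_client_file(client_records, table):
    """Return each client record unchanged, paired with its ID (None if unmatched)."""
    return [(rec, table.get(normalize(rec))) for rec in client_records]
```

Here the two Smith variants become two reference table entries that share one ID, and the client receives back only its own records plus that ID, never the table itself.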

Because the master table may contain several records describing the same individual in different formats, an input record using any of these formats can be matched directly. This reduces the amount of processing-intensive "near match" logic, providing faster and more efficient performance. Even real-time processing of individual records is possible, although most reference-based matching still runs in batch. Next month's column will provide more information on this topic.
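One way to picture the efficiency gain is a two-stage lookup: try an exact lookup against the stored variants first, and fall back to near-match logic only when that fails. The sketch below assumes a small in-memory table and uses the standard library's `difflib.get_close_matches` as the fallback; the table contents, the 0.85 cutoff and the function name are illustrative assumptions rather than a description of any vendor's system.

```python
from difflib import get_close_matches

# Hypothetical master table: normalized record variant -> individual ID.
MASTER = {
    "ROBERT SMITH 12 OAK ST 62701": "P000001",
    "ROB SMITH 12 OAK STREET 62701": "P000001",
    "MARY JONES 4 ELM AVE 62702": "P000002",
}


def lookup(record, table, cutoff=0.85):
    """Return (ID, method), where method is 'direct', 'near' or 'none'."""
    key = " ".join(record.upper().split())
    # Stage 1: cheap exact lookup, which succeeds for any stored variant.
    if key in table:
        return table[key], "direct"
    # Stage 2: costlier near-match comparison against every stored variant.
    candidates = get_close_matches(key, list(table), n=1, cutoff=cutoff)
    if candidates:
        return table[candidates[0]], "near"
    return None, "none"
```

The more variants the table holds for each individual, the more inputs are resolved in the first stage and the less often the expensive second stage has to run.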
