The past several columns have focused on name and address matching technologies for consumer data. However, many projects require matching of businesses – customers, suppliers, distributors or other sorts of business partners. Business matching faces all the challenges of consumer matching and some of its own.

The additional challenges fall into two main groups. The first relates to the complexity of business data. While consumer records usually have a relatively simple name and address, business records often also contain contacts, titles, departments, building names and mail stops. This information appears in different formats from system to system and, often, from record to record within the same system. The matching software (or, more strictly, the parser) must identify these elements correctly, or at least know which strings can safely be ignored. This requires recognizing common words, patterns and relationship terms such as "DBA" for "doing business as" and "MS" for "mail stop." The standardization portion of the system should recognize these terms and other common words and abbreviations, such as "co," "corp" and "corporation," and place them in a consistent format. Standardization tables should also map alternate forms of a company name, such as "IBM" and "Intl Bus Machines," to a single form, here "International Business Machines." Finally, matching routines should give less weight to words that are common to many businesses. For example, "Jones Marketing Corporation" and "Smith Marketing Corporation" share two of their three words, yet they are almost certainly two distinct companies.
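To make the standardization and weighting ideas concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular matching product works: the lookup tables, the function names (`standardize`, `token_weights`, `weighted_similarity`), the four-name sample list and the choice of an inverse-document-frequency weight with a weighted Jaccard score are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical standardization tables; a production system would carry far larger ones.
LEGAL_FORMS = {"co": "company", "corp": "corporation"}
ALIASES = {
    "ibm": "international business machines",
    "intl bus machines": "international business machines",
}


def standardize(name):
    """Lowercase, strip punctuation, then apply alias and abbreviation tables."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    cleaned = " ".join(cleaned.split())
    # Simplification: aliases are matched against the whole cleaned name only.
    cleaned = ALIASES.get(cleaned, cleaned)
    return [LEGAL_FORMS.get(token, token) for token in cleaned.split()]


def token_weights(names):
    """Weight each word by log(N / number of names containing it),
    so words shared by many businesses count for less."""
    docs = [set(standardize(n)) for n in names]
    doc_freq = Counter(token for doc in docs for token in doc)
    return {token: math.log(len(docs) / count) for token, count in doc_freq.items()}


def weighted_similarity(a, b, weights, default):
    """Weighted Jaccard score: weight of shared words over weight of all words."""
    tokens_a, tokens_b = set(standardize(a)), set(standardize(b))
    if tokens_a == tokens_b:
        return 1.0

    def weight(token):
        # Words never seen in the sample list fall back to the default weight.
        return weights.get(token, default)

    union = sum(weight(t) for t in tokens_a | tokens_b)
    shared = sum(weight(t) for t in tokens_a & tokens_b)
    return shared / union if union else 0.0


# Made-up sample data, used only to estimate how common each word is.
SAMPLE_NAMES = [
    "Jones Marketing Corporation",
    "Smith Marketing Corp",
    "Acme Marketing Co",
    "Baker Supply Corporation",
]
WEIGHTS = token_weights(SAMPLE_NAMES)
RARE_WORD_WEIGHT = math.log(len(SAMPLE_NAMES))

score = weighted_similarity(
    "Jones Marketing Corporation",
    "Smith Marketing Corporation",
    WEIGHTS,
    RARE_WORD_WEIGHT,
)
```

In this sketch "IBM" and "Intl Bus Machines" standardize to the same three words, and "Corp." becomes "corporation." Because "marketing" and "corporation" each appear in three of the four sample names, they carry little weight, so `score` for the Jones and Smith names comes out well below the 0.5 that an unweighted word overlap (two shared words out of four distinct ones) would give.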
