This is an article from the June 2006 issue of DM Review's Extended Edition.

That giant sucking sound you hear (well, one of them, anyway) is matching software vendors being vacuumed into larger companies. Within the past year, Nokia purchased Intellisync, Business Objects acquired Firstlogic, Informatica bought Similarity Systems, and IBM alone has added Ascential, SRD and Language Analysis Systems. Earlier, Pitney Bowes bought Group 1 Software and SAS Institute acquired DataFlux (2000). There are only a few independent matching software vendors left, and they are not exactly household names: ChoiceMaker Technologies, DataLever, DataMentors, Innovative Systems, Intelligent Search Technology, Netrics. Trillium Software seems a prominent exception, but Trillium is already owned by a large parent, Harte-Hanks.

The interest of big software companies in matching software shows that they fully recognize the importance of sophisticated matching in building enterprise systems. Given that many companies still rely on primitive, homegrown matching techniques, wider deployment of high-quality matching software should increase the overall quality of enterprise customer data integration.

But vendors' decisions to buy and incorporate their own matching systems also reduce the pressure to produce the best matching systems possible. It is another installment in the endless soap opera of suite versus best of breed: most customers will accept whatever matching product their suite vendor provides, whether or not it is the best tool for their particular needs. With a built-in market assured, developers have less incentive to increase quality and more pressure to reduce costs and to focus their remaining resources on integration with their new parents.

Where does this leave potential customers? As with any suite versus best-of-breed choice, they must decide whether the extra value provided by a best-of-breed product is worth the extra cost of integrating it. This is a particularly difficult judgment for matching systems because it is so hard to compare the quality to begin with. Few buyers know how to set up a proper test, and valid comparisons require substantial investment in tuning the different tools for their particular data.

Let's assume you end up with whatever tool your suite vendor has provided. Don't be too concerned: the products the vendors have purchased are all pretty good. This does not mean you should just label the problem as solved, though. You need to be sure you get the most out of the tool you are using. Here are a few tips:

Tune, tune, tune. Some systems are more automated than others, but all matching systems must be adapted to the particular data they are working with. Sometimes this is just a matter of running sample files through the software so it can build a statistical profile of likely values. More often, users must tweak specific matching rules and assess the results. In all situations, you need to ensure that your test files contain a good sample of the data your system will actually be matching. If your system doesn't provide much in the way of tuning assistance, take a look at software that specializes in this sort of analysis.
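
One way to make "tweak and assess" concrete is to score a hand-labeled sample of record pairs at several thresholds and watch how precision and recall move. The following is only a minimal sketch: the `similarity` function is a stand-in built on Python's standard `difflib`, not any vendor's algorithm, and the `SAMPLE` pairs and threshold values are invented for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; a real matching tool supplies its own."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def evaluate(labeled_pairs, threshold):
    """Return (precision, recall) for one threshold on hand-labeled pairs."""
    tp = fp = fn = 0
    for a, b, is_match in labeled_pairs:
        predicted = similarity(a, b) >= threshold
        if predicted and is_match:
            tp += 1          # correctly linked
        elif predicted and not is_match:
            fp += 1          # linked two different parties
        elif not predicted and is_match:
            fn += 1          # missed a true match
    # With no predictions (or no true matches) the ratio is undefined;
    # this sketch reports 1.0 in that case.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Hypothetical labeled sample: (value A, value B, reviewer says same party?)
SAMPLE = [
    ("John Smith", "Jon Smith", True),
    ("Acme Corp", "Acme Corporation", True),
    ("Mary Jones", "Mark Johnson", False),
    ("Robert Brown", "Bob Brown", True),
]

if __name__ == "__main__":
    for t in (0.60, 0.75, 0.90):
        p, r = evaluate(SAMPLE, t)
        print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

A four-pair sample is far too small for real tuning; the point is only that the labeled file should be drawn from the data the system will actually see.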

Take a broad view. Your source files are likely to contain special data that can be useful for matching, such as telephone and account numbers. Default matching logic probably won't take these into consideration, but any decent system can be extended to include them. Such data can be tremendously powerful in helping find otherwise unidentifiable matches.
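
As a rough illustration of extending default name logic with such fields, the sketch below treats an identical account number as decisive and lets a shared telephone number lower the bar for name similarity. Everything here is an assumption made for the example: the field names (`name`, `phone`, `account`), the 10-digit phone normalization, the two thresholds, and the `difflib`-based name score.

```python
import re
from difflib import SequenceMatcher


def normalize_phone(raw):
    """Keep the last 10 digits (assumes US-style numbers); '' if too short."""
    digits = re.sub(r"\D", "", raw or "")
    return digits[-10:] if len(digits) >= 10 else ""


def name_score(a, b):
    """Stand-in name similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_match(rec_a, rec_b, name_threshold=0.85, relaxed_threshold=0.5):
    """Combine name similarity with account and phone evidence."""
    acct_a, acct_b = rec_a.get("account"), rec_b.get("account")
    if acct_a and acct_a == acct_b:
        return True  # identical account number is treated as decisive
    score = name_score(rec_a["name"], rec_b["name"])
    phone_a = normalize_phone(rec_a.get("phone"))
    phone_b = normalize_phone(rec_b.get("phone"))
    if phone_a and phone_a == phone_b:
        return score >= relaxed_threshold  # shared phone: accept weaker name evidence
    return score >= name_threshold


if __name__ == "__main__":
    a = {"name": "Elizabeth Turner", "phone": "(555) 201-4477"}
    b = {"name": "Liz Turner", "phone": "555.201.4477"}
    print(is_match(a, b))                                                 # phone helps
    print(is_match({"name": a["name"]}, {"name": b["name"]}))            # name alone
```

In this example the two names are not similar enough to match on their own, but the shared phone number tips the decision.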

Allow multiple answers. Most matching deals in degrees of certainty, and different levels of accuracy are appropriate for different purposes. Sending two catalogs to the same person is pretty cheap; failing to link related accounts could cost you a million-dollar customer. Don't be afraid to deploy different matching rules for different applications. Where larger groups such as households are concerned, there is even more reason to employ multiple definitions.
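
A simple way to picture this is one match score interpreted under different policies. In the sketch below, the application names, the threshold values, and the idea of a manual review band are all assumptions chosen for illustration; a real deployment would set them from testing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchPolicy:
    auto_match: float  # scores at or above this are linked automatically
    review: float      # scores at or above this (but below auto_match) go to a person


# Hypothetical policies: a missed duplicate in a catalog mailing is cheap,
# so borderline pairs are simply left apart; a missed link between related
# accounts is expensive, so borderline pairs are sent for review.
POLICIES = {
    "catalog_mailing": MatchPolicy(auto_match=0.92, review=0.92),
    "account_linking": MatchPolicy(auto_match=0.92, review=0.70),
}


def decide(score, application):
    """Turn one match score into a decision under the application's policy."""
    policy = POLICIES[application]
    if score >= policy.auto_match:
        return "match"
    if score >= policy.review:
        return "review"
    return "no_match"


if __name__ == "__main__":
    for app in POLICIES:
        print(app, decide(0.80, app))
```

The same pair of records can therefore be left separate for the mailing list and still be flagged for a closer look by whoever manages the account relationship.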

Rely on local knowledge. No statistical algorithm or generic rule set can accurately capture national and cultural idiosyncrasies in treatments of names and addresses. Software designed for international matching will have local editions with relevant rules and reference tables. Be sure to employ such add-ons or create them.
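
If you do have to create such an add-on yourself, it often starts as a country-specific reference table applied before matching. The sketch below is deliberately tiny and entirely hypothetical: the `LOCAL_RULES` tables hold a handful of invented entries, and real local editions cover far more (name order, titles, postal formats and so on).

```python
# Hypothetical per-country abbreviation tables; real reference data is much larger.
LOCAL_RULES = {
    "US": {"st": "street", "ave": "avenue", "rd": "road"},
    "DE": {"str": "strasse", "pl": "platz"},
}


def normalize_address(address, country):
    """Expand abbreviations using the table for the record's country, if any."""
    table = LOCAL_RULES.get(country, {})
    tokens = address.lower().replace(".", "").replace(",", "").split()
    return " ".join(table.get(tok, tok) for tok in tokens)


if __name__ == "__main__":
    print(normalize_address("12 Main St.", "US"))
    print(normalize_address("Berliner Str. 5", "DE"))
```

Note that the same token is handled differently by country: "St" expands under the US table but is left alone under the German one.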

Recognize the system's limits. Software built for name and address matching can sometimes be used to match other types of data, but how well it does this depends on the particular technology. Look carefully at how your matching system works and test it with live data before assuming it can be applied to tasks beyond its primary domain.

Go outside if necessary. You may find a matching problem that your default system just can't handle. If so, be aware that most modern matching software is designed to integrate with other systems via application programming interfaces (APIs) and, increasingly, service calls. This means that integration of a better-suited external system will probably be easier than you think. Given the importance of matching correctly, don't be afraid to look for a solution that truly meets your needs.
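
In practice, this kind of integration often amounts to routing certain data types to a different matcher behind a common interface. The sketch below assumes a hypothetical setup: `builtin_matcher` stands in for the suite's default tool, and `external_part_number_matcher` stands in for a call to an outside product's API (here it is just a local function so the example runs on its own).

```python
from typing import Callable, Dict

# Any matcher is a function taking two values and returning a score in [0, 1].
Matcher = Callable[[str, str], float]


def builtin_matcher(a: str, b: str) -> float:
    """Stand-in for the suite's default matching tool."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def _alnum_upper(value: str) -> str:
    """Drop punctuation and spacing, then uppercase."""
    return "".join(ch for ch in value if ch.isalnum()).upper()


def external_part_number_matcher(a: str, b: str) -> float:
    """Stand-in for an external tool reached through its API or a service call."""
    return 1.0 if _alnum_upper(a) == _alnum_upper(b) else 0.0


# Hypothetical routing table: data types the default tool handles poorly.
ROUTES: Dict[str, Matcher] = {"part_number": external_part_number_matcher}


def score(data_type: str, a: str, b: str) -> float:
    """Use the external matcher where one is registered, else the default."""
    return ROUTES.get(data_type, builtin_matcher)(a, b)


if __name__ == "__main__":
    print(score("name", "Ann Lee", " ann lee "))
    print(score("part_number", "ab-1234", "AB 1234"))
```

Because callers only see `score`, swapping in a better-suited external system for one data type does not disturb the rest of the process.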

