As customer relationship management (CRM), personalization, data mining, one-to-one relationship marketing/database marketing and customer loyalty programs become de rigueur at many large (and some not so large) organizations, billions of dollars are being invested in sophisticated technology aimed at total customer data integration (CDI). The underlying technology for CDI evolved out of the data quality tools space, particularly from the concepts of record linkage and matching.
Record Linkage and Match Accuracy
Record matching is a sophisticated process referred to by a variety of terms, such as merge/purge, de-duping, householding, building a 360-degree single customer view, creating a marketing customer information file (MCIF) and others. Regardless of the term used, all perform a similar process of identifying and linking related records. Name, address and other text fields are parsed into separate components; approximate string matching algorithms and similarity scoring then compare sets of these components and identify pairs that are similar enough to be treated as referring to the same entity. There has been great success in deploying record linkage for the purpose of customer data integration. However, a key aspect of this process is often glossed over or ignored: linkage precision and record match accuracy.
Linkage precision reflects how well a set of record linkage applications is tuned. Consider this simple mechanism for tuning: match/not-a-match thresholds. As part of the matching process, two records are compared across multiple fields. A set of business rules, with a corresponding weight for each field, is applied to evaluate how similar the two records are, and the result is a similarity score. If that score is greater than the match threshold, the pair is deemed a match. If the score is less than the not-a-match threshold, the pair is reported as not matching. When the score falls between the two thresholds, the pair is shunted to a separate repository for subsequent manual review. Match accuracy is a measure of how well the assorted thresholds, business rules and weights are set to provide the most accurate match.
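The two-threshold mechanism can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's implementation: the field names, weights and threshold values are hypothetical, the per-field similarity comes from the standard library's `difflib.SequenceMatcher` rather than a production matching algorithm, and a score exactly equal to the match threshold is treated as a match.

```python
from difflib import SequenceMatcher

# Hypothetical weights and thresholds, chosen only for illustration.
FIELD_WEIGHTS = {"name": 0.5, "address": 0.3, "phone": 0.2}
MATCH_THRESHOLD = 0.85
NOT_A_MATCH_THRESHOLD = 0.60


def field_similarity(a: str, b: str) -> float:
    """Return a similarity between 0 and 1 for two field values."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def similarity_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted sum of per-field similarities (the weights sum to 1)."""
    return sum(
        weight * field_similarity(rec_a[field], rec_b[field])
        for field, weight in FIELD_WEIGHTS.items()
    )


def classify(rec_a: dict, rec_b: dict) -> str:
    """Assign a record pair to one of three outcomes using two thresholds."""
    score = similarity_score(rec_a, rec_b)
    if score >= MATCH_THRESHOLD:
        return "match"
    if score < NOT_A_MATCH_THRESHOLD:
        return "not-a-match"
    return "manual-review"  # falls between the two thresholds
```

Tuning match accuracy, in this picture, means adjusting the weights and the two thresholds: raising the match threshold reduces false matches but sends more pairs to manual review, and lowering the not-a-match threshold has the same effect at the other end of the scale.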
When match accuracy is high, the results are excellent - better CDI, more aggressive personalization, reduced costs associated with customer interaction, etc. Low match accuracy, on the other hand, is likely to leave customers with the impression of poor customer relationship management, resulting in duplicate mailings, mixed-up credit profiles and repeated attempts at direct marketing, among other less heinous crimes. More seriously, businesses increasingly face major risks when linking records for applications such as health records and financial management, especially in the context of HIPAA privacy requirements, Sarbanes-Oxley compliance, the Anti-Kickback Statute and other regulatory constraints. As more businesses and more applications rely upon a single customer view, it becomes increasingly important to ensure that this single view is accurate.
Today's CDI systems have evolved into highly sophisticated applications incorporating leading-edge research and development advances in fields such as information theory, natural language processing, artificial intelligence and others. One major advancement has been the recognition of users' needs to be able to fine-tune the matching and householding behavior to create a single customer view that more directly fits with the business needs. CDI vendors no longer assume that they can dictate to businesses what the "correct" single customer view is. As businesses have become increasingly sophisticated with business intelligence (BI), CRM and one-to-one systems, they have demanded control of their customer definition.
This is typically effected via business rules that control how the single customer view is resolved by the CDI system. In general, a business rule is anything that controls or changes the CDI application's function, such as:
- Parsing, standardizing and matching program parameters.
- Control/configuration/job file settings.
- Lookup table/dictionary entries.
- Data partitioning logic (such as by geography).
- Custom programming logic (exit functions, retry logic, etc.).
- Individually enabled/disabled parsing and matching rules.
In recent years, CDI vendors have started competing on who has the most business rules, basically arguing that more business rules are better. Many vendors now claim to have more than 100,000 business rules, and one vendor at a major industry conference bragged that a large customer added more than 50,000 custom business rules (thus emphasizing how flexible their system was). However, CDI matching and householding accuracy requires precise refinement of all these business rules; otherwise it generates a less-than-effective data warehouse.
Improving Match Accuracy for Competitive Advantage
Until recently, applying data quality technology for the purposes of CDI was considered a leading-edge use of technology. Today, it would be unusual for an organization not to be doing this. Years ago, businesses could gain a major competitive advantage by implementing basic data quality and BI technology. Today, however, this technology is no longer an optional luxury; it is a fundamental requirement just to be on a level playing field.
For example, a company might cancel a promotional campaign because too many consumers such as "Michael Jablowski" did not respond. However, more accurate record matching might reveal that "Mike Jadlowsky" is in fact the same person, and Mike Jadlowsky did respond (or worse, was already a customer - thus indicating wasted marketing).
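To make the name example concrete, here is a minimal Python sketch of how a nickname lookup table combined with approximate string matching could link the two spellings. Everything in it is an assumption made for illustration: the tiny `NICKNAMES` table, the 0.4/0.6 weighting of first and last name, and the use of the standard library's `difflib.SequenceMatcher` as the similarity measure.

```python
from difflib import SequenceMatcher

# Tiny illustrative nickname table; a real lookup dictionary would be far larger.
NICKNAMES = {"mike": "michael", "bill": "william", "bob": "robert"}


def normalize_first_name(name: str) -> str:
    """Lower-case the name and map a known nickname to its formal form."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def text_similarity(a: str, b: str) -> float:
    """Approximate string similarity between 0 and 1."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def name_similarity(first_a: str, last_a: str, first_b: str, last_b: str) -> float:
    """Combine first-name and last-name similarity (hypothetical weights)."""
    first = text_similarity(normalize_first_name(first_a),
                            normalize_first_name(first_b))
    last = text_similarity(last_a, last_b)
    return 0.4 * first + 0.6 * last


score = name_similarity("Michael", "Jablowski", "Mike", "Jadlowsky")
print(f"similarity: {score:.2f}")
```

An exact comparison treats the two names as unrelated. With the nickname lookup the first names agree completely, and the last names differ in only two of nine characters, so the combined score is high enough that a tuned system could link the pair or at least route it to manual review.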
It is very likely that your own company has already made major investments in record matching, and it is equally likely that all of your major competitors have also made similar investments. However, if this is true, and everyone is doing the same thing, then an enlightened manager should be looking for second-order opportunities for additional competitive advantages. One idea with potential is the evaluation and improvement of match accuracy, which, in turn, will deliver an ongoing competitive advantage by improving the accuracy and effectiveness of all the BI technologies that rely on the data.









