(Editor’s note: Joaquim Neto will speak on “Using cloud APIs and big data to auto-steward your consumer MDM hub” at the upcoming MDM & Data Governance Summit in Chicago, July 11-13.)
The future of consumer master data management will be defined by three powerful technologies.
The first is the cloud, and in particular software-as-a-service offerings. The second is “referential matching,” a new paradigm in identity matching that maintains extremely high accuracy even when consumer demographic data is out of date, sparse, or inconsistent across records. And the third is automation, in particular the ability to automate stewardship: the resolution of “suspect duplicate” consumer records flagged by an MDM technology.
The benefits of cloud-based software are obvious and well established across industries: cost savings, operational efficiencies, reduced maintenance, and improved security. But the true breakthrough in MDM will come with software-as-a-service (SaaS) offerings.
Right now, most of the conversation around cloud-based MDM is about hosted versions of what is otherwise legacy on-premises MDM technology. But SaaS offerings have the potential to be more transformative and to provide significantly differentiated capabilities. These are features that are possible only because the software is cloud-based and has the elastic processing capacity needed to handle large data sets, both at initial load and during periodic reprocessing.
SaaS MDM solutions could be deployed for ad hoc projects or implemented alongside on-premises MDM technologies to provide complementary features. They could also be deployed between enterprises, for example to facilitate the exchange of patient data in the healthcare industry or to perform an overlap analysis of customers between retail companies considering a merger.
Current MDM technologies typically use “probabilistic” and “deterministic” matching algorithms to match and link consumer records across an enterprise and to ensure there is only one master record for each consumer. These algorithms match records by comparing the demographic data the records contain, such as names, addresses, and birthdates.
But demographic data is notoriously error-prone, frequently incomplete, and constantly falling out of date. Probabilistic and deterministic matching algorithms are only as accurate as the demographic data they compare, so their accuracy is fundamentally capped by the availability and quality of that data.
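The article does not say how any particular product implements these algorithms, so what follows is only a minimal Python sketch of the two styles, under stated assumptions. `deterministic_match` is a hypothetical exact-agreement rule. `probabilistic_score` uses Fellegi-Sunter-style log-likelihood weights, and the m/u probabilities in `M_U` are made-up values. Both functions see only the two records being compared. That is why a single stale address is enough to break the deterministic rule and pull down the probabilistic score.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsumerRecord:
    first_name: str
    last_name: str
    address: str
    birthdate: str  # ISO format, e.g. "1980-04-02"


FIELDS = ("first_name", "last_name", "address", "birthdate")


def _norm(value: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't matter."""
    return " ".join(value.lower().split())


def deterministic_match(a: ConsumerRecord, b: ConsumerRecord) -> bool:
    """Hypothetical rule: match only if every field agrees exactly after normalization."""
    return all(_norm(getattr(a, f)) == _norm(getattr(b, f)) for f in FIELDS)


# Hypothetical per-field probabilities (assumed values, for illustration only):
#   m = P(field agrees | same person), u = P(field agrees | different people)
M_U = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "address": (0.80, 0.001),
    "birthdate": (0.97, 0.0003),
}


def probabilistic_score(a: ConsumerRecord, b: ConsumerRecord) -> float:
    """Fellegi-Sunter-style score: sum of log2 agreement/disagreement weights."""
    score = 0.0
    for f, (m, u) in M_U.items():
        if _norm(getattr(a, f)) == _norm(getattr(b, f)):
            score += math.log2(m / u)  # agreement adds positive evidence
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement subtracts evidence
    return score


# Same person, but one record carries an out-of-date address.
a = ConsumerRecord("Maria", "Lopez", "12 Oak St", "1980-04-02")
b = ConsumerRecord("maria", "Lopez", "98 Elm Ave", "1980-04-02")
print(deterministic_match(a, b))  # False
```

In this sketch the probabilistic score for `a` and `b` stays positive but falls below the score of an exact match. How far it falls depends entirely on the assumed weights. Neither method has any outside knowledge that "98 Elm Ave" was ever Maria's address.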
A new paradigm in identity matching technology, called “referential matching,” is not subject to these same fundamental limits. Rather than directly comparing the demographic data of two consumer records to see whether they match, referential matching technologies compare the demographic data from each record to a comprehensive, continuously updated reference database of identities.
These reference databases typically contain identities spanning the entire U.S. population. Each identity typically contains a complete profile of demographic data, including nicknames, aliases, maiden names, common typos, past phone numbers, and old addresses. By matching consumer records to identities in a reference database, referential matching technologies can make matches that probabilistic and deterministic algorithms cannot make, because those algorithms compare records only to each other.
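A toy sketch of the referential idea follows. It assumes that a tiny in-memory list can stand in for a national-scale reference database. `ReferenceIdentity`, `resolve`, and `same_person` are hypothetical names. Real services use far richer scoring than the exact lookups here. The sketch shows one thing: two records that share almost nothing directly can still resolve to the same reference identity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ReferenceIdentity:
    identity_id: str
    birthdate: str
    names: Set[str] = field(default_factory=set)      # legal names, nicknames, maiden names, typos
    addresses: Set[str] = field(default_factory=set)  # current and past addresses
    phones: Set[str] = field(default_factory=set)     # current and past phone numbers


@dataclass
class ConsumerRecord:
    name: str
    birthdate: str
    address: str = ""
    phone: str = ""


def _norm(value: str) -> str:
    return " ".join(value.lower().split())


def resolve(record: ConsumerRecord, reference_db: List[ReferenceIdentity]) -> Optional[str]:
    """Return the ID of the single reference identity consistent with the record, else None."""
    hits = []
    for ident in reference_db:
        if ident.birthdate != record.birthdate:
            continue
        if _norm(record.name) not in {_norm(n) for n in ident.names}:
            continue
        # Require one corroborating contact point, current or historical.
        addr_ok = bool(record.address) and _norm(record.address) in {_norm(x) for x in ident.addresses}
        phone_ok = bool(record.phone) and record.phone in ident.phones
        if addr_ok or phone_ok:
            hits.append(ident.identity_id)
    return hits[0] if len(hits) == 1 else None  # ambiguous or no hit -> unresolved


def same_person(a: ConsumerRecord, b: ConsumerRecord, reference_db: List[ReferenceIdentity]) -> bool:
    """Two records match if both resolve to the same reference identity."""
    id_a = resolve(a, reference_db)
    return id_a is not None and id_a == resolve(b, reference_db)


# Hypothetical reference data: a nickname, a maiden name, and an old address.
db = [
    ReferenceIdentity("REF-001", "1980-04-02",
                      {"Margaret Jones", "Peggy Jones", "Margaret Smith"},
                      {"12 Oak St", "98 Elm Ave"}, {"555-0100"}),
    ReferenceIdentity("REF-002", "1980-04-02", {"Margaret Jones"}, {"5 Pine Rd"}, set()),
]
a = ConsumerRecord("Peggy Jones", "1980-04-02", address="12 Oak St")
b = ConsumerRecord("Margaret Smith", "1980-04-02", address="98 Elm Ave")
print(same_person(a, b, db))  # True
```

Here `a` and `b` agree only on birthdate, so a direct comparison has little evidence to work with. Both records nevertheless map to `REF-001` through the nickname, the maiden name, and the old address held in the reference profile.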
Future consumer MDM technologies will all use this identity matching approach to become much more accurate than current MDM tools.
Whenever an MDM technology determines that two consumer records might match, it flags them as a “suspect duplicate” and creates a work item for manual review. A possible match here means the records likely belong to the same person, but the match is not definitive.
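The minimal sketch below shows the flagging step using a common two-threshold convention. Pairs scoring at or above an upper threshold are linked automatically. Pairs below a lower threshold are ignored. The band in between becomes suspect-duplicate work items. The thresholds, `classify`, `triage`, and `WorkItem` are hypothetical, and the article does not describe how any specific MDM product sets its cutoffs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Decision(Enum):
    MATCH = "match"
    SUSPECT_DUPLICATE = "suspect_duplicate"
    NON_MATCH = "non_match"


# Hypothetical thresholds; a real deployment would tune these.
UPPER = 20.0
LOWER = 8.0


@dataclass
class WorkItem:
    record_a: str
    record_b: str
    score: float
    status: str = "open"  # awaiting manual review by a data steward


def classify(score: float, upper: float = UPPER, lower: float = LOWER) -> Decision:
    """Map a pair's match score to one of three outcomes."""
    if score >= upper:
        return Decision.MATCH
    if score >= lower:
        return Decision.SUSPECT_DUPLICATE
    return Decision.NON_MATCH


def triage(scored_pairs: List[Tuple[str, str, float]]):
    """Auto-link confident matches; queue the uncertain middle band for stewardship."""
    links: List[Tuple[str, str]] = []
    queue: List[WorkItem] = []
    for rec_a, rec_b, score in scored_pairs:
        decision = classify(score)
        if decision is Decision.MATCH:
            links.append((rec_a, rec_b))
        elif decision is Decision.SUSPECT_DUPLICATE:
            queue.append(WorkItem(rec_a, rec_b, score))
    return links, queue


links, queue = triage([("r1", "r2", 25.0), ("r3", "r4", 12.5), ("r5", "r6", -3.0)])
print(links)       # [('r1', 'r2')]
print(len(queue))  # 1
```

Widening the band between the thresholds reduces wrong automatic merges but produces more work items. That trade-off drives the stewardship volumes discussed below.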
Suspect duplicates occur so frequently that their volume typically equals 10 to 20 percent of the total number of consumer identities in the MDM. Enterprises must then make a costly and often frustrating choice. They can either (A) create a data stewardship program in which full-time employees or contractors manually review and resolve each suspect duplicate, or (B) leave the suspect duplicates unresolved in the MDM, even though most of them are, in fact, actual duplicate identities.
The first option can cost hundreds of thousands of dollars per year for many years. That comes on top of the costs, headaches, and organizational challenges of creating a data stewardship program in the first place. The second option erodes the business case an enterprise has built around having a consumer MDM hub. If 20 percent of consumer records are duplicates, then 20 percent of interactions with your customers will be sub-optimal and analytics reports will be skewed by 20 percent. You will also gain a “single view of the customer” for only 80 percent of your customers.
Either way, the expected business value of your MDM technology degrades or disappears. The cause is either the cost of a stewardship program or a single view that covers only 80 percent of your customers.
The success of future MDM technologies will hinge on their ability to resolve these suspect duplicates automatically and promptly, without human intervention. This capability will improve the return on investment (ROI) of the MDM by reducing the manual effort needed to supplement its matching. It will also enable enterprises to fully realize the business case for having a consumer MDM in the first place.
A good place to start with this automation is to combine the first two technologies discussed above. Enterprises should consider integrating cloud-based referential matching solutions with their consumer MDM technologies to automatically process and resolve any suspect duplicates that arise. Together, the cloud, referential matching, and automation will put enterprises on a good path toward the future of consumer MDM and help them realize the benefits of such a solution.
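The following is one way such an integration might work, sketched under assumptions. Each suspect pair is sent through a referential-matching lookup, modeled here as a plain function `resolve` that returns a reference identity ID or `None`. No real vendor API is implied, and `auto_steward`, `SuspectPair`, and the three outcome buckets are hypothetical. If both records resolve to the same identity, the pair is merged. If they resolve to different identities, it is dismissed. Anything left unresolved stays with a human steward.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical lookup signature: demographic fields in, reference identity ID (or None) out.
# In practice this might wrap a call to a cloud referential-matching service.
ResolveFn = Callable[[Dict[str, str]], Optional[str]]


@dataclass
class SuspectPair:
    pair_id: str
    record_a: Dict[str, str]
    record_b: Dict[str, str]


def auto_steward(pairs: List[SuspectPair], resolve: ResolveFn) -> Dict[str, List[str]]:
    """Sort suspect-duplicate work items into merge, dismiss, and manual-review buckets."""
    outcome: Dict[str, List[str]] = {"merge": [], "dismiss": [], "manual": []}
    for pair in pairs:
        id_a, id_b = resolve(pair.record_a), resolve(pair.record_b)
        if id_a is None or id_b is None:
            outcome["manual"].append(pair.pair_id)   # not enough evidence: leave for a steward
        elif id_a == id_b:
            outcome["merge"].append(pair.pair_id)    # same reference identity: true duplicate
        else:
            outcome["dismiss"].append(pair.pair_id)  # different identities: not a duplicate
    return outcome


# In-memory stand-in for the hypothetical cloud lookup, keyed on name for brevity.
lookup = {"peggy jones": "REF-001", "margaret smith": "REF-001", "margaret jones": "REF-002"}


def fake_resolve(record: Dict[str, str]) -> Optional[str]:
    return lookup.get(record["name"].lower())


pairs = [
    SuspectPair("p1", {"name": "Peggy Jones"}, {"name": "Margaret Smith"}),
    SuspectPair("p2", {"name": "Peggy Jones"}, {"name": "Margaret Jones"}),
    SuspectPair("p3", {"name": "Peggy Jones"}, {"name": "Unknown Person"}),
]
print(auto_steward(pairs, fake_resolve))
# {'merge': ['p1'], 'dismiss': ['p2'], 'manual': ['p3']}
```

Passing `resolve` in as a function keeps the stewardship logic independent of any particular cloud service. It also lets the logic be tested with an in-memory stand-in like `fake_resolve`. Only the `manual` bucket still needs human review.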