The following article is excerpted from the white paper, "The Perfect Match: 7 Steps to a Match" written by Cory Shouse. Visit http://www.csiwhq.com/news/whitepaper_requests.asp for a copy of the full paper.
Master data management (MDM) is a series of processes put in place to ensure that reference data is kept up to date and coordinated across an enterprise. A solid MDM approach can have impacts on the business and quality of your systems that make it well worth the investment; however, it is no trivial undertaking. Perhaps one of the most complex and underestimated tasks associated with master data management is matching. Matching is at the core of MDM. For our purposes, let's define a "match" as finding similar customers, vendors or products in multiple disparate systems. Preparing for the match, performing the match and perfecting the results of the match can significantly increase the quality of the reference data within your organization.
In a simple example, Customer A may be represented in two disparate systems. In system 1, the customer may be known by its trade style (i.e., Paws-N-Claws Pet Sitting); however, in system 2 it may be known by its legal name (i.e., PNC, Inc.). If we want to get a good understanding of Customer A's spend across both systems, then we must have a way to match the trade style with the legal name to identify this customer as the same entity.
The value of understanding this relationship goes beyond business intelligence (BI) solutions. Enterprise resource planning (ERP) systems, customer relationship management (CRM) applications and other IT applications can all benefit from a dependable matching approach.
Preparing for the Match
Preparing for a match involves investigating the conditions of the source data, standardizing common elements and, when available, finding a partner to leverage as an external master reference.
Step 1: Clean Your Data
Clean data increases the probability of a match. Attempting a match with dirty data is similar to searching for a friend on Google by using only his or her first name. You may get results, but culling the numerous pages of results will take time. To ensure the same does not happen with MDM, take the time to investigate and understand your data source(s). Investigation of data can range from a simple count of the nulls by column to a complex statistical analysis finding tendencies in the data. A good picture of the primary keys, relationships, constraints and domain values (distincts/ranges) will later help define the rules for the match.
Step 2: Standardize Common Elements
The next step in preparing for a match is to identify and standardize common elements that will be leveraged for matching. In order for the elements to be standardized, they must exist in all files you are trying to match. These elements will be the core set used in comparing across the files. "Customer" is often the first subject area that companies attack due to the high value associated with acquiring a single view of the customer. In the case of a customer, seven elements are often leveraged as the standardized columns. These elements, known as the "Super 7," include:
- Business Name,
- Industry Classification,
- SIC Code,
- Number of Employees, and
- Phone Number.
In the case of item/product, the standard elements might include SKU (for internal systems) or UPC, ISBN and EAN for external matching. In addition, product description, weight or even generic name can be included as required columns for matching.
Step 3: Identify External Reference Source
Finally, where available, identify an external source that can provide a common industry key. Continuing with the customer example, a Social Security number for consumers or DUNS number for companies is common. In the case of a product, it can be more difficult, as products are often unique; however, many industries have established standards. For example, the FDA created the national drug code (NDC) number to uniquely identify pharmaceutical drugs. Some important questions to consider as you look for the right partner include:
- How often is your database updated?
- What are the sources for the data?
- What is coverage of the file for the specific set of data?
- What formats are available?
- What other value-added elements are available?
To achieve an optimal match, it may be necessary to attempt matching against multiple vendors. Provide the vendors a set of your data and see which one provides the best match. Leverage that vendor as your primary, and use the secondary vendor in cases where no match was found in the first case.
William wishes to thank Cory Shouse for his contribution to this month's column.
Cory Shouse is a senior architect with Conversion Services International. With more than 10 years of experience in business intelligence, Shouse specializes in helping companies establish, organize and deliver value to the business. He has assisted a number of Fortune 500 companies define quality assurance programs, organizational and staffing plans, change control procedures, and appropriate information and technical architectures. He may be reached at firstname.lastname@example.org or (469) 939-5385.
In the next column, we will cover performing the match with steps four through six.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access