Master data management for product data (known as PIM, for product information management) is a different kettle of fish altogether from MDM for customer data (also known as customer data integration, or CDI). It is important to recognize and consider the fundamental differences between the two. One distinction is complexity. Product data typically requires more attributes (or fields) than customer data. A customer record might require 10 to 20 attributes to identify the customer uniquely and to capture the minimum set of data needed to do business with that customer. But it's not uncommon for product data to have dozens or hundreds of required attributes.
Standardization is an issue, too. In the customer data realm, ZIP codes and other address elements can be verified against postal standards. Product data has few equivalents, in part because many manufacturers are reluctant to release too much detailed information out of concern about becoming commoditized on the Web. In certain industries, there has been progress recently toward standardizing some elements of product information, with industry associations and government bodies promoting standards like the United Nations Standard Products and Services Code. But there's still a long way to go.
And although customer data is commonly structured into various hierarchies (such as a corporate family tree or a sales geographic rollup), the hierarchy requirements for product data are usually more complex, including bills of material, product/product line/product family rollups and financial reporting breakouts.
A lot of product data is unstructured (such as engineering or marketing documents) or poorly structured (like description fields overloaded with lots of information that should ideally be broken out into separate fields like size, weight, color, packaging, etc.). This variability in structure requires a specialized parsing engine if you want any hope of automating the standardization of your data.
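To make the overloaded-description problem concrete, here is a minimal sketch in Python of breaking a free-text description into separate fields. It is a simple pattern-based illustration, not the specialized parsing engine described above; the attribute names, units and color list in PATTERNS are assumptions invented for this example, and a real product category would need its own vocabulary.

```python
import re

# Hypothetical patterns for three attributes. The units and colors listed
# here are illustrative assumptions, not a standard vocabulary.
PATTERNS = {
    "size": re.compile(r"\b(\d+(?:\.\d+)?)\s*(mm|cm|in)\b", re.IGNORECASE),
    "weight": re.compile(r"\b(\d+(?:\.\d+)?)\s*(kg|g|lb)\b", re.IGNORECASE),
    "color": re.compile(r"\b(black|white|red|blue|silver)\b", re.IGNORECASE),
}


def parse_description(description):
    """Pull known attributes out of an overloaded description field."""
    attributes = {}
    remainder = description
    for name, pattern in PATTERNS.items():
        match = pattern.search(remainder)
        if match:
            # Normalize the matched value to lowercase, e.g. "25 mm" or "black".
            attributes[name] = " ".join(g.lower() for g in match.groups())
            # Cut the matched text out so it is not left in the description.
            remainder = remainder[:match.start()] + remainder[match.end():]
    # Whatever is left over, minus stray commas and extra spaces, stays
    # in the description field.
    attributes["description"] = " ".join(remainder.replace(",", " ").split())
    return attributes


if __name__ == "__main__":
    print(parse_description("Widget bracket, 25 mm, 0.5kg, BLACK"))
```

Even this toy version hints at the scaling problem: every new product category brings new attributes, units and abbreviations, so a fixed list of patterns quickly becomes a maintenance burden.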
Outside reference databases are much more readily available for customer data than for product data because a customer is a real entity that exists independently of the enterprise, while a product may exist only in the imagination, factories and stores of the company. So third-party content providers such as D&B and Acxiom, which can be very helpful in cleansing, matching and enriching customer data, may be of limited or no assistance with your product data.
Volumes tend to be higher, too. One of my software industry clients had approximately 25,000 customers but separately managed more than 50,000 individual product records; poor system designs sometimes forced the creation of multiple product records to allow for minor differences or variations of a single product.
Other industries like retail or high-tech manufacturing also have very high volumes of product data and can easily have many millions of unique (or supposedly unique) items in their product or materials master databases.
Initial quality levels of customer data are often worse than expected, and quality levels of product data are typically worse still. Using the ACT+C (accuracy, completeness, timeliness and consistency) definition of data quality for assessment, I'm usually shocked by how inaccurate, incomplete, out-of-date and inconsistent product information is.
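As a rough illustration of how three of the four ACT+C dimensions might be scored, consider the Python sketch below. The record layout, the REQUIRED attribute list and the one-year freshness threshold are all assumptions made for this example. Accuracy is left out on purpose, because measuring it requires a trusted reference source to compare against.

```python
from datetime import date

# Hypothetical list of attributes every product record is expected to carry.
REQUIRED = ("sku", "name", "weight_kg", "color")


def completeness(records):
    """Share of required attribute slots that are actually filled."""
    if not records:
        return None
    filled = sum(
        1 for r in records for a in REQUIRED if r.get(a) not in (None, "")
    )
    return filled / (len(records) * len(REQUIRED))


def timeliness(records, today, max_age_days=365):
    """Share of records updated within max_age_days of today."""
    if not records:
        return None
    fresh = sum(
        1 for r in records if (today - r["updated"]).days <= max_age_days
    )
    return fresh / len(records)


def consistency(system_a, system_b, attribute):
    """Share of SKUs held in both systems whose attribute values agree."""
    shared = system_a.keys() & system_b.keys()
    if not shared:
        return None
    agree = sum(
        1
        for sku in shared
        if system_a[sku].get(attribute) == system_b[sku].get(attribute)
    )
    return agree / len(shared)
```

Each function returns a fraction between 0 and 1 (or None when there is nothing to measure), which makes it easy to track a baseline before and after a cleanup effort.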
All of this may sound like a lot of complexity - and it is - but the real kicker seems to be that there are so many categories of product data. What do I mean by categories? Well, what are the rules that tell you a piece of data describing a printed circuit board is valid? Whatever your answer, it's a different answer for sheet metal, which is different from ball bearings, which is different from MP3 players, which is different from digital cameras. You get the point - different rules for each type of product mean exploding levels of complexity!
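The per-category rule problem can be sketched in a few lines of Python. The categories, attribute names and limits in RULES below are hypothetical and invented purely for illustration; the point is only that each category needs its own rule set and that a record from an unknown category cannot be validated at all.

```python
# Hypothetical rule sets keyed by product category. Attribute names and
# allowed values are invented for illustration only.
RULES = {
    "ball_bearing": {
        "bore_mm": lambda v: isinstance(v, (int, float)) and v > 0,
        "material": lambda v: v in {"steel", "ceramic"},
    },
    "mp3_player": {
        "storage_gb": lambda v: isinstance(v, int) and v > 0,
        "battery_hours": lambda v: isinstance(v, (int, float)) and v > 0,
    },
}


def validate(record):
    """Return a list of problems found using the record's category rules."""
    category = record.get("category")
    rules = RULES.get(category)
    if rules is None:
        # No rule set exists yet for this category.
        return [f"no rules defined for category {category!r}"]
    errors = []
    for attribute, is_valid in rules.items():
        if attribute not in record:
            errors.append(f"missing {attribute}")
        elif not is_valid(record[attribute]):
            errors.append(f"invalid {attribute}: {record[attribute]!r}")
    return errors


if __name__ == "__main__":
    print(validate({"category": "ball_bearing", "bore_mm": 8, "material": "steel"}))
    print(validate({"category": "mp3_player", "storage_gb": 0}))
    print(validate({"category": "sheet_metal"}))
```

Two categories already need four hand-written rules. Multiply that by every product category in a large item master and the "exploding levels of complexity" become easy to picture.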
A Different Approach
Do the widespread differences mean that data mastering and data quality approaches that work well for customer MDM won't work for product MDM? Unfortunately, yes.
Andrew White, research VP at Gartner, Inc., said, "Product data is inherently variable, and its lack of structure is generally too much for traditional, pattern-based data quality approaches. Product and item data requires a semantic-based approach that can quickly adapt and 'learn' the nuances of each new product category. With this as a foundation, standardization, validation, matching and repurposing are possible. Without it, the task can be overwhelming and is likely to include lots of manual effort, lots of custom code - and a whole lot of frustration."
I think Andrew's right on the money here. The variability and relative lack of structure, the lack of external standards and third-party referential data sources, the overloading of the description field, the number of requirements for classification and categorization and the differences in hierarchy management all add up to a problem that most data quality tools designed for customer information would have a hard time solving.
Yet all of these same issues make trying to handle product information without a tool-based approach even less appealing. Investigate the semantic-based tools now on the market. They can help you to standardize, enrich, match, repurpose and govern your product information.