- The continued struggle organizations face in managing their master data,
- The need to master the data in the operational environment, and
- Vendor momentum with service-oriented architecture (SOA)-based software designed to systemize the rough spots.
MDM primarily refers to the 15 percent of organizational data, by volume, that supports the organization's transactional data. In data warehouse vernacular, it is what we've come to call dimensional data. Master data management is the organization, management and distribution of corporately adjudicated information with widespread use in the company. It is part of the IT response to modern business requirements, along with right-time data warehousing, data mining, metadata and a data quality program.
Some modern business requirements increasing the focus on master data are vendor management, supplier self-service, promotion management, customer management, churn management and regulatory compliance. Without master data, these requirements cannot be met. This is understood, and we have attempted to meet our master data needs in many ways over the years. Pre-data warehouse efforts included the distribution of master VSAM and database files to distributed applications, which invariably customized their structure and format. Centralization remains elusive with this approach. The cry for a single version of the truth brought on the advent of data warehousing, creating the hub for master data.
I remember architecting several warehouses that went to production with fanfare, only to be met with comments wishing the master data were managed in (or distributed back to) the enterprise resource planning (ERP) environments. This was a struggle. Data warehouses have typically been a poor medium for "closing the loop" and getting their data back to the source operational environments that could also benefit. For example, studies indicate that for the important subject area of "customer," customer master data is widely distributed across dozens of databases and files, and is generated both within and outside the company. There is disagreement on the most important aspects of the subject area, such as:
- Granularity. Who is a customer - the individual, the family, the payment?
- Attribute detail source. External or internal? Are there acceptable times to impute missing data?
- What attributes can be trusted? What should be done with less than fully trusted data? Should confidence levels be associated with data?
- What constitutes a duplicate record? What should be done with duplicates?
- What actions form the basis for objective labels such as top-tier customer?
- What master code sets should each element conform to?
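Answers to several of these questions eventually have to be written down as explicit rules. The following is a minimal sketch of what that might look like, not a recommended design: it assumes a duplicate means "same normalized name and postal code," and it assumes trust is assigned per source system. The source names, confidence scores and record fields are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical trust scores per source; in practice data stewards would set these.
SOURCE_CONFIDENCE = {"crm": 0.9, "billing": 0.8, "purchased_list": 0.4}


@dataclass(frozen=True)
class CustomerRecord:
    source: str
    name: str
    postal_code: str
    email: str


def match_key(rec):
    """One possible duplicate rule: same normalized name and postal code."""
    name = " ".join(rec.name.lower().split())
    postal = rec.postal_code.replace(" ", "").upper()
    return (name, postal)


def merge_duplicates(records):
    """Group records by match key and keep the record from the most trusted source."""
    groups = {}
    for rec in records:
        groups.setdefault(match_key(rec), []).append(rec)

    masters = []
    for recs in groups.values():
        best = max(recs, key=lambda r: SOURCE_CONFIDENCE.get(r.source, 0.0))
        masters.append({
            "name": best.name,
            "postal_code": best.postal_code,
            "email": best.email,
            # Carry the confidence level along with the surviving data.
            "confidence": SOURCE_CONFIDENCE.get(best.source, 0.0),
            "sources": sorted(r.source for r in recs),
        })
    return masters
```

Changing the granularity decision (individual versus family, for instance) would amount to changing `match_key`, which is why that question needs a corporate answer before any tooling is chosen.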
Implementing Master Data Management
Each vendor has different names for the architectures that are possible for MDM, but it comes down to three basic patterns. Master data can be managed:
1. In the data warehouse. This is where most master data is managed today - and the place which, according to data lifecycles, is ultimately least desirable due to the lack of leverage with the operational environment.
2. Within the operational environment. In this approach, master data by subject area will exist in various operational systems and will synchronize with the other systems that need the data, including the data warehouse. This is often called the synchronization option.
3. In a separate MDM engine. MDM can be done in a pre-operational environment and distributed in hub-and-spoke fashion, through a mapping process, to those systems needing the data, including the data warehouse. This alleviates the need for any individual system to manage master data. This is the holy grail, according to some vendors. To my knowledge, it has not yet been achieved. Some solutions mimic the separate MDM engine approach with a virtual MDM engine, when in reality the data still resides in its operational sources.
Incidentally, in addition to synchronizing master data, option two should allow for master data query that is agnostic to the system in which it is implemented. Any master data managed in the MDM engine would also be accessible for direct query. Direct query of master data, something many folks find to be surprisingly beneficial, is another accomplishment of MDM.
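To make the hub-and-spoke idea and direct query concrete, here is a toy sketch, assuming an in-memory hub and two spokes. The class name, the spoke names and the column names in the mappings are hypothetical illustrations, not any vendor's product or API.

```python
# Hypothetical mappings: hub attribute name -> column name in each spoke system.
SPOKE_MAPPINGS = {
    "erp": {"customer_id": "CUST_NO", "name": "CUST_NAME"},
    "warehouse": {"customer_id": "customer_key", "name": "customer_name"},
}


class MasterDataHub:
    """Toy hub that owns master records and pushes mapped copies to spokes."""

    def __init__(self, mappings):
        self.mappings = mappings
        self.records = {}  # master records keyed by customer_id
        # Plain dicts stand in for the real target systems.
        self.spokes = {name: {} for name in mappings}

    def upsert(self, record):
        """Store the master record, then distribute it to every spoke."""
        key = record["customer_id"]
        self.records[key] = dict(record)
        for spoke, mapping in self.mappings.items():
            # The mapping step renames hub attributes to the spoke's own columns.
            mapped = {target: record[source] for source, target in mapping.items()}
            self.spokes[spoke][key] = mapped

    def query(self, customer_id):
        """Direct query of the hub, independent of any spoke's format."""
        return self.records.get(customer_id)
```

In this sketch no spoke manages master data itself; each one only receives a copy in its own format, while `query` reads the master record without knowing which systems consume it.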
Because all organizations implementing MDM must already have numerous systems (and thus a master data problem) and already are attempting to master their master data in some fashion, most will ultimately implement a combination of the above strategies.
An optimal rollout will not repeat the mistakes of early data warehouses (and almost all other projects). It also will be iterative, tackling subject areas on a priority basis and allowing for the possibility of any of the management architectures, depending on the situation. This corporate subject area evaluation, the establishment of data stewardship and a data quality program, and a technology evaluation are among the initial steps to MDM success.
The vendor community has taken notice of MDM and is now stepping up. Actually, the market has been in existence since the late 1990s with Hogan CIF for customer data integration (CDI), an important form of MDM - but it is really burgeoning now due to the aforementioned reasons. Leading industry analyst firms are also taking note. Forrester has a Wave report on CDI and Gartner has a Magic Quadrant for CDI Hubs now, with no fewer than nine vendors making the cut! Gartner's product information management (PIM) quadrant has nine submissions.
CDI, as a subindustry of MDM, incorporates all of the nuances that the customer dimension requires. In many organizations, customer is the primary focus of MDM, followed by product, but MDM will inevitably expand to other areas. The product subject area has spawned a PIM subindustry. We will undoubtedly see more specialized focuses from the vendor community, but MDM as a concept is unlimited in terms of the corporate subject areas it covers. Typical MDM subject areas include:
- Customers (CDI)
- Products (PIM)
- Organization calendar
- Sales hierarchy