The latest buzzword in information management - and by extension, business intelligence (BI) - is master data management (MDM). You may hear this phrase, along with its stated goal of creating a single version of the truth, and wonder, "Haven't we been here before?" or, "We already do that in the data warehouse, right?" You may be correct on both counts. MDM has become hot due to the perfect storm of:
- The continued struggle organizations face in managing their master data,
- The need to master the data in the operational environment, and
- Vendor momentum with service-oriented architecture (SOA)-based software designed to systemize the rough spots.
MDM primarily refers to the 15 percent of organizational data, by volume, that supports the organization's transactional data. In data warehouse vernacular, it is what we've come to call dimensional data. Master data management is the organization, management and distribution of corporately adjudicated information with widespread use in the company. It is part of the IT response to modern business requirements, along with right-time data warehousing, data mining, metadata and a data quality program.
Some modern business requirements increasing the focus on master data are vendor management, supplier self-service, promotion management, customer management, churn management and regulatory compliance. Without master data, these requirements cannot be met. This is understood, and we have attempted to meet our master data needs in many ways over the years. Pre-data warehouse efforts included the distribution of master VSAM and database files to distributed applications, which invariably customized their structure and format. Centralization remains elusive with this approach. The cry for a single version of the truth brought on the advent of data warehousing, creating the hub for master data.
I remember architecting several warehouses that went to production with fanfare, only to be met with comments wishing the master data were managed in (or distributed back to) the enterprise resource planning (ERP) environments. This was a struggle. Data warehouses have typically been a poor medium for "closing the loop" and getting their data back to the source operational environments that could also benefit. For example, studies indicate that for the important subject area of "customer," customer master data is widely distributed across dozens of databases and files, and is generated both within and outside the company. There is disagreement on the most important aspects of the subject area, such as:
- Granularity. Who is a customer - the individual, the family, the payment?
- Attribute detail source. External or internal? Are there acceptable times to impute missing data?
- What attributes can be trusted? What should be done with less than fully trusted data? Should confidence levels be associated with data?
- What constitutes a duplicate record? What should be done with duplicates?
- What actions form the basis for objective labels such as top-tier customer?
- What master code sets should each element conform to?
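Two of the questions above, what constitutes a duplicate and whether confidence levels should be associated with data, can be made concrete with a small sketch. The Python below is illustrative only: the `CustomerRecord` fields, the rule that an exact email match is decisive and the 0.85 threshold are all assumptions made for the example, not rules taken from any MDM product. In practice these are exactly the decisions the business has to adjudicate.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class CustomerRecord:
    source: str  # hypothetical name of the system the record came from
    name: str
    email: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def match_confidence(a: CustomerRecord, b: CustomerRecord) -> float:
    """Return a 0..1 confidence that two records describe the same customer."""
    email_a, email_b = normalize(a.email), normalize(b.email)
    if email_a and email_a == email_b:
        return 1.0  # assumption: a shared, non-empty email is treated as decisive
    # Otherwise fall back to fuzzy similarity of the normalized names.
    return SequenceMatcher(None, normalize(a.name), normalize(b.name)).ratio()


def find_duplicates(records, threshold=0.85):
    """Compare every pair of records and return those at or above the threshold."""
    pairs = []
    for i, left in enumerate(records):
        for right in records[i + 1:]:
            score = match_confidence(left, right)
            if score >= threshold:
                pairs.append((left, right, score))
    return pairs
```

Keeping the score alongside each candidate pair, rather than a yes/no flag, leaves room for the "less than fully trusted" middle ground: high scores might merge automatically while borderline ones go to a data steward. The pairwise comparison is quadratic, so this is a teaching sketch rather than something to run against a full customer file.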
Implementing Master Data Management
Each vendor has different names for the architectures that are possible for MDM, but it comes down to three basic patterns. Master data can be managed:
1. In the data warehouse. This is where most master data is managed today - and the place which, according to data lifecycles, is ultimately least desirable due to the lack of leverage with the operational environment.
2. Within the operational environment. In this approach, master data by subject area will exist in various operational systems and will synchronize with the other systems that need the data, including the data warehouse. This is often called the synchronization option.
3. In a separate MDM engine. MDM can be done in a pre-operational environment and distributed in hub-and-spoke fashion to those systems needing the data, including the data warehouse, through a mapping process. This alleviates the need for any individual system to manage master data. This is the holy grail, according to some vendors. It is not yet achieved, to my knowledge. Some solutions mimic the separate MDM engine approach with a virtual separate MDM engine, when in reality the data still resides in its operational sources.
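To make the third pattern concrete, here is a minimal sketch of a hub that holds the master record and distributes it to subscribing systems through a per-system field mapping. Everything named here (`MasterDataHub`, `subscribe`, `publish`, `query`, and the field names in the usage note) is hypothetical and invented for illustration; it does not represent any vendor's API, and a real engine would add persistence, security and error handling.

```python
class MasterDataHub:
    """Toy stand-in for a separate MDM engine distributing data hub-and-spoke."""

    def __init__(self):
        self._golden = {}       # master key -> master record (dict)
        self._subscribers = {}  # system name -> (field mapping, delivery callback)

    def subscribe(self, system, field_map, deliver):
        """Register a spoke. field_map maps hub field names to the spoke's own names."""
        self._subscribers[system] = (field_map, deliver)

    def publish(self, key, record):
        """Store the master record, then push a mapped copy to every spoke."""
        self._golden[key] = dict(record)
        for field_map, deliver in self._subscribers.values():
            mapped = {
                target: record[source]
                for source, target in field_map.items()
                if source in record
            }
            deliver(key, mapped)

    def query(self, key):
        """Direct query of master data, independent of any subscribing system."""
        return dict(self._golden[key])
```

A usage example, with plain dictionaries standing in for the data warehouse and an ERP system:

```python
warehouse, erp = {}, {}
hub = MasterDataHub()
hub.subscribe("warehouse", {"customer_name": "cust_nm", "tier": "tier_cd"}, warehouse.__setitem__)
hub.subscribe("erp", {"customer_name": "NAME1"}, erp.__setitem__)
hub.publish("C-1", {"customer_name": "Acme", "tier": "gold"})
```

Each spoke receives only the fields it mapped, under its own names, so no individual system has to manage the master data itself.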
Incidentally, in addition to synchronizing master data, option two should allow for master data query that is agnostic to the system in which it is implemented. Any master data managed in the MDM engine would also be accessible for direct query. Direct query of master data, something many folks find to be surprisingly beneficial, is another accomplishment of MDM.
Because all organizations implementing MDM must already have numerous systems (and thus a master data problem) and already are attempting to master their master data in some fashion, most will ultimately implement a combination of the above strategies.
An optimal rollout will not repeat the mistakes of early data warehouses (and almost all other projects). It also will be iterative, tackling subject areas on a priority basis and allowing for the possibility of any of the management architectures, depending on the situation. A corporate subject area evaluation, the establishment of data stewardship and a data quality program, and a technology evaluation are some of the initial steps to MDM success.
The vendor community has taken notice of MDM and is now stepping up. Actually, the market has been in existence since the late 1990s with Hogan CIF for customer data integration (CDI), an important form of MDM - but it is really burgeoning now due to the aforementioned reasons. Leading industry analyst firms are also taking note. Forrester has a Wave report on CDI and Gartner has a Magic Quadrant for CDI Hubs now, with no fewer than nine vendors making the cut! Gartner's product information management (PIM) quadrant has nine submissions.
CDI, as a subindustry of MDM, incorporates all of the nuances that the customer dimension requires. In many organizations, customer is the primary focus of MDM, followed by product, but MDM will inevitably expand to other areas. The product subject area has spawned a PIM subindustry. We will undoubtedly see more specialized focuses from the vendor community, but MDM as a concept is unlimited in terms of the corporate subject areas it covers. Typical MDM subject areas include:
- Customers (CDI)
- Products (PIM)
- Organization calendar
- Sales hierarchy
The best place to manage master data is in the operational environment and, as with data warehousing, the bigger the enterprise covered, the better.
Sourced records may need to go through workflow processes in order to be suited for the demands of master data. These processes can include data extension, enrichment, quality management and sign-off processes. Tools provide an automated means of passing tasks to ensure that master data records will make it to completion and will do so in a timely manner.
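One way to picture such a workflow is as a fixed sequence of stages that a sourced record must clear before it counts as master data. The sketch below is an assumption-laden illustration: the stage names come from the processes listed above, but their ordering, the `WorkItem` class and the idea of recording an actor per stage are mine, not features of a particular tool. Deadlines and task routing, which real tools handle, are left out.

```python
from dataclasses import dataclass, field

# Stage names follow the processes named in the text; the order is an assumption.
STAGES = ("extension", "enrichment", "quality_management", "sign_off")


@dataclass
class WorkItem:
    record_id: str
    history: list = field(default_factory=list)  # (stage, actor) pairs, in order

    @property
    def next_stage(self):
        """Name of the stage still owed, or None once every stage is done."""
        done = len(self.history)
        return STAGES[done] if done < len(STAGES) else None

    def complete(self, stage, actor):
        """Record a finished stage; reject anything attempted out of sequence."""
        if stage != self.next_stage:
            raise ValueError(f"expected {self.next_stage!r}, got {stage!r}")
        self.history.append((stage, actor))

    @property
    def is_master_ready(self):
        """True only after sign-off, the last stage, has been completed."""
        return self.next_stage is None
```

Rejecting out-of-order steps is the point of the design: a record cannot be signed off before it has been through quality management, and the history gives an audit trail of who handled each stage.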
For modeling MDM data, there are two schools of thought. Many MDM tools, especially those specializing in specific subject areas such as CDI and PIM tools, employ a philosophy of prebuilt master data models. Subject-area agnostic tools offer support for original modeling efforts based on customizing in-house models elsewhere in the environment. The trade-off between subject area-specific and subject area-agnostic tools is development speed versus the ability to customize the tool. The tool choice could hinge on the appropriateness of the model.
Requirements for your MDM tools would include robust workflow; support for all of the architectures previously mentioned; modeling capabilities; inbound and outbound ETL (extract, transform and load) capabilities; integration with data quality tools; master data query; and fit within a Web services/SOA framework.
Organizing and Planning for MDM Success
Despite the obvious benefits, there are challenges to implementing MDM. As with data warehousing, disparate business units must come together to a degree that is new and uncomfortable for many organizations. Centralization, documentation and experience working cross-departmentally are requirements. Unwieldiness and poor data quality in the current master data environment mean more challenges.
Data quality should actually be considered a subprogram of MDM. MDM is more than source selection and publishing/subscribing that "best source" for each subject area. It must also put a laser focus on the quality within the master data. This means all the usual things about data quality we've come to know: referential integrity, uniqueness, cardinality, subtype/supertype constructs, value domains, formatting errors, contingency conditions, calculations, correctness and conformance to a "clean" set of values.
The desired rules need to be determined, data profiling needs to be done against the desired rules, and ideally, the quality is scored and improved. This quality improvement could be facilitated by one of the data quality tools in the marketplace. Expect a convergence of the MDM and data quality markets over time.
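The determine-profile-score cycle can be sketched in a few lines. This is a minimal illustration under stated assumptions: the three rules (unique `id`, a `tier` value domain and a five-digit `postal` format), the field names and the unweighted average used for the overall score are all hypothetical choices made for the example, and a commercial data quality tool would go much further.

```python
import re
from collections import Counter

VALID_TIERS = {"gold", "silver", "bronze"}  # hypothetical "clean" set of values
POSTAL_RE = re.compile(r"\d{5}")            # hypothetical formatting rule


def unique_ids(records):
    """Uniqueness: a record passes only if no other record shares its id."""
    counts = Counter(r["id"] for r in records)
    return [counts[r["id"]] == 1 for r in records]


def tier_in_domain(records):
    """Value domain: tier must be one of the agreed values."""
    return [r.get("tier") in VALID_TIERS for r in records]


def postal_format(records):
    """Formatting: postal code must be exactly five digits."""
    return [bool(POSTAL_RE.fullmatch(r.get("postal", ""))) for r in records]


RULES = {
    "uniqueness": unique_ids,
    "value_domain": tier_in_domain,
    "format": postal_format,
}


def score(records, rules=RULES):
    """Profile records against each rule; return per-rule pass rates and their mean."""
    per_rule = {}
    for name, rule in rules.items():
        flags = rule(records)
        per_rule[name] = sum(flags) / len(flags) if flags else 1.0
    overall = sum(per_rule.values()) / len(per_rule)
    return per_rule, overall
```

Tracking the per-rule pass rates over time, rather than only the overall number, shows which kind of defect is improving and which is not.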
Obviously, numerous decisions need to be made in the process of delivering high-quality data to the areas throughout the organization that need it. Once again, I will turn to the data stewardship program to provide this leadership. Many organizations have established data stewardship to serve the data warehouse's need for a single version of the truth. Hopefully, these efforts have stayed focused on organizational data quality and are not too oriented to the specifics of the data warehouse.
MDM is a discrete deliverable warranting a dedicated team or, at least, corporate governance oversight in order to efficiently meet the requirements of today's organization. Too many projects - enterprise resource planning, data warehousing, supply chain management, human capital management, customer relationship management, sales force automation and master planning of resources among them - require master data and will undoubtedly develop it to the degree necessary for their delivery needs. As a matter of fact, the bulk of many such projects is mastering master data. Imagine the impact on timelines if this important aspect of these programs can become an organizational service.
Timelines and efficiency are the obvious measurable deliverables of MDM. However, the overall efficacy of these projects is the more interesting component of the returns that MDM provides to organizations.