An Old Problem Continues to Challenge Both Users and Vendors

The role of master data management is back on the front burner. When you think about it, it never really went away. Wherever you have data, you have master data. The care and management of that data is how the information system comes to represent the business context in which it operates. Master data is one of the ways of setting the standard for data and information quality: if the master data is out of line, then so is the quality of the information.

The enterprise resource planning (ERP) revolution raised the hope of finally consolidating master data around a single transactional system of record. Those hopes were dashed as proliferating instances of ERP applications were supplemented with customer relationship management (CRM), supply chain management (SCM), and analytic applications (data marts) corresponding to each. In short, the single version of the truth, embodied in a system of record, remains a point on the horizon toward which our system development efforts converge but which we never quite reach. If it is supposed to be a master file, then why are there so many of them? We are chasing a moving target. Meanwhile, the critical path to enterprise data warehousing continues to lie through the design and implementation of consistent, unified representations of customers, products and whatever other master data entities are needed to run your business.

A classic example of the issues that arise when master data changes is the so-called rewriting of history required when key dimensions change. If the sales hierarchy is joined to product sales each period to track product trends by sales region, then the sales and product master data are an essential part of each sales data point. Suppose 100 deluxe widget couplers are sold in Atlanta, Georgia, in the southeast sales region in the second quarter - a company-leading result. Send those reps to the club. Next quarter, the semiannual sales reorganization occurs, and Atlanta is made part of the south-central region. In the third quarter, only 90 deluxe widget couplers are sold in Atlanta. Now the dilemma - if nothing else is done, the south-central region shows no Atlanta sales prior to the third quarter, because that data point belongs to another region. Alternatively, the sales results can be recalculated - and the past rewritten - so that the 100 couplers sold in Atlanta in the second quarter count as part of the south-central region. That makes it possible to compare second- and third-quarter results for the south-central region directly, but at the cost of recalculating (and in effect rewriting) history. A third alternative is to maintain both versions of the sales hierarchy master data and recalculate on the fly. Whichever approach is chosen, there is a trade-off: restate history and gain the period-over-period comparison, or stay true to the sales hierarchy as it stood at each point in time and lose it.
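This is the problem the data warehousing literature treats under the heading of slowly changing dimensions. The sketch below is a minimal illustration in Python - the table layouts, dates and names are invented for the example - showing how an effective-dated sales hierarchy supports both a report that is true to history and a restated one.

```python
from dataclasses import dataclass
from datetime import date

# Effective-dated dimension rows: each assignment of a city to a sales
# region carries a validity interval (a versioned, "Type 2" dimension).
@dataclass
class RegionAssignment:
    city: str
    region: str
    valid_from: date
    valid_to: date  # exclusive upper bound

region_history = [
    RegionAssignment("Atlanta, GA", "Southeast", date(2005, 1, 1), date(2005, 7, 1)),
    RegionAssignment("Atlanta, GA", "South-Central", date(2005, 7, 1), date(9999, 1, 1)),
]

def region_as_of(city: str, as_of: date) -> str:
    """Return the region a city belonged to on a given date."""
    for row in region_history:
        if row.city == city and row.valid_from <= as_of < row.valid_to:
            return row.region
    raise KeyError(f"No region assignment for {city} on {as_of}")

# Fact rows: (city, sale date, units). The dimension is joined at query
# time, so history is neither lost nor rewritten in the fact table.
sales = [
    ("Atlanta, GA", date(2005, 5, 15), 100),  # Q2: Southeast at the time
    ("Atlanta, GA", date(2005, 8, 20), 90),   # Q3: South-Central by then
]

# True to history: attribute each sale to the region in force that day.
for city, sale_date, units in sales:
    print(sale_date, region_as_of(city, sale_date), units)

# Restated history: attribute all sales to the current hierarchy, which
# enables direct quarter-over-quarter comparison at the cost of
# rewriting the past.
today = date(2005, 9, 30)
for city, sale_date, units in sales:
    print(sale_date, region_as_of(city, today), units)
```

Keeping both versions of the hierarchy, as in the third alternative above, is exactly what the validity intervals provide: the trade-off moves from the data model into the choice of which date is passed to the lookup.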

Because a relational database does not support multiple versions of the same table under the same name in its catalog, some installations have used the object-relational features of user-defined objects to simulate different versions of these dynamic dimensions. Though this is not intended as a product evaluation (or endorsement), the Kalido data warehouse product targets this issue with an abstraction layer that makes it possible to manipulate master data across different dimension versions. In my opinion, Kalido has finally come into its own in describing itself as a master data management tool first and a data warehouse second.

In the final analysis, data integration requires schema integration: the determination, reconciliation and rationalization of the underlying meaning of the data models representing the business entities and related functions. Technologies such as ETL (extract, transform and load) tools, metadata repositories and message brokers can be useful in rationalizing and conforming data to a consistent, unified representation of customers, products and other essential dimensions; however, they cannot solve the problem of understanding how and why the definition of customer in your ERP system differs from that in the CRM or SCM system. For that, the insight of a data administrator is still needed. Human insight, that is, at least until advances in semantic analysis - or perhaps the elusive semantic chip - come forth from the labs.
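To make that division of labor concrete, here is a small, entirely hypothetical sketch (the record layouts, field names and survivorship rules are invented for illustration). The merge itself is mechanical, and any ETL tool can automate it; but every line of the mapping encodes a semantic decision a person had to make first.

```python
# Hypothetical system-specific records: the ERP's "customer" is a
# billing account, while the CRM's "customer" is a company contact.
erp_customer = {"account_no": "C-1047", "bill_to_name": "Acme Corp",
                "credit_limit": 50000}
crm_customer = {"contact_id": 88123, "company": "Acme Corporation",
                "primary_contact": "J. Smith"}

def conform_customer(erp: dict, crm: dict) -> dict:
    """Merge two system-specific notions of 'customer' into one master
    record. The mapping itself - account vs. contact, which name wins -
    encodes human decisions no tool can infer from the schemas alone."""
    return {
        "master_customer_id": erp["account_no"],  # ERP chosen as system of record for identity
        "legal_name": erp["bill_to_name"],        # billing name treated as the legal name
        "marketing_name": crm["company"],         # CRM keeps the name sales actually uses
        "primary_contact": crm.get("primary_contact"),
        "credit_limit": erp.get("credit_limit"),
    }

print(conform_customer(erp_customer, crm_customer))
```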

Expect a variety of technologies to continue to be opportunistically repositioned into the master data management market. In general, this is a good thing, provided the tools really do offer automation and support for business methods of addressing master data management. Ultimately, the issue is one of data architecture, and there the recommendation is still to design and implement a centralized architecture for the economies of scale of centralized processing. If the enterprise is highly distributed, it is still advantageous to design a single, consistent representation of the master data, even if implementation requires incremental, stepwise rationalization across multiple instances. The compromises necessitated by the heterogeneous data environment of the modern multidivisional corporation mean that many firms will spend the better part of their lives in federation on the path to unification.
