An Old Problem Continues to Challenge both Users and Vendors
A classic example of the issues that can arise with data warehousing and changing master data is the so-called rewriting of history needed when key dimensions (master data) change. If my sales hierarchy is joined to product sales to track product trends by sales region periodically, then the sales and product master data are an essential part of each sales data point. One hundred deluxe widget couplers are sold in Atlanta, Georgia, in the southeast sales region in the second quarter, which represents a company-leading result. Send those guys to club. However, next quarter, the semiannual sales reorganization occurs. Atlanta, Georgia, is made a part of the south-central region. In the third quarter, only ninety deluxe widget couplers are sold in Atlanta, Georgia. Now the dilemma: if nothing else is done, then the south-central region never had any sales in Atlanta, Georgia, prior to the third quarter, because that data point belongs to another region. Alternatively, the sales results must be recalculated, and the past rewritten, to capture the 100 deluxe widget couplers sold in Atlanta, Georgia, in the second quarter as part of the south-central region. This makes it possible to compare the second- and third-quarter results directly as part of the south-central region, but at the cost of recalculating (and in effect rewriting) history. A third alternative is to maintain both versions of the sales hierarchy master data and dynamically recalculate on the fly. However, in any case, there is a trade-off. You can rewrite history in an opportunistic way and make historical comparison possible, or you can lose the comparisons but be true to the sales hierarchy at a given point in time.
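The Atlanta example can be made concrete with a small Python sketch. It is illustrative only, not how any particular product handles the problem. The names `RegionAssignment`, `HIERARCHY`, `region_as_of` and `totals` are hypothetical, and the validity ranges (quarters 1 to 2 and 3 to 4) are assumptions added so that both versions of the hierarchy can be kept side by side, as in the third alternative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionAssignment:
    """One version of a city's place in the sales hierarchy (hypothetical)."""
    city: str
    region: str
    valid_from: int  # first quarter (inclusive) the assignment applies
    valid_to: int    # last quarter (inclusive) the assignment applies


# Assumed versioned hierarchy: Atlanta changes region after the second quarter.
HIERARCHY = [
    RegionAssignment("Atlanta", "Southeast", 1, 2),
    RegionAssignment("Atlanta", "South-Central", 3, 4),
]

# Sales facts from the example: (city, quarter, units sold).
SALES = [("Atlanta", 2, 100), ("Atlanta", 3, 90)]


def region_as_of(city, quarter):
    """Return the region a city belonged to in the given quarter."""
    for assignment in HIERARCHY:
        if (assignment.city == city
                and assignment.valid_from <= quarter <= assignment.valid_to):
            return assignment.region
    raise KeyError((city, quarter))


def totals(as_of_quarter=None):
    """Sum units by (region, quarter).

    as_of_quarter=None: each sale uses the hierarchy in effect when it
    occurred, so history is preserved.
    as_of_quarter=q: every sale is restated under the hierarchy of
    quarter q, so history is rewritten.
    """
    result = {}
    for city, quarter, units in SALES:
        lookup = quarter if as_of_quarter is None else as_of_quarter
        key = (region_as_of(city, lookup), quarter)
        result[key] = result.get(key, 0) + units
    return result


as_was = totals()                   # true to the hierarchy at the time
restated = totals(as_of_quarter=3)  # comparable across quarters
```

Under these assumptions, `as_was` credits the 100 units to the southeast region in the second quarter, while `restated` places both quarters under south-central so they can be compared directly. Keeping the validity ranges on the dimension is what allows either view to be computed on the fly instead of choosing one at load time.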
Because the relational database does not support different versions of the same table with the same identifying name in the relational catalog, some installations have used the object-relational features of user-defined objects to simulate different versions of these dynamic dimensions. Though this is not intended to be a product evaluation (or endorsement), the Kalido data warehouse product targets this issue with an abstraction layer that makes possible the manipulation of master data across different dimensions. In my opinion, Kalido has finally come into its own in describing itself as a master data management tool first and a data warehouse second.
In the final analysis, data integration requires schema integration. Schema integration is the determination, reconciliation and rationalization of the underlying meaning of the data models representing the business entities and related functions. Technologies such as ETL (extract, transform and load) tools, metadata repositories and message brokers can be useful in rationalizing and conforming data to a consistent and unified representation of customers, products and other essential data dimensions; however, they cannot solve the problem of understanding how and why the definition of customer in your ERP system differs from that in the CRM or SCM system. For that, the insight of a data administrator is still needed. That is, human insight is needed, at least until advances in semantic analysis, or perhaps even the elusive semantic chip, come forth from the labs.
Expect a variety of technologies to continue to be repositioned in an opportunistic way into the master data management market. In general, this is a good thing, provided that the tools really do offer automation and support for business methods in addressing master data management. Ultimately, the issue is about data architecture, and there the recommendation is still to design and implement a centralized architecture for the economies of scale of centralized processing. If the enterprise is a highly distributed one, then it is still advantageous to design a single consistent representation of the master data even if the implementation requires incremental, stepwise rationalization across multiple instances. The compromises necessitated by the heterogeneous data environment of the modern multidivisional corporation mean that many firms will spend the better part of their lives in federation on the path to unification.
Lou Agosta is an independent industry analyst in data warehousing. A former industry analyst at Giga Information Group, Agosta has published extensively on industry trends in data warehousing, data mining and data quality. He can be reached at LAgosta@acm.org.








