Because MDM addresses core master data issues at the root, the focus has largely been on the data-producing side of the information supply chain. Meanwhile, the BI world has, to a large degree, dealt with MDM-related issues as it always has: issues were solved downstream as part of the extract, transform and load processing. The predicament is that many of the synergies that could be realized by leveraging the operational MDM discipline are left untouched when it comes to BI. In other words, it is high time for BI to engage in true master data management.
This article aims to uncover how and where to address different master data challenges and how to let both the BI and online transactional processing worlds embrace and utilize MDM on equal footing. It addresses key areas where the MDM discipline impacts traditional best practices for BI and data warehousing. In particular, MDM changes the type of data transformation and data quality management needed in the ETL layer of the data warehouse architecture.
This article also argues that the creation of a single version of the truth in an enterprise data warehouse mandates MDM. Otherwise companies are just creating a consolidated version of the truth.
Strategies to Reach a Single Version of the Truth
Traditional data warehousing best practices tell us how inconsistent data from heterogeneous operational systems is transformed into a single version of the truth in the enterprise data warehouse. From the EDW, data flows through various data marts to BI solutions. This flow of data, from transaction-oriented systems through the EDW to BI solutions, is commonly known as the information supply chain.
A frequent challenge is the quality of data in closed-loop transactions - two examples being a customer credit rating based on predictive models and marketing segmentation of customers. If master data is altered downstream in the information supply chain, the results might be correct in the BI environment, but when the data is fed back into the unchanged master data of the operational systems, discrepancies occur.
Another challenge addressed by data warehouses is the construction of conformed dimensions that allow local versions of the same master data (customer, for instance) to be consolidated into a master version that is then used for harmonized downstream reporting. In many cases, the problem with this approach is that the enterprise data warehouse receives data “too late” in the information supply chain to properly address numerous master data management issues.
An example of this is customer master data recorded with conflicting customer types in different OLTP systems. In the ETL process this is fixed by letting one customer type “survive” the transformation. As depicted in the figure above, sales figures reported on customer type “wholesale” will differ between reports run inside and outside the green box. The BI report will show sales with “wholesale” and “retailer” figures aggregated, while the report executed on the operational systems’ data will show only “wholesale”; hence we have two versions of the truth - simply because master data was harmonized downstream instead of at the source.
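The effect can be sketched in a few lines of Python. The customer names, figures and the “wholesale wins” survivorship rule below are hypothetical; the point is only that aggregating after a downstream survivorship rule yields different numbers than a report run against the untouched source data.

```python
# Hypothetical source records: the same customer carries conflicting
# types in two different OLTP systems.
source_rows = [
    {"customer": "ACME", "type": "wholesale", "sales": 100},  # system A
    {"customer": "ACME", "type": "retailer",  "sales": 50},   # system B
]

# Assumed ETL survivorship rule: for this customer, "wholesale" wins.
def survive(row):
    fixed = dict(row)
    if fixed["customer"] == "ACME":
        fixed["type"] = "wholesale"
    return fixed

warehouse_rows = [survive(r) for r in source_rows]

def sales_by_type(rows, customer_type):
    return sum(r["sales"] for r in rows if r["type"] == customer_type)

# BI report inside the warehouse: both systems' sales roll up to "wholesale".
print(sales_by_type(warehouse_rows, "wholesale"))  # 150

# Report run directly on operational data: only system A is "wholesale".
print(sales_by_type(source_rows, "wholesale"))     # 100
```

Two reports on “wholesale” sales, two different answers - the discrepancy exists only because the harmonization happened in the ETL layer rather than at the source.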
Issues like this happen all the time in the BI realm. The most common strategies to mitigate data quality issues in the sources are:
All or nothing. In some data warehouses, an “all or nothing” load policy is applied at the row or package level from the source systems, with the idea that if data is not correct, it is not loaded; in some cases, a workflow is spawned to have the data owner of the source correct the data. This is common for data warehouses with strict referential integrity. This rather rigid policy leads to data warehouses with a fairly high level of data quality. The downfall, however, is that if some data is missing from the reports because it was kicked back during load, only some of the truth is being reported on.
Solve it in the ETL. The dilemma described above motivates the approach of solving master data inconsistencies through complicated transformation logic and business rules in the ETL flow. With this approach, you ensure that data is loaded and that users report on the whole truth, but the flip side brings issues like decreased transparency, impeded flexibility and difficulties in reconciling operational systems with BI reports.
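The first strategy can be illustrated with a minimal sketch, assuming a hypothetical row-level referential-integrity check on customer type: rows that fail the check are rejected rather than loaded, so the warehouse stays clean but the reports cover only part of the truth.

```python
# Hypothetical "all or nothing" row-level load: rows failing a
# referential-integrity check against known customer types are rejected.
VALID_TYPES = {"wholesale", "retailer"}

def load(rows):
    loaded, rejected = [], []
    for row in rows:
        (loaded if row["type"] in VALID_TYPES else rejected).append(row)
    return loaded, rejected

incoming = [
    {"customer": "ACME",    "type": "wholesale", "sales": 100},
    {"customer": "Initech", "type": "whl",       "sales": 70},  # bad code
]

loaded, rejected = load(incoming)

# The warehouse stays consistent, but 70 in sales never reaches the
# reports; a correction workflow back to the source owner would be
# needed before that row can be loaded.
print(sum(r["sales"] for r in loaded))  # 100
print(len(rejected))                    # 1
```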
Both strategies raise the question: What is the point of having one version of the truth in your enterprise data warehouse if it differs from the truth you run your operational processes by? This is particularly relevant where many tactical business decisions are based directly on data in operational systems.