For the longest time, operational business applications ran with embedded functionality for managing master data. It was the right strategy if deployed under the assumption that companies eventually would consolidate and support all business processes on one platform, a single enterprise resource planning system. Very few organizations ever got close to realizing this vision due to continuous M&A activities and the lack of a compelling case to fully harmonize business processes and decommission all systems but one. Hence the idea of master data management focusing on only one system proved to be a mistaken and inflexible idea. That is why over the past five to eight years we have seen master data repositories emerging as distinct applications and related governance and methods acknowledged as a discipline, MDM.
Because MDM addresses core master data issues at the root, the focus has largely been on the data-producing side of the information supply chain. Meanwhile, to a large degree, the BI world has dealt with MDM-related issues as it always has: Issues were solved downstream as part of the extract, transform and load processing. The predicament is that many of the synergies that could be realized by leveraging the operational MDM discipline are often left untouched when it comes to BI. In other words, it is due time for BI to engage in true master data management.
This article aims to uncover how and where to address different master data challenges and how to let both the BI and online transactional processing worlds embrace and utilize MDM on equal footing. It addresses key areas where the MDM discipline impacts traditional best practices for BI and data warehousing. In particular, MDM changes the type of data transformation and data quality management needed in the ETL layer of the data warehouse architecture.
This article also argues that the creation of a single version of the truth in an enterprise data warehouse mandates MDM. Otherwise companies are just creating a consolidated version of the truth.
Strategies to Reach A Single Version of the Truth
Traditional data warehousing best practices tell us how inconsistent data from heterogeneous operational systems is transformed into a single version of the truth in the enterprise data warehouse. From the EDW, data flows through various data marts to BI solutions. This flow of data, from transaction-oriented systems through the EDW to BI solutions, is commonly known as the information supply chain.
A frequent challenge is the quality of data in closed loop transactions - two examples being a customer credit rating based on predictive models or marketing segmentation of customers. If master data is altered downstream through the information supply chain, the results might be correct in the BI environment, but when feeding the data back into the unchanged master data of the operational systems, discrepancies occur.
Another challenge addressed by data warehouses is the construction of conformed dimensions that allow local versions of the same master data (customer, for instance) to be consolidated into a master version that is then used for harmonized downstream reporting. In many cases, the problem with this approach is that the enterprise data warehouse receives data “too late” in the information supply chain to properly address numerous master data management issues.
An example of this is customer master data recorded with different conflicting types in different OLTP systems. Through the ETL process it is cleverly fixed by letting one customer type “survive” through the transformation. As depicted in the figure above, sales figures reported on customer type “wholesale” will be different in the reports run inside/outside the green box. The BI report will show sales with “wholesale” and “retailer” figures aggregated. The report executed on the operational systems data will show only “wholesale”; hence we have two versions of the truth – simply because master data was harmonized downstream as opposed to at the source.
Issues like this happen all the time in the BI realm. The most common strategies to mitigate data quality issues in the sources are:
All or nothing. In some data warehouses, an “all or nothing” load policy is applied on row or package level from the source systems with the idea that if data is not correct, it is not loaded, and in some cases, a workflow is spawned to have the data owner of the source correct the date. This is common for data warehouses with strict referential integrity. This rather rigid policy leads to data warehouses with a fairly high level of data quality. The downfall is, however, that if some data is missing in the reports because they were kicked back during load, only some of the truth is being reported on.
Solve it in the ETL. The described dilemma inspires the approach to solve master data inconsistencies through complicated transformation logic and business rules in the ETL flow. With this approach, you ensure that data is loaded and that the users are reporting on the whole truth, but the flip side brings unfortunate issues like decreased transparency, impeded flexibility and difficulties in reconciliation between operational systems and BI reports.
Both strategies beg the question: What is the point of having one version of the truth in your enterprise data warehouse if it is different from the truth that you run your operational processes according to? This is particularly relevant where many tactical business decisions are based on data directly in operational systems.
The remedy to the described situation is to start looking at how the focus on MDM can change the systems, processes and accountability in the data-producing part of the business to increase data quality and promote transparency and flexibility. For a BI practitioner, there is gold to be found in the MDM discipline. But beware! Simply replicating the MDM practices applied in the operational world into the BI solution is missing the point. (Unfortunately, this is actually how it is done in many data warehouse implementations.) The case proposed here is to discontinue parallel master data mitigation initiatives (in the BI and the operational world) and instead commence tackling the issues where they are tackled best.
A number of issues are caused by improper master data quality, and each of these should only be solved in one place. The challenge is to find the right place. Examples of these issues are:
- Inconsistent field use (e.g., customer code that sometimes contain a phone number);
- Inconsistent field definitions (e.g., different definitions of a finished product);
- Record deduplication (e.g., mapping/merging customer records from various systems into one surviving customer. Consider here also different codes for the same payment terms, countries, product types, etc.);
- Field integrity (conflicting information such as the same customer – different ordering address);
- Ad hoc hierarchies and their consolidation (e.g., support the creation and maintenance of hierarchies for analysis, simulation or financial consolidation);
- Formatting (e.g., different formatting of customer names, dates or phone numbers);
- Missing values (e.g., an industry code or a product status code);
- Definition of key performance indicators (e.g., group definition of same store sales, inventory turnover ratio);
- Slowly changing dimensions (e.g., aligning current sales figures with historical customer hierarchy).
The figure outlines different typical types of data quality and data transformation challenges that every BI organization faces on a daily basis. While these have all traditionally been addressed as an ETL exercise, it is recommended to focus the effort where it is best tackled. By doing this, you start to leverage both the knowledge and system capabilities in a more optimal way.
Of course, there are no rules without exceptions. The above principles obviously do not consider specific business drivers relative to particular environments. However, they serve to encourage BI organizations to start looking toward the MDM organization for value and to utilize the strengths found there. It might sound like a truism, but this practice has not reached a proper level of adoption in the majority of the organizations we work with worldwide.
There are great benefits to be harvested by the BI departments that embrace the MDM discipline and that revisit the architecture and ETL processes from an MDM perspective. Transferring mitigations of relevant master data-related issues from the ETL jobs to the transactional MDM capabilities can lead to greater flexibility, improved performance, faster implementation cycles increased transparency and maturing the organization in understanding the link between data in the operational environment and the reports they are getting.
Not all data quality-related functionality should be changed/migrated, but there is a lot of ETL code out there that solves a problem which would have been solved better at the root rather than downstream – simply because solving errors at the root provides downstream users and systems with high quality, and the cleaning effort is also performed once as opposed to in each of the receiving systems and processes.
An important recommendation to bring you closer to accruing these benefits is to make sure you and your BI consultant know:
- What a master data repository is and how it can be leveraged for your BI solutions;
- Where to address different types of data quality and data transformation challenges;
- How to manage cross-references in a manner that is aligned with MDM;
- How to reuse the master data definitions from the enterprise information model when defining the EDW data model;
- That MDM and BI programs must be aligned (ideally as part of the same information management program);
- Data transformation and quality challenges may be best solved in the BI environment in the short term, while they should be addressed in the MDM environment in the long term.
If your organization already has an MDM organization that tackles the OLTP-related master data issues (the left circle in the second figure), congratulations, you are well on your way to the next information management maturity level, and well on your way to increasing BI agility, trusting your data and improving the decision-making process. It’s time to bridge gap between BI and MDM organizations.
If your enterprise does not have a separate MDM organization to give the MDM-denoted issues to, do not accept things as they are. Do not continue to mitigate the issues with ingenious ETL code. Stop bailing water out of the boat and start to fix the holes instead. Use the challenges found in the BI realm as drivers for addressing the MDM-related issues where they are addressed best. Establishing an MDM organization to tackle the master data maintenance and improve data quality does not have to be a three to five-year endeavor. It can very well be started with a bottom-up approach and can deliver value within a very short time frame – as long you begin with the end in mind.
Does your organization have a clear, communicated and committed strategy for this? If not, you really need it!