The role of machine learning in master data management

There is a lot of hype (as you know) around Artificial Intelligence (AI), machine learning and specifically deep learning (complex neural networks). You also know (if you have been keeping up with the news) that we all use such techniques in many everyday tools. But recently the technology has gotten a little too close for comfort.

Some vendors in the data space, specifically those focused on data quality, MDM and data management, have started talking about how deep learning will significantly change how those tools are used. At this point, I am not so sure. I think there is great promise but, as with many technologies, we need to be clear about how we plan to use them.

For example, deep learning might help us discover where our master data is kept. Finding it, with embedded copies scattered inside and between business systems across a complex landscape of on-premises and cloud apps, is a hard task. Deep learning might be able to "spot" where the most frequently referenced data resides (much as the famous cats were "discovered" in the YouTube experiments).

This same concept sits at the heart of tools (think of IBM's Watson) that sift through diagnoses, recipes or concertos, breaking them down into their constituent elements and "discovering" (really, classifying) each one. But does this change MDM?

We have had access to semantic discovery tools for years. But finding where our master data exists is not the same as MDM; it is just one of the tasks needed to sustain MDM. In fact, there are two other tasks (among many others) that are very different, and for which we neither need nor can use deep learning. The first task is determining what your master data is, and the second concerns the enforcement of the policies that sustain it.

The first task should take no more than an hour with the right business people in the room; you simply ask them questions such as:

  • What is the most important data (at a conceptual level) that is needed to make business process A work as planned?
  • How much of this data is needed also to make business process B work as planned?
  • How little of this data can you get away with to make business process C work as planned?

Once you get to the point where the business users are arguing over the 9th or 10th attribute, you are done. Close the meeting with the miraculous conclusion that those 10 attributes are "it". Get on with MDM. You don't need to be perfect, you don't need a consultant, and you don't need a long list of 20 different master data objects. When you get into those heady situations you are not looking at master data at all; you are probably looking at shared data or application data.

The second task is at the other extreme: the enforcement of policy. It is policy enforcement, which sustains the level of data quality and the effectiveness of the workflows executed to meet data quality and business process KPIs, that actually brings home the bacon with your MDM program.

The rubber meets the road in MDM not with the discovery of where master data resides (though that is a key step), but when you can manage the exceptions that would otherwise hold business processes hostage to data, and when you can assure process integrity and drive improvements in outcomes.

Deep learning can help an MDM program with the middle step, finding out where the data might reside, and that will save a lot of time and money in the overall MDM implementation. But let's not get carried away with ourselves. Deep learning will not make MDM go away. We just need to keep our feet on the ground and understand the kinds of problems that deep learning can help with.

That’s my story and I am sticking with it (until you tell me otherwise).

(This post originally appeared on Andrew White's Gartner blog.)
