To stay competitive and grow in today’s market, it becomes necessary for organizations to closely correlate both internal and external data, and draw meaningful insights out of it.
During the last decade a tremendous amount of data has been produced by internal and external sources in the form of structured, semi-structured and unstructured data. These are large quantities of human or machine generated data produced by heterogeneous sources like social media, field devices, call centers, enterprise applications, point of sale etc., in the form of text, image, video, PDF and more.
The “Volume”, “Variety” and “Velocity” of data have posed a big challenge to the enterprise. The evolution of “Big Data” technology has been a boon to the enterprise for the effective management of large volumes of structured and unstructured data. Big data analytics is expected to correlate this data and draw meaningful insights out of it.
However, siloed big data initiatives have often failed to provide ROI to the enterprise. A large volume of unstructured data can be more of a burden than a benefit. That is why several organizations struggle to turn data into dollars.
On the other hand, an immature MDM program limits an organization’s ability to extract meaningful insights from big data. It is therefore of utmost importance for the organization to improve the maturity of the MDM program to harness the value of big data.
MDM helps manage the master information coming from big data sources effectively, by standardizing it and storing it in a central repository that is accessible to business units.
MDM and big data are closely coupled applications that complement each other. There are many ways in which MDM can enhance big data applications, and vice versa. These ways fall into two groups: the context offered by big data and the trust provided by master data.
MDM and big data – A matched pair
At first glance, MDM and big data appear to be two mutually exclusive systems with a degree of mismatch. An enterprise MDM initiative is all about solving business issues and improving data trustworthiness through the effective and seamless integration of master information with business processes. Its intent is to create a central trusted repository of structured master information accessible by enterprise applications.
The big data system deals with large volumes of data coming in unstructured or semi-structured format from heterogeneous sources like social media, field devices, log files and machine-generated data. A big data initiative is intended to support specific analytics tasks within a given span of time, after which it is taken down. Figure 1 shows the characteristics of MDM and big data.
| Characteristic | MDM | Big data |
| --- | --- | --- |
|  | Provides a single version of truth for master and reference information. Acts as a system of record / system of reference for the enterprise. | Provides cutting-edge analytics and offers a competitive advantage. |
| Volume of data and growth | Deals with master data sets, which are smaller in volume and grow at a relatively slow rate. | Deals with enormous volumes of data, so large that current databases struggle to handle them. Grows very fast. |
| Nature of data | Permanent and long-lasting. | Ephemeral; disposable if not useful. |
| Types of data (structure and data model) | Mostly structured data in a definite format with a predefined data model. | Mostly semi-structured or unstructured data, lacking a fixed data model. |
| Source of data | Oriented around internal, enterprise-centric data. | A platform to integrate data coming from multiple internal and external sources, including social media, cloud, mobile and machine-generated data. |
|  | Supports both analytical and operational environments. | Fully analytics-oriented. |
Despite apparent differences there are many ways in which MDM and big data complement each other.
Big data offers context to MDM
Big data can act as an external source of master information for the MDM hub and can help enrich internal Master Data in the context of the external world. MDM can help aggregate the required and useful information coming from big data sources with internal master records.
An aggregated view and profile of master information can help link the customer correctly and, in turn, help run effective analytics and campaigns. MDM can act as a hub between the systems of record and the systems of engagement.
However, not all data coming from big data sources will be relevant for MDM. There should be a mechanism to process the unstructured data and distinguish the relevant master information and the associated context. NoSQL offerings, natural language processing and other semantic technologies can be leveraged to distill the relevant master information from a pool of unstructured or semi-structured data.
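The distillation step can be illustrated with a deliberately simple stand-in. The sketch below uses plain regular expressions rather than NLP or semantic technology, and only pulls candidate identifiers (email addresses and social handles) out of free text. The function name, the patterns and the sample sentence are assumptions made for illustration, not part of any MDM product.

```python
import re

# Simplified patterns for illustration; real-world extraction needs far more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# A handle is "@name" that is not part of an email address.
HANDLE_RE = re.compile(r"(?<![\w@.])@([A-Za-z0-9_]{2,15})\b")


def extract_master_candidates(text):
    """Return candidate master attributes found in a piece of free text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "handles": HANDLE_RE.findall(text),
    }


comment = "Contact me at asha.rao@example.com or tweet @asharao about order 42."
candidates = extract_master_candidates(comment)
```

Everything else in the comment (the order number, the wording) is context rather than master information, which is the distinction the mechanism has to draw.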
MDM offers trust to big data
MDM brings a single integrated view of master and reference information, with unique representations, to the enterprise. An organization can leverage the MDM system to gauge the trustworthiness of data coming from big data sources.
Dimensional data residing in the MDM system can be leveraged towards linking the facts of big data. Another way is to leverage the MDM data model backbone (optimized for entity resolution) and governance processes to bind big data facts.
Other MDM processes, such as data cleansing, standardization, matching and duplicate suspect processing, can also be leveraged to increase the uniqueness and trustworthiness of big data.
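As a minimal sketch of linking big data facts to master data, assume the MDM hub exposes golden records plus a cross-reference from external identifiers to master IDs. The record contents, the `XREF` table and the `link_facts` function below are hypothetical and exist only to show the idea.

```python
from collections import defaultdict

# Hypothetical golden records held in the MDM hub, keyed by master ID.
MASTER_CUSTOMERS = {
    "C001": {"name": "Asha Rao", "segment": "Premium"},
    "C002": {"name": "John Smith", "segment": "Standard"},
}

# Hypothetical cross-reference from (source, external identifier) to master ID.
XREF = {
    ("twitter", "@asharao"): "C001",
    ("device", "dev-7781"): "C002",
}


def link_facts(facts):
    """Group big data facts by master ID; return facts that could not be linked."""
    linked = defaultdict(list)
    unlinked = []
    for fact in facts:
        master_id = XREF.get((fact["source"], fact["identifier"]))
        if master_id is None:
            unlinked.append(fact)
        else:
            linked[master_id].append(fact)
    return dict(linked), unlinked


facts = [
    {"source": "twitter", "identifier": "@asharao", "text": "Great service"},
    {"source": "device", "identifier": "dev-7781", "text": "battery low"},
    {"source": "twitter", "identifier": "@unknown", "text": "hello"},
]
linked, unlinked = link_facts(facts)
```

Facts that resolve to a master ID can be analyzed against trusted dimensions such as customer segment, while unlinked facts remain candidates for entity resolution or for discarding.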
An MDM system can support big data by:
- Holding the “attribute level” data coming from big data sources e.g. social media Ids, alias, device Id, IP address etc.
- Maintaining the code and mapping of reference information.
- Extracting and maintaining the context of transactional data like comments, remarks, conversations, social profile and status etc.
- Facilitating entity resolution.
- Maintaining unique, cleansed golden master records
- Managing the hierarchies and structure of the information along with linkages and traceability, e.g. linkages of an existing customer with his/her Facebook ID, LinkedIn ID, blog alias etc.
MDM for big data analytics – Key considerations
A traditional MDM implementation is, in many cases, not sufficient to accommodate big data sources. A next-generation MDM system is needed to incorporate master information coming from big data systems. An organization needs to take the following points into consideration while defining next-generation MDM for big data:
Redefine information strategy and topology
The overall information strategy needs to be reviewed and redefined in the context of big data and MDM. The impact of changes in topology needs to be assessed thoroughly. It is necessary to define the linkages between the two systems (MDM and big data) and how they operate with internal and external data. For example, data coming from social media needs to be linked with internal customer and prospect data to provide an integrated view at the enterprise level.
The information strategy should address the following:
- Integration point between MDM and big data - how the big data and MDM systems are going to interact with each other.
- Management of master data from different sources - how the master data from internal and external sources is going to be managed.
- Definition and classification of master data - how the master data coming from big data sources gets defined and classified.
- Processing of unstructured and semi-structured master data - how master data arriving from big data sources in unstructured and semi-structured form is going to be processed.
- Usage of master data - how the MDM environment is going to support big data analytics and other enterprise applications.
Revise data architecture and strategy
The overall data architecture and strategy need to be revised to accommodate the changes that big data brings. The MDM data model needs to be enhanced to accommodate big data specific master attributes. For example, the data model should accommodate social media and/or IoT specific attributes such as social media Ids, aliases, contacts, preferences, hierarchies, device Ids, device locations, on-off periods etc. The data strategy should be defined for the effective storage and management of internal and external master data.
The revised data architecture strategy should ensure that:
- The MDM data model accommodates all big data specific master attributes
- The local and global master data attributes are classified and managed as per the business needs
- The data model has the necessary provisions to interlink the external (big data specific) and internal master data elements, and to accommodate code tables and reference data.
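One way to picture such a data model is sketched below with Python dataclasses. The class names, field names and the tiny country code table are assumptions made for illustration. They show an internal master record that carries links to external identities and resolves a code against reference data, and do not represent a prescribed MDM schema.

```python
from dataclasses import dataclass, field


@dataclass
class ExternalIdentity:
    """An identity coming from a big data source (hypothetical shape)."""
    source: str        # e.g. "twitter" or "iot"
    identifier: str    # e.g. a social media Id or device Id
    attributes: dict = field(default_factory=dict)  # e.g. alias, device location


@dataclass
class CustomerMaster:
    """Internal master record with links to external identities."""
    master_id: str
    name: str
    country_code: str  # resolved against the reference code table below
    external_identities: list = field(default_factory=list)

    def link(self, identity):
        """Link an external identity once; repeated links are ignored."""
        known = {(e.source, e.identifier) for e in self.external_identities}
        if (identity.source, identity.identifier) not in known:
            self.external_identities.append(identity)


# Hypothetical reference data (code table).
COUNTRY_CODES = {"IN": "India", "US": "United States"}

customer = CustomerMaster("C001", "Asha Rao", "IN")
customer.link(ExternalIdentity("twitter", "@asharao", {"alias": "asha"}))
customer.link(ExternalIdentity("twitter", "@asharao"))  # duplicate, ignored
customer.link(ExternalIdentity("iot", "dev-7781", {"location": "Pune"}))
```

Keeping external identities in a separate structure, rather than as extra columns on the master record, lets new big data sources be added without changing the core model.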
Define advanced data governance and stewardship
Governing master data that comes from big data sources poses significant challenges, because the data is unstructured and flows in from various external sources. The organization needs to define advanced policies, processes and a stewardship structure that enable big data specific governance.
The data governance process for MDM should ensure that:
- The right level of data security, privacy and confidentiality is maintained for customer and other confidential master data.
- The right level of data integrity is maintained between internal master data and master data from big data sources.
- The right level of linkage exists between reference data and master data.
- Policies and processes are redefined or enhanced to support big data and the related business transformation rules, to control access for data sharing and distribution, and to establish ongoing monitoring, measurement and change mechanisms.
- A dedicated group of big data stewards is available for master data review, monitoring and conflict management.
Enhance integration architecture
The data integration architecture needs to be enhanced to accommodate the master data coming from big data sources. The MDM hub should have the right level of integration capabilities to integrate with big data using Ids, reference keys and other unique identifiers.
The unstructured, semi-structured and multi-structured data will be parsed into logical data objects using a big data parser. This data will then be processed further, matched, merged and loaded into the MDM hub with the appropriate master information.
The enhanced integration architecture should ensure that:
- The MDM environment has the ability to parse, transform and integrate the data coming from the big data platform.
- The MDM environment has built-in intelligence to analyze the relevance of master data coming from the big data environment, and to accept or reject it accordingly.
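The parse and accept-or-reject steps above can be sketched roughly as follows, assuming the big data platform delivers records as JSON lines. The field names (`source`, `user_id`, `profile`), the `REQUIRED_KEYS` rule and the function names are hypothetical. A real relevance check would be driven by the organization's own definition of master data.

```python
import json

# Assumed minimum content for a record to count as a master data candidate.
REQUIRED_KEYS = ("source", "identifier")


def parse_record(raw):
    """Parse one JSON line into a flat logical data object, or None on failure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    profile = data.get("profile")
    return {
        "source": data.get("source"),
        "identifier": data.get("user_id"),
        "display_name": profile.get("name") if isinstance(profile, dict) else None,
    }


def is_relevant(obj):
    """Accept only parsed objects that carry every required key."""
    return obj is not None and all(obj.get(key) for key in REQUIRED_KEYS)


def triage(raw_lines):
    """Split incoming lines into accepted data objects and rejected raw lines."""
    accepted, rejected = [], []
    for raw in raw_lines:
        obj = parse_record(raw)
        if is_relevant(obj):
            accepted.append(obj)
        else:
            rejected.append(raw)
    return accepted, rejected


lines = [
    '{"source": "twitter", "user_id": "@asharao", "profile": {"name": "Asha Rao"}}',
    '{"source": "weblog", "message": "GET /index"}',
    "not json",
]
accepted, rejected = triage(lines)
```

Accepted objects would then move on to matching and merging, while rejected lines stay on the big data platform for analytics use.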
Enhance match and merge engine
The MDM system should enhance its “Match & Merge” engine so that master information coming from big data sources can be correctly identified and integrated into the MDM hub. A blend of probabilistic and deterministic matching algorithms can be adopted.
For example, the engine should be able to identify the social profile of an existing customer and interlink it with that customer's existing data in the MDM hub. In this context, data quality is more about the utility of the information for the consumer of the data than about objective “quality”.
The enhanced match and merge engine should ensure that:
- The master data coming from big data sources is effectively matched with internal data residing in the MDM hub.
- The “Duplicate Suspect” master records are identified and processed effectively.
- The engine recommends the “Accept”, “Reject”, “Merge” or “Split” of the master records coming from big data sources.
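A blended matcher along these lines might look like the sketch below. It is an illustration under stated assumptions: an exact email comparison plays the deterministic rule, and a fuzzy name similarity from Python's `difflib` stands in for a true probabilistic matcher. The thresholds, record layout and `recommend` function are hypothetical, and the “Split” recommendation is not covered.

```python
from difflib import SequenceMatcher

# Assumed cut-offs; in practice these are tuned per data set.
MERGE_THRESHOLD = 0.90
SUSPECT_THRESHOLD = 0.70


def name_similarity(a, b):
    """Fuzzy similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def recommend(incoming, hub):
    """Return (recommendation, master_id) for a record from a big data source."""
    name = incoming.get("name") or ""
    email = (incoming.get("email") or "").lower()
    if not name and not email:
        return ("Reject", None)  # nothing to match on

    # Deterministic rule: an exact email match identifies the master record.
    if email:
        for record in hub:
            if (record.get("email") or "").lower() == email:
                return ("Merge", record["master_id"])

    # Fuzzy rule: take the best name similarity across the hub.
    best_id, best_score = None, 0.0
    for record in hub:
        score = name_similarity(name, record.get("name") or "")
        if score > best_score:
            best_id, best_score = record["master_id"], score

    if best_score >= MERGE_THRESHOLD:
        return ("Merge", best_id)
    if best_score >= SUSPECT_THRESHOLD:
        return ("Duplicate Suspect", best_id)  # route to a data steward
    return ("Accept", None)  # treat as a new master record


hub = [
    {"master_id": "C001", "name": "Asha Rao", "email": "asha.rao@example.com"},
    {"master_id": "C002", "name": "John Smith", "email": "john.smith@example.com"},
]
by_email = recommend({"name": "A. Rao", "email": "ASHA.RAO@example.com"}, hub)
by_name = recommend({"name": "Jon Smith"}, hub)
suspect = recommend({"name": "Jonathan Smith"}, hub)
new_record = recommend({"name": "Maria Garcia"}, hub)
```

Running the deterministic rule first keeps confident matches cheap and explainable, and leaves the fuzzy score to decide only the cases where no trusted identifier is shared.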
In this competitive era, organizations are striving hard to retain their customers. It is of utmost importance for an enterprise to keep a global view of customers and understand their needs, preferences and expectations.
Big data analytics coupled with an MDM backbone offers the enterprise a cutting-edge advantage in managing customer-centric functions and increasing profitability. However, the pairing of MDM and big data is not free of complications. The enterprise needs to work diligently on the interface points so as to best harness these two technologies.
Traditional MDM systems need to be enhanced to accommodate the information coming from big data sources and to draw meaningful context from it. The big data system should leverage the MDM backbone to interlink data and draw meaningful insights.
(About the author: Sanjay Kumar is a data scientist and freelance consultant. He has over 20 years of experience in data and analytics, with specialized skills in information governance, master data management (MDM), reference data management (RDM), data quality, metadata management, business intelligence and data warehousing. Prior to this role, Kumar was senior managing consultant and service area leader for business analytics and optimization (BAO) at IBM. He may be contacted at firstname.lastname@example.org)