5 Data Quality Management Steps for MDM

February 18, 2009, 1:26pm EST

Data quality management (DQM) and master data management (MDM) are two key dimensions of enterprise information management (EIM). Although the industry treats them differently in terms of implementation approaches, methodologies and tools, significant correlation and interdependency exist between these two streams of EIM. Without DQM, MDM is merely a dump of the data repository, and without MDM, DQM cannot deliver ROI to the organization. In a very real sense, DQM is the building block of an MDM hub, as accurate, high-quality data is key to the success of an MDM program. Let us not consign MDM to becoming another data silo. Together, MDM and DQM form a strong fusion that supports the long-term enterprise information management vision.

Having decided on the subject areas/domains for MDM, it is imperative for an organization to launch a data discovery and analysis program. In-depth analysis of the quality and health of data is a prerequisite of the MDM program. The following data quality management steps are needed to support an agile MDM program:

1. Identify and qualify the master data and its sources. The definition of master data may differ across business units. The first step involves identifying and qualifying master data for each business unit in the organization. For example, from an accounts receivable perspective, the customer address is the master data, whereas from a distribution perspective, location is the master data. It is then necessary to identify the source systems/applications that store or generate the master information. Perform a detailed analysis of each source system's master data structure, and map the source attributes to the MDM hub.
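The source-to-hub mapping described above can be sketched as a simple lookup structure. The system and field names here are hypothetical illustrations, not taken from any particular product:

```python
# Hypothetical map from each source system's field names to the hub's
# canonical attribute names. All names here are illustrative.
SOURCE_TO_HUB_MAP = {
    "accounts_receivable": {"cust_addr_line1": "address_line1",
                            "cust_zip": "postal_code"},
    "distribution": {"loc_code": "location_id",
                     "loc_zip": "postal_code"},
}

def map_to_hub(system: str, record: dict) -> dict:
    """Translate a source record's field names into hub attribute names."""
    mapping = SOURCE_TO_HUB_MAP[system]
    return {hub_field: record[src_field]
            for src_field, hub_field in mapping.items()
            if src_field in record}
```

In practice this mapping lives in the MDM tool's metadata layer rather than in code, but the exercise of producing it per source system is the same.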

2. Identify and define the global and local data elements. More than one system may store or generate the same master information. Additionally, there could be a global version as well as local versions of the master data. Perform a detailed analysis to understand the commonalities and differences between local, global and global-local attributes of data elements. This analysis is critical for an organization that has distributed global and regional applications and deals with multilingual and multicurrency scenarios. Based on this analysis, the right data sourcing strategy for the MDM hub can be defined.
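One way to make the local/global/global-local distinction concrete is to compare attribute values across regional copies of the same master record. The classification rule below is a minimal sketch under the assumption that identical values everywhere mean "global," region-only attributes mean "local," and shared attributes with region-specific values mean "global-local":

```python
def classify_attributes(regional_records: dict) -> dict:
    """Classify each attribute as 'global' (identical in every region),
    'local' (present in only one region), or 'global-local'
    (shared attribute, region-specific values)."""
    attr_values = {}
    for region, record in regional_records.items():
        for attr, value in record.items():
            attr_values.setdefault(attr, {})[region] = value

    result = {}
    n_regions = len(regional_records)
    for attr, values in attr_values.items():
        if len(values) == 1:
            result[attr] = "local"
        elif len(values) == n_regions and len(set(values.values())) == 1:
            result[attr] = "global"
        else:
            result[attr] = "global-local"
    return result
```

For example, a product ID shared verbatim by all regions classifies as global, a price held in regional currencies as global-local, and a state tax code kept by one region alone as local.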

3. Identify the data elements that require data cleansing and correction. At this stage, the data elements supporting the MDM hub that require data cleansing and correction have to be identified. For example, a party record might comprise name, address, ZIP code, contact phone number, demographic profile, psychographic profile, etc. It is important to scope these data elements for data quality. Communication with the stakeholders is necessary so that, as part of the MDM initiative, data quality will be injected into these selected data elements on an organization-wide basis.
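The cleansing scope for a party record can be expressed as a list of in-scope elements, each paired with a validity check. The checks below (blank name, five-digit US ZIP, loose phone format) are assumed for illustration only; the real rules come from the stakeholder agreement described above:

```python
import re

# Hypothetical cleansing scope for a party record: each in-scope
# element is paired with a simple validity check (illustrative rules).
CLEANSING_SCOPE = {
    "name":  lambda v: bool(v and v.strip()),
    "zip":   lambda v: bool(re.fullmatch(r"\d{5}(-\d{4})?", v or "")),
    "phone": lambda v: bool(re.fullmatch(r"\+?[\d\s\-()]{7,15}", v or "")),
}

def flag_for_cleansing(party: dict) -> list:
    """Return the in-scope elements of a party record that fail their check."""
    return [elem for elem, check in CLEANSING_SCOPE.items()
            if not check(party.get(elem))]
```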

4. Perform data discovery and analysis. Data collected from source applications needs to be analyzed to understand the sufficiency, accuracy, consistency and redundancy issues associated with the data sets. Analyze source data from both business and technical perspectives. The data profiling and data quality analysis reports reflect a complete diagnosis of the health of the data sets, including:

  • Basic data statistics and frequency analysis (patterns, unique count, occurrences, etc.).
  • Missing and duplicate attributes of the master data (name and address analysis, etc.).
  • Incorrect and out-of-range value analysis.
  • Data profiling and analysis as per predefined business and technical rules.
  • Cross comparison of data elements between source systems.
  • Data irregularity analysis (heterogeneous spelling, mixed case, etc.).

It is strongly advised to use a data profiling tool to analyze datasets. Data analysis should focus on technical as well as business metadata, and should be quantitative to report data quality.
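To make the quantitative reporting above concrete, a column profile of the kind a profiling tool produces can be sketched in a few lines. This is a minimal stand-in for a commercial profiler, covering null rate, distinct count, duplicates, and character-pattern frequencies (digits mapped to 9, letters to A):

```python
from collections import Counter
import re

def profile_column(values: list) -> dict:
    """Basic column profile: null rate, distinct count, duplicate count,
    and character-pattern frequencies (digits -> 9, letters -> A)."""
    non_null = [v for v in values if v not in (None, "")]
    patterns = Counter(re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", v))
                       for v in non_null)
    counts = Counter(non_null)
    return {
        "null_rate": 1 - len(non_null) / len(values),
        "distinct": len(set(non_null)),
        "duplicates": sum(c - 1 for c in counts.values()),
        "patterns": dict(patterns),
    }
```

Run against a ZIP-code column, for instance, the pattern frequencies immediately expose out-of-format values such as "ABC12" alongside the expected "99999" pattern.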

5. Define the strategy for initial and incremental data quality management. A well-defined strategy should be in place to support initial and incremental data cleansing for the MDM hub.

Asynchronous data cleansing using batch processes can be adopted for the initial data cleansing. Industry-standard ETL and DQM commercial off-the-shelf (COTS) tools should be used for initial data cleansing.

Incremental data cleansing is supported using synchronous/real-time data cleansing. An SOA and message queue architecture should be adopted for on-demand, real-time data quality management. Built-in DQM capabilities within the MDM COTS tool can be leveraged for incremental data cleansing and consolidation; in some scenarios, however, external ETL, EAI and DQM COTS tools may be used for this purpose.
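The key design point here is that the same cleansing rules should serve both paths: applied in bulk during the initial load, and per message on the incremental path. A toy sketch, with deliberately simple standardization rules standing in for a DQM tool:

```python
def cleanse(record: dict) -> dict:
    """Toy standardization rules: trim whitespace and upper-case the
    country code. A real deployment delegates this to a DQM tool."""
    out = {k: v.strip() if isinstance(v, str) else v
           for k, v in record.items()}
    if "country" in out:
        out["country"] = out["country"].upper()
    return out

def batch_cleanse(records: list) -> list:
    """Initial load: cleanse the full source extract asynchronously."""
    return [cleanse(r) for r in records]

def on_message(record: dict) -> dict:
    """Incremental path: the same rules applied per message, as a
    queue consumer would before writing to the MDM hub."""
    return cleanse(record)
```

Keeping one rule set behind both entry points is what prevents the batch-loaded and real-time-updated records from drifting apart in quality.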

6. Monitor and manage the data quality of the MDM hub. Continuous data vigilance is required to maintain up-to-date, quality data in an MDM hub. Data quality needs to be analyzed on a periodic basis to identify the trends associated with the data and its impact on the organization's MDM program.

Record the variance and statistics of the changes in data quality over time. Data elements whose quality fluctuates and crosses the quality threshold limits need further analysis. A forecast of the probability of changes in data quality helps the organization define a DQM strategy for the MDM hub.
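The variance-and-threshold monitoring described above can be sketched as follows, assuming a periodic quality score between 0 and 1 per data element and a hypothetical threshold chosen by the organization:

```python
def quality_trend(scores: list, threshold: float) -> dict:
    """Given periodic quality scores (0..1) for a data element, report
    the mean, the variance over time, and the periods (indices) in
    which the score fell below the agreed threshold."""
    mean = sum(scores) / len(scores)
    variance = sum((s - mean) ** 2 for s in scores) / len(scores)
    breaches = [i for i, s in enumerate(scores) if s < threshold]
    return {"mean": mean, "variance": variance, "breaches": breaches}
```

Elements with high variance or repeated threshold breaches are the ones flagged for the deeper analysis the article calls for.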

In a true sense, DQM is the foundation of an effective and successful MDM implementation, and a well-defined strategy improves the probability of an MDM program's success. Organizations should embark on a data discovery and analysis phase to understand the health, quality and origin of their master data, and perform a detailed analysis of the commonalities and differences between local, global and global-local attributes of data elements.
