Peter would like to thank Brian Swarbrick of Palladium Group for his contribution to this column.

Throughout this series, I have discussed how organizations can become information empowered, always on the inherent assumption that the information is accurate. When this assumption is incorrect, the results can be devastating: decisions made with poor or incomplete information can often hurt a company’s bottom line more than if no information had been presented at all. This month, I’ll discuss how data quality issues occur and how your organization can begin to identify and address them.

Data quality issues arise for a variety of reasons, including acquisitions of new data sources, poorly designed and developed applications and, ironically, business intelligence (BI) implementations, where a focus on data presentation can come at the expense of understanding information accuracy. The popular adage “you just need a dashboard” doesn’t apply anymore. A well-architected BI solution that enforces data integrity and implements robust data quality procedures is a requirement for any enterprise solution, but it should not be seen as a complete answer to the organization’s data quality issues. Data quality is a business issue that spans the entire organization, including operational systems.

Gartner predicted that more than 50 percent of BI projects through 2007 would fail or gain only limited acceptance because of data quality issues.1 TDWI estimated that customer data quality issues alone cost U.S. businesses over $600 billion a year.2 With the enactment of the Sarbanes-Oxley (SOX) Act, quality of information is an ever-increasing priority. A clear initiative that focuses on the data quality challenge is required to address this situation.

Data quality is a business issue, and business managers should drive improvement. The first step in identifying data quality issues is to interview business users across the enterprise, assessing their use of data and potential opportunities to improve data quality. This assessment can be done using external resources or by leveraging roles from the business intelligence competency center (BICC). The key to this assessment is understanding the business uses of data, so it requires an individual with business expertise. Once this assessment is complete, the data should be traced to its source systems to investigate the cause of potential issues. From a BICC perspective, the data architect is often most qualified to lead this investigation. While this investigation could be done through human evaluation of data and code, there are now many data discovery tools that automate the evaluation. Once documented, data quality opportunities should be prioritized to create a roadmap for enterprise data quality improvement. The prioritization should weigh the perceived financial and operational impact, the resources required to address each opportunity and any project interdependencies.
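As one illustration of the prioritization step, the following Python sketch orders documented issues by a simple score while respecting interdependencies. It is an assumption-laden example, not a prescribed method: the `Issue` fields, the 1-to-5 rating scales, the scoring formula and the sample issues are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    """A documented data quality opportunity (hypothetical structure)."""
    name: str
    financial_impact: int    # perceived, 1 (low) to 5 (high)
    operational_impact: int  # perceived, 1 (low) to 5 (high)
    effort: int              # resources required, 1 (low) to 5 (high)
    depends_on: tuple = ()   # names of issues that must be addressed first


def priority_score(issue: Issue) -> float:
    """Assumed formula: combined impact divided by effort."""
    return (issue.financial_impact + issue.operational_impact) / issue.effort


def build_roadmap(issues):
    """Pick the highest-scoring issue whose dependencies are already scheduled."""
    remaining = {issue.name: issue for issue in issues}
    ordered = []
    while remaining:
        ready = [i for i in remaining.values()
                 if all(dep in ordered for dep in i.depends_on)]
        if not ready:
            raise ValueError("circular or unknown dependency")
        best = max(ready, key=priority_score)
        ordered.append(best.name)
        del remaining[best.name]
    return ordered


# Hypothetical sample issues, for illustration only.
issues = [
    Issue("duplicate customer records", 5, 4, 3,
          depends_on=("inconsistent address formats",)),
    Issue("inconsistent address formats", 3, 3, 2),
    Issue("missing product codes", 2, 2, 1),
]
roadmap = build_roadmap(issues)
for position, name in enumerate(roadmap, start=1):
    print(position, name)
```

In practice the ratings would come from the business interviews described above, and the weighting would be agreed with business managers rather than fixed in code.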

Depending on the organization’s size, one or more data stewards should have complete responsibility for data quality governance in their respective areas. These gatekeepers must understand their systems, the quality of information across those systems and the implications of poor data quality. They must be subject matter experts who provide direction wherever their information is used and serve as the sole point of contact for all decisions related to the data they own. This data governance role should fall within the BICC if one exists.

The IT organization should support the data quality improvement initiative, ensuring effective implementation of processes and procedures and making supporting technologies available. To assist IT in this endeavor, many vendors now offer software solutions that support enterprise data quality initiatives. Gartner now categorizes these data quality products as a separate commodity, although vendors offer several of the underlying components as standalone offerings (see Figure 1).

These tools help to discover, identify and resolve data quality anomalies. Typically required for implementing quality BI applications, these products should also be used to support data quality initiatives beyond a BI engagement. Typical functionalities available in these products include:

  • Data discovery: Quickly analyzes large volumes of data across different applications and provides statistics regarding the data content (unique values, missing content, data formats, etc.). Analyses can be used to drive change back through the operational systems and BI applications.
  • Data cleansing: Integrates and cleanses customer data (primarily name and address data) from multiple sources and geographic locations for individuals and organizations.
  • Matching or householding: Groups individuals or organizations together for deduplication or to understand relationships, a task impossible to achieve without a robust data cleansing strategy.
  • Ongoing data quality monitoring: Tracks data quality improvements over time.
  • Master data management: Defines and maintains the data critical to an organization’s business processes.
  • Integration with BI and operational systems: Defines rules for improving data quality that existing applications can integrate. Changes to rules are automatically available to subscribers of that information.
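To make the data discovery item above more concrete, here is a minimal Python sketch of the kind of statistics such a tool reports for a single column: missing content, unique values and data formats. It does not represent any vendor's product. The function names, the pattern notation (9 for a digit, A for a letter) and the sample postal codes are assumptions chosen for illustration.

```python
import re
from collections import Counter


def format_pattern(value: str) -> str:
    """Reduce a value to its shape: digits become 9, letters become A."""
    pattern = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", pattern)


def profile_column(values):
    """Return basic discovery statistics for one column of raw values."""
    # Treat None and blank strings as missing content.
    present = [str(v).strip() for v in values
               if v is not None and str(v).strip() != ""]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "unique": len(set(present)),
        "formats": Counter(format_pattern(v) for v in present),
    }


# Hypothetical sample column, for illustration only.
postal_codes = ["10001", "10001", "", None, "1000A", "94105-1234"]
profile = profile_column(postal_codes)
print(profile)
```

A format that appears only rarely, such as the letter in the fifth position here, is the sort of anomaly that would be traced back to the operational system that produced it.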

To be truly successful, a data quality initiative must be adopted across the entire organization, without barriers. With IT and business management working together, improvements in the quality of data can be adopted across the operational systems, and BI applications will benefit from them as well. Business decisions will become more reliable and, in turn, the impact on the bottom line will be positive.

  1. “Gartner Identifies the ‘Fatal Flaws’ of Business Intelligence and Advises Organisations on How to Avoid Them.” Press Release from Gartner, Inc., February 3, 2005.
  2. “The Data Warehousing Institute’s Recent Study Finds High Quality Data Is Critical To The Success Of Businesses Worldwide.” Press Release from TDWI, February 1, 2002.
