Editor's note: This is the first article in this series. It highlights several emerging business uses and implementation approaches for data quality technology. The second article, Discovering Data Quality, provides recommendations for extending a point solution into an enterprise-wide data quality platform. The third part, Winning Hearts and Minds (and Money) with Data Quality, examines the practice of data quality - the tangible activities required to increase data usability, improve the effectiveness of business decision-making and build confidence among information consumers about data completeness and accuracy.

The data quality market has seen a dramatic level of consolidation recently - yet another clear indication that the quality of data is finally being recognized as a mainstream component of good IT management. For the most part, companies have learned about the cost of poor data quality the hard way, through failure after failure of major IT projects and through cost overruns and schedules gone completely out of whack. Some forward-looking companies are now using data quality for wider purposes and as a distributed service for many systems, applications and business processes across the enterprise.

Early Adopters Pave the Way

Early adopters in large enterprises have now practiced data quality for several years. As usual with early adopters, these companies are finding competitive advantage by staying ahead of the technological pack. What is most interesting about their implementations, however, is not the higher-quality data they can now offer users, but the mature data quality practices they have built from those early implementations.

One typical example is a company that initially wanted to standardize, deduplicate and consolidate customer data for one of its three divisions, integrating the results into e-commerce processes and a data warehouse (a minimal deduplication sketch appears after this example). Once this was accomplished, it began to extend the data quality processes outward to:

  • Implement data quality for customer data in the other two divisions,
  • Synchronize customer data between divisions,
  • Build nightly feeds for the reporting system,
  • Deliver updated data feeds and duplicate alerts to sales,
  • Provide marketing with nightly feeds for analytics,
  • Enrich data for new records,
  • Provide new leads through data sharing across divisions, and
  • Automate enterprise-wide reporting.

The company was able to build out the original project for customer data in one system to extend data quality practices to multiple systems and functions.
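
To make the matching step concrete, here is a minimal deduplication sketch in the spirit of such a project. It is an assumption-laden illustration, not any vendor's algorithm: the record values, the 0.85 similarity cutoff and the use of Python's standard difflib matcher are all invented for the example.

    import difflib

    def normalize(name):
        # Syntactic standardization: lowercase, strip punctuation, collapse spaces
        return " ".join(name.lower().replace(".", "").replace(",", "").split())

    def is_duplicate(a, b, threshold=0.85):
        # Fuzzy comparison of the standardized forms; 0.85 is an assumed cutoff
        ratio = difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        return ratio >= threshold

    records = ["Acme Corp.", "ACME  Corp", "acme corp", "Apex Ltd."]
    survivors = []
    for rec in records:
        # Keep a record only if it does not match one already kept
        if not any(is_duplicate(rec, kept) for kept in survivors):
            survivors.append(rec)
    print(survivors)  # ['Acme Corp.', 'Apex Ltd.']

A production consolidation step would go on to merge the attributes of matched records into a surviving "golden" record; the sketch shows only the matching decision.
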
This pattern of best practices is essentially an adaptive one: business priorities dictate how and when data quality technology gets used, semantics is just as important as syntax - if not more so - and the only certainty about data and technology requirements is that they will change. No longer a "fix it and forget it" process applied to mailing lists, data quality has become a practice in which solutions are applied in many different ways to build a cohesive, strong foundation. The result is more high-quality data for multiple business purposes.

Where to Start?

Faced with mountains of dirty data, disparate systems, tight budgets and long lists of business requests, companies often do not know where to start improving the quality of their data. This is why in the past many companies sought a data quality solution only when they faced failure in a large IT project due to poor data quality. Today, tools for data profiling and data discovery are available to avert those risks, and companies are using them to more precisely scope out large data projects. For this reason it may not especially matter which data you start with. What does matter is when you start thinking about data quality. It pays to plan ahead - and not just for today's data and technology requirements, but also for what users will need down the road.
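
Profiling in this sense is largely descriptive statistics over columns. The sketch below - a hypothetical example, not a description of any profiling product - counts nulls, distinct values and format patterns for a single field, the kind of summary that lets a team scope cleanup work before committing to a schedule.

    from collections import Counter
    import re

    NULL_TOKENS = (None, "", "N/A")  # assumed markers for missing data

    def profile(values):
        # Basic column profile: volume, completeness, cardinality and formats
        nulls = sum(1 for v in values if v in NULL_TOKENS)
        patterns = Counter(
            # Reduce each value to a shape: letters become A, digits become 9
            re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", str(v)))
            for v in values if v not in NULL_TOKENS
        )
        return {"rows": len(values), "nulls": nulls,
                "distinct": len(set(values)), "patterns": patterns.most_common()}

    print(profile(["60614", "60614-3726", "N/A", "6O614"]))
    # The letter O in "6O614" surfaces as an unexpected 9A999 pattern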

Companies are taking advantage of external rules-based data quality processes to reuse work done on an initial project for additional projects. Even though the rules may be modified and refined from one project to another, this kind of reuse still saves time and money in implementation. It also promotes consistency across projects and systems. In effect, a single data quality initiative can grow into the start of a data governance program, where all the rules and metadata associated with various projects can be administered centrally if desired.
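
Externalizing rules can be as simple as treating them as data that any project loads, rather than logic hard-coded into each application. The sketch below illustrates the idea only; the rule names, fields and patterns are invented for the example.

    import re

    # Rules live outside application code - a list here, a shared
    # repository in practice - so multiple projects can reuse and refine them.
    RULES = [
        {"field": "email",  "pattern": r"^[^@\s]+@[^@\s]+\.[^@\s]+$"},
        {"field": "postal", "pattern": r"^\d{5}(-\d{4})?$"},
    ]

    def violations(record, rules=RULES):
        # Return the fields that fail their rule for one record
        return [r["field"] for r in rules
                if not re.match(r["pattern"], str(record.get(r["field"], "")))]

    print(violations({"email": "jo@example.com", "postal": "60614"}))  # []
    print(violations({"email": "not-an-email", "postal": "ABCDE"}))    # ['email', 'postal']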

Multifaceted Data

Another change is in the type of data that companies want to improve and maintain. No longer is the focus simply on "name and address" data. In fact, as companies shift from product-centric data management to a more customer-centric design, even customer data has come to mean much more than names and addresses. Customer data also includes product information, purchasing histories, service reports, demographics and many other types of information that give complete pictures of customers and contain rich data for new analytics and business intelligence (BI) tools.

Global data, driven by increasing global business transactions and offshore outsourcing and operations, also presents large and difficult challenges for IT. It is not uncommon for large organizations to transact business in several countries and require technology that can automatically correct, standardize and consolidate records in multiple languages using various linguistic and cultural conventions - all in real-time or near real-time environments.
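
At bottom, multilingual cleansing means the same standardization step must dispatch on locale-specific conventions. The fragment below is only schematic - the two-country decimal-mark table is invented, and real international cleansing depends on far richer reference data.

    # Hypothetical per-country conventions: decimal marks differ, so the
    # same amount string must be parsed by locale before records can be
    # consolidated across divisions.
    DECIMAL_MARK = {"US": ".", "DE": ","}

    def parse_amount(text, country):
        mark = DECIMAL_MARK[country]
        thousands = "," if mark == "." else "."
        cleaned = text.replace(" ", "").replace(thousands, "").replace(mark, ".")
        return float(cleaned)

    print(parse_amount("1,234.50", "US"))  # 1234.5
    print(parse_amount("1.234,50", "DE"))  # 1234.5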

Today's data is also less likely to be a single, monolithic record of information delivered to all users and systems. Instead it is becoming more multifaceted and serving various business purposes in a number of operational, analytic and reporting applications, often simultaneously. The "single view" that businesses often want obscures the complexity of business purposes that such data must serve.

What's Behind the Consolidations

The current interest in master data management (MDM) and customer data integration (CDI) reflects the desire to consolidate and centralize data stores. The pressure to be able to build business processes that span applications in a service-oriented environment is driving these large integration initiatives. It is also clear that compliance and governance are important factors to consider in planning such projects.

The Connected Enterprise

One of the most dramatic of recent changes is the degree to which companies want real-time connectors between systems to support broader data quality practices. Connectors to multiple customer relationship management (CRM), enterprise resource planning (ERP) and supply chain management (SCM) systems help companies maintain complete customer and supplier lists and track inventory. To help organizations ensure that new data meets the same standards they have established for existing data stores, data quality processes are being built into Web transactions. Data warehouse refreshes, which are currently more likely to be daily than weekly or monthly, regularly include data quality processes, either at the source systems or at the warehouse.
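
Built into a Web transaction, a quality process acts as a gatekeeper: the submission is standardized and checked before it reaches the system of record. The function below is a hypothetical sketch with assumed field names and checks, not any product's connector.

    def accept_order(form):
        # Apply the same standardization used on existing data stores...
        clean = {
            "name":  " ".join(form.get("name", "").split()).title(),
            "email": form.get("email", "").strip().lower(),
        }
        # ...and reject the transaction if it misses those standards, so
        # new data enters at the same quality level as the warehouse.
        errors = []
        if not clean["name"]:
            errors.append("name is required")
        if "@" not in clean["email"]:
            errors.append("email looks invalid")
        return clean, errors

    print(accept_order({"name": "  ada   LOVELACE ", "email": "Ada@Example.COM"}))
    # ({'name': 'Ada Lovelace', 'email': 'ada@example.com'}, [])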

Semantics versus Syntax

Data quality projects also help companies better handle the perennial question of data ownership by getting business users and IT to see the importance of both syntax and semantics - and the basic differences between the two. Data syntax covers the way data is formatted and represented. Data semantics addresses the meaning of the data.

From a data quality perspective, syntax covers transformations and mappings: what changes must occur to make the data consistent and standardized. Semantics, on the other hand, is about whether the data is accurate, usable and useful - in essence, whether it will deliver good information and function well in the target systems.
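
The distinction is easy to see in code. The value below passes a purely syntactic check - it is a well-formed date and will load - but only a business rule can say whether it makes sense. The ship-date rule is invented for illustration.

    from datetime import date

    def syntax_ok(value):
        # Syntax: does the value parse as an ISO date at all?
        try:
            date.fromisoformat(value)
            return True
        except ValueError:
            return False

    def semantics_ok(value, order_date):
        # Semantics - a business rule only users can supply:
        # a shipment cannot precede the order it fulfills.
        return date.fromisoformat(value) >= order_date

    ship_date = "2005-01-03"
    print(syntax_ok(ship_date))                       # True: it will load
    print(semantics_ok(ship_date, date(2005, 6, 1)))  # False: yet it is wrong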

Data managers are good at syntax because if the syntax is not good, the data may not even load. And make no mistake, designing and executing transformations is no simple task. Semantics is an entirely different affair, however. Assessing semantic data quality - what is useful and meaningful - requires more direct input from the business users. In the long run, the business users should take responsibility for defining and assessing the content and context of information.

Organizations have found that addressing data quality syntax and semantics helps structure and promote collaboration between business users and IT. If the combined team is able to look at the data together - or at least have the same perspective on the data - the issues become more concrete and communication improves. As one successful data quality practitioner recently put it, "The data is everyone's responsibility."

Data Quality Services

As data quality evolves into a service accessible to all systems, applications and service-oriented architecture (SOA) business processes, companies have multiple options for accessing and using such services. The true promise of data quality may well lie in embedding these services within the business process itself rather than calling them as separate steps. Data quality will then become part of standard and specific business processes - for purposes such as fraud detection, and tracking and managing RFID data.
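
Embedded in the process itself, the quality step looks less like a call out to a cleansing job and more like a checkpoint every transaction passes through. The decorator below is a deliberately simplified, assumed design - including the invented fraud rule - rather than a reference implementation.

    def with_quality_check(check):
        # Wrap any business-process step so records are screened inline
        def decorator(step):
            def wrapped(record):
                problems = check(record)
                if problems:
                    raise ValueError(f"rejected by quality check: {problems}")
                return step(record)
            return wrapped
        return decorator

    def fraud_screen(record):
        # Hypothetical rule: flag implausibly large transaction amounts
        return ["amount out of range"] if record["amount"] > 10_000 else []

    @with_quality_check(fraud_screen)
    def post_transaction(record):
        return f"posted {record['amount']}"

    print(post_transaction({"amount": 250}))  # posted 250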

Large multinational enterprises may choose to run their own enterprise data quality service. This type of service will behave like a complex adaptive system: capable of learning, detecting new conditions in the data, responding to new business requirements and adapting far more automatically than today's systems do.

At the other end of the spectrum, small and medium-sized companies will access data quality on demand. In this configuration, application service providers offer hosted software services, slicing and dicing data quality functionality and packaging it to serve specific widespread needs. In either case, it seems clear that data quality is poised to enter the mainstream of IT, where it will play a critical role in ensuring that good, reliable and useful information is available to users throughout the organization.
