There is a growing cross-industry recognition of the value of high-quality data. Numerous reports issued over the past two and a half years indicate that information quality is rapidly increasing in visibility and importance among senior managers. According to the 2001 PricewaterhouseCoopers Global Data Management Survey, 75 percent of the senior executives reported significant problems as a result of defective data. In 2002, The Data Warehousing Institute published its Report on Data Quality in which they estimated that the cost of poor data quality to U.S. businesses exceeds $600 billion each year.
In addition, two critical pieces of legislation passed within the past few years impose strict information quality requirements on both public corporations in the U.S. (the Sarbanes-Oxley Act of 2002) and U.S. federal agencies (the Data Quality Act of 2001). Both of these laws require organizations to provide auditable details as to the levels of their information quality.
Some Interesting Issues
The emergence of the value of high-quality information, coupled with these new regulations, has prompted many organizations to consider instituting a data quality management program, either as a separate function within a logical line of business or at the enterprise level. While this is admirable, there are a number of relevant issues that can impede the integration of information quality concepts. Some of these critical issues include:
Questions Regarding Data Ownership: Unless there is a set of clearly defined data ownership and stewardship policies, there are bound to be some questions regarding responsibility, accountability and authority associated with auditing and reviewing the quality of data sets.
Application-Based Data Management: The systems and applications that comprise an enterprise environment may be structured in such a way that the business manager for each system has authority over the information used within that system. Consequently, each application in isolation has its own requirements for data quality. However, as we move toward a more integrated application environment and explore the development of an enterprise architecture, it is possible that application data may be used in ways never intended by the original implementers. This, in turn, may introduce new data quality requirements that are more stringent than the original ones, yet the application teams may hesitate to invest resources in addressing issues that are not relevant within their specific applications.
Administrative Authority: In some instances, the information used in an application originates from a source that is outside of the application manager's administrative authority. For example, in an application that is used to aggregate information from many information partners, many of the data quality issues are associated with problems at the partner level, not within the aggregated system. Because the problems occur outside of the centralized administrative authority, even if the data is modified/corrected at the centralized repository, it does not guarantee that the next submission would not still include instances of the same problems.
Data Quality in an Advisory Role: In application-oriented organizations, another impediment to data quality coordination relates to how one deals with improving the quality of specific data used within an application when that data is sourced from an external data supplier and is consequently managed outside of the application manager's jurisdiction. Although in some organizations the project structures may already have an associated data quality function, the more important issue is whether in practice all participants will cooperate with the data quality improvement process.
Data Quality as a Business Problem: In many organizations, business clients assume that any noncompliance with expectations results from data quality issues and needs to be addressed by the technical teams. However, in reality, the business rules with which the data appears to be noncompliant are associated with the running of the business. Consequently, those rules should be owned and managed by the business client as opposed to the technical team members, whose subject-matter expertise is less likely to be appropriate to address the problems.
Impact Analysis: Anecdotal evidence may frequently inspire attitudes about requirements for data quality; however, in the absence of a true understanding of the kinds of problems that take place, the scope of the problem and the impacts associated with the problems, it is difficult to determine the proper approach to fixing the problem as well as eventually measuring improvement.
Reactive versus Proactive Data Quality: Most data quality programs are designed to react to data quality events instead of determining how to prevent problems from occurring in the first place. A mature data quality program determines where the risks are, the objective metrics for determining levels and impact of data quality compliance, and approaches to ensure high levels of quality.
Data Quality in an Advisory Role
In essence, many of these issues stem from the simple fact that the persons entrusted with ensuring or managing the quality of data usually do not have authority to take the appropriate steps to directly improve data quality. Instead, the data quality management function may exist to understand and coordinate the data quality activities (reactive or proactive) that currently exist within an environment as well as work toward developing a mature data quality capability within the enterprise. However, the management coordinator role is likely to be an advisor to the application system data "owners." This advisor is tasked with inspiring those owners to take responsibility for ensuring the quality of the data.
Opportunities at the Advisory Level
To summarize, most issues derive from the fact that a large part of data quality management, especially at the enterprise level, is advisory. To add complexity, there is an expectation that as soon as data quality professionals are brought into an organization, there should be some visible improvement to the data. This poses quite a quandary at times because the data quality manager is viewed as having responsibility for some action without necessarily having the authority to make it happen. The key to success, as we have learned with a number of our clients, is to exploit the advisory role and use internal procedures to attach the responsibility to the already existing information management authority. In other words, we guide those information managers in the data ownership positions to accept the responsibility through a combination of the advisory role and the organizational system development life cycle policies, standards and procedures to which those information managers are bound.
If the intention of a data quality management program is to influence system management behavior to integrate ongoing data quality improvement, the ideal approach to integrate the data quality capability is to incrementally introduce components of a data quality management program defined in a manner that establishes the value of the activity and consequently encourages compliance, thereby gaining incremental acceptance and success. For most environments, the goal is to develop an approach to identify and document best practices associated with both application-oriented and enterprise-wide data quality as well as provide guidance for the integration of these best practices into the system development life cycle.
Finessing Personal Objections
While it is unlikely that any individual would specifically disagree with any of the data quality concepts that constitute an effective improvement program, that does not necessarily guarantee any specific individual's participation. Providing a clear business case that demonstrates how specific data quality issues impede the stated business objectives, as well as a discussion of the steps that need to be taken to address the problem, will make the decision to introduce the improvement very clear.
In addition, there must be some guiding principles for development and approval of data quality guidelines, as well as approaches to integration of the best practices into the enterprise. Our approach has been to adapt data quality best practices within a documented guideline structure that conforms to internal policies and standards and can be approved through internal procedures. When a particular activity has been approved through the standard internal channels, it transitions from "guidance" into organizational policy, with all the compliance requirements that implies.
Data Quality Management Coordination via Guidelines
Our strategic goal is to incrementally introduce and gain acceptance of data quality best practices. Our approach is to present these concepts as guidelines while simultaneously applying them in practice. In this manner, we can establish the value of the process as a by-product of value-added activities.
For example, the concept of measuring the business impact of a data quality problem makes sense in principle, but it must be implemented within the organization's development structure and cycle. If a measurement or data quality check requires resources and impacts other systems (and even if it does not, this is probably still the organizationally correct approach to take), the process involves describing what needs to be done via the standard operational review process. That review introduces the task and describes what is to be done, why it is being done, what the system impacts are, what the performance impacts are, and what is expected to be reported as a result of the task. However, this is likely to be the final stage of the practical process of obtaining approval for the data quality activity.
In all likelihood, there will be some preparatory legwork to be done even before the review process is initiated, which will include some initial exploratory conversations, development of a clear argument for performing the audit or check, and procurement of the approval of key members of the development team, oversight team and, most importantly, the information managers with the de facto ownership responsibility for the data to be examined. In fact, this process will probably constitute the lion's share of the work to be done, even before the activity is performed.
The result is that as part of the process of introducing any specific data quality activity, we will have accumulated enough organizational knowledge to understand the hoops through which we need to jump before the activity will be blessed by the proper authorities. We, in turn, document the guideline to define the activity, to describe its business value, and to include both informal steps and the formal procedures needed for approval. In the cases that require formal procedures, we will provide templates for any forms, documents, etc.
This leads us to the most relevant part of this approach: exploiting what we call the "meta-approval" process. The guideline document is presented for approval within the standard organizational procedure; however, when the guideline is approved, by inclusion all the processes described within the guideline are also approved, thereby incorporating the formal procedures for performing the data activity as well as the documented accountability and responsibilities associated with specific authoritative managers within the organization. The data quality coordination function then boils down to defining how a data quality activity is integrated into the already existing management structure and ensuring that the activity is properly described and documented. Accountability is assigned to the proper authorities within the organization, along with the requirements for compliance.
Categories of Data Quality Guidelines
What is a data quality guideline? A data quality management guideline is a document that describes a data quality management directive or activity, the concepts that support it, any associated roles and responsibilities, and any technical, operational or administrative implementation details. Data quality management guidelines are intended to either educate about data quality concepts or influence management, processes, activities or system implementations. There are four logical functional areas in which we classify data quality management guidelines: business, technical, operations, and knowledge capture and dissemination.
Business guidelines will contain an evaluation method to analyze the business impact of poor data quality and an approach to prioritizing related data quality activities based on a determination of the activities that will provide the most favorable return.
Technical guidelines contain tools requirements, system development processes or technical methods that are part of the data quality management program. Examples of technical guidelines include: defining data quality business rules and their subsequent implementation, methods for technical solutions to measure conformance with those business rules, mapping information flows throughout the enterprise, data modeling, meta data repositories, data standardization and the requirements analysis, evaluation, selection and deployment of data quality tools.
Operations/Management guidelines describe the operational aspects of data quality management and how management activities are to be implemented within the organization. This includes responsibilities for data stewardship, systems and processes for reporting and tracking data quality issues, processes for resolution of those issues, development of metrics and methods for collecting measurements for ongoing monitoring, management review and approval of data quality rules, periodic qualitative reporting based on data quality criteria and business client expectations, as well as the procedures for the implementation of the products covered within these guidelines.
Knowledge Capture and Dissemination guidelines capture organizational knowledge and separate it from its implementation. The guidelines in this section focus on what meta data aspects of the organization's information assets should be captured and managed, such as the logical and physical models, usage data, business rules and constraints. This will also include guidelines describing the environments in which meta data is to be stored and managed, and methods and guidelines for extracting that meta data and making it available for review by team members. An underlying principle with respect to this area is the ability to document the details of data quality measurement criteria and make those details available for external audit if necessary for compliance reasons.
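As an illustration of the technical guidelines described above (defining data quality business rules and measuring conformance with them), the following is a minimal sketch in Python. It is not drawn from any client implementation; the Rule class, the conformance function, the rule names, the owners and the sample records are all hypothetical and chosen only to show the idea of a rule that carries a named business owner alongside an objective, repeatable measurement.

```python
from dataclasses import dataclass
from typing import Callable

# A record is modeled as a plain dictionary (an assumption for this sketch).
Record = dict[str, object]


@dataclass(frozen=True)
class Rule:
    """A hypothetical data quality business rule with an accountable owner."""
    name: str
    owner: str                        # business client accountable for the rule
    check: Callable[[Record], bool]   # returns True when a record conforms


def conformance(records: list[Record], rules: list[Rule]) -> dict[str, float]:
    """Return, for each rule, the fraction of records that conform to it."""
    if not records:
        # Assumption: an empty data set is reported as fully conformant.
        return {rule.name: 1.0 for rule in rules}
    return {
        rule.name: sum(1 for r in records if rule.check(r)) / len(records)
        for rule in rules
    }


# Illustrative rules; names, owners and checks are invented for the example.
rules = [
    Rule("customer_id_present", "Sales",
         lambda r: bool(r.get("customer_id"))),
    Rule("state_is_two_letters", "Billing",
         lambda r: isinstance(r.get("state"), str) and len(r["state"]) == 2),
]

# Illustrative records, not real data.
records = [
    {"customer_id": "C001", "state": "NY"},
    {"customer_id": "", "state": "MD"},
    {"customer_id": "C003", "state": "Maryland"},
    {"customer_id": "C004"},
]

scores = conformance(records, rules)
for rule in rules:
    print(f"{rule.name} (owner: {rule.owner}): {scores[rule.name]:.0%}")
```

Keeping the owner on the rule itself reflects the point made earlier that business rules belong to the business client rather than the technical team, and the per-rule scores are the kind of measurement detail that could be documented and made available for audit.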
The issues of coordinating data quality management are tough ones, and a data quality professional must carefully introduce best practices into resistant environments. Working within the policies and procedures already established in an organization helps in bridging the management authority gap that can impede progress toward data quality improvement. I believe that coordination via guidelines may prove to be a successful approach. We have proposed and implemented this strategy with a number of clients, and their reactions have been extremely favorable.