The fundamental rule for a data warehouse environment is that it contains the single version of truth for the enterprise. But what does this mean? It comes down to data quality. When the quality and integrity of information obtained from the data warehouse are questioned or believed to be wrong, the data warehouse is not an effective business tool. There are two aspects to the issue of data quality: fact (the data is right or wrong) and perception (users of the data do or do not find it useful for their needs). I find that issues of fact are well understood (even though ensuring data quality is an ongoing challenge), but that issues based on users' perceptions are not.
First let's consider fact. The essential principle for data contained in a data warehouse is that it is the same as the representation of that fact contained in its operational source system of record; this is what "truth" means for business intelligence. Every data element contained in the data warehouse is sourced and extracted from the operational system that holds the best, most correct version of that data element in the company.
Data quality processes ensure that data elements brought into the data warehouse are "true" and are an essential part of the ETL process. Data elements are scrubbed so that they conform to corporate requirements for data format, usability and correctness. Data quality practices typically include:
- Creating a single definition for each data element critical to decision making and driving sales, service, operational and financial performance.
- Developing logical and physical models showing data relationships and organization.
- Defining key metrics, especially key performance indicators, that are used enterprise-wide.
- Integrating data from multiple sources and resolving differences to match the single accepted definition for data artifacts.
- Validating data extracted from operational source systems to determine the accuracy of information represented.
- Correcting inconsistent data.
- Creating corporate standards for data content.
- Treating data from external data providers as if it were your own, scrubbing it the same way, and judiciously matching it with internal data.
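The scrubbing and validation steps above can be sketched in code. The following is a minimal illustration, not a production ETL routine; the field names, the region codes, and the `normalize_record` function are hypothetical, and the corporate standard is assumed to be ISO 8601 dates and a fixed set of region codes.

```python
from datetime import datetime

# Hypothetical corporate standard for one record type. These names and
# values are illustrative, not taken from any real system.
REQUIRED_FIELDS = {"customer_id", "region", "signup_date"}
VALID_REGIONS = {"NA", "EMEA", "APAC"}  # the single accepted definition

def normalize_record(raw: dict) -> dict:
    """Scrub one source record so it conforms to corporate format standards."""
    # Trim stray whitespace from every string value.
    record = {k: (v.strip() if isinstance(v, str) else v) for k, v in raw.items()}
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    # Resolve differences to match the single accepted definition.
    record["region"] = record["region"].upper()
    if record["region"] not in VALID_REGIONS:
        raise ValueError(f"unknown region: {record['region']}")
    # Enforce one date format enterprise-wide (ISO 8601 assumed here;
    # the source is assumed to use US-style MM/DD/YYYY dates).
    parsed = datetime.strptime(record["signup_date"], "%m/%d/%Y")
    record["signup_date"] = parsed.date().isoformat()
    return record

clean = normalize_record(
    {"customer_id": "C-104", "region": " emea ", "signup_date": "03/07/2021"}
)
print(clean["region"], clean["signup_date"])  # EMEA 2021-03-07
```

Records that fail these checks would typically be routed to a correction queue rather than loaded, so that only "true" data reaches the warehouse.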
These important actions, while improving the quality of source data brought into the data warehouse, can also contribute to business users' perceptions about data quality.
While fact is quantitative (a data element is "true" or "false"), perceptions are based on requirements and use and are qualitative. This aspect of data quality is often overlooked: Does this data serve business users' needs? Is the quality of the users' experience with the data appropriate? It is important to address business users' qualitative expectations of the data warehouse.
These expectations typically include:
- Data stability: while data in the business environment can be volatile, certain activities such as simulation and predictive analysis require stable data while algorithms and models are being developed. Not having the required data stability will lead to user dissatisfaction.
- Data timeliness: some users require the most up-to-date information, while others are satisfied with data a week or even a month old. Meeting these diverse needs is as important as meeting those for data stability.
- Data delivery: equally important is how data is delivered. Some users prefer delivery through e-mail, some want PDF documents, and others prefer their data delivered through a particular analytic tool.
- Data manipulation: many business users want the ability to manipulate data to create new metrics, find new data relationships, and create their own analyses and reporting.
- Data familiarity: users expect warehouse data to match the data used in their operational source systems. If it does not match that contained in the source system of record, it will appear unfamiliar and be perceived as wrong. This data inconsistency exists in a surprising number of operational source systems.
Perception is driven by the needs, expectations and uses of data. These are quality factors just as important as the factors of data fact.
Who is responsible for data quality? Many organizations assign data ownership. This mechanism works well for operational source systems because the business community that uses one has a vested interest in keeping its data correct. This mechanism does not work for enterprise-wide data warehouses.
Who owns the data in the corporate data warehouse? If a data element is incorrect but it matches the data value in its source system of record, it is the operational source system that needs to be changed according to our definition of "truth." The corrected data element will then flow into the data warehouse. The ownership of the data in the corporate data warehouse, I believe, belongs to the BI competency center.
Data warehouse stewardship requires that each data element equal its source system of record and that the qualitative needs of the data warehouse's users be met. The BI competency center uses the data and BI to help the business accomplish its strategic objectives and is staffed with individuals skilled in the technologies to do so. This approach makes data quality perception issues as important as fact issues. The data is owned by the organization that uses it and stewards it for the company: the one with a vested interest in keeping it right.
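A steward's check that each warehouse element equals its source system of record can be sketched simply. The `reconcile` function and the sample values below are hypothetical; in practice the two sides would be query results keyed by a business identifier.

```python
def reconcile(source_rows: dict, warehouse_rows: dict) -> list:
    """Return the keys whose warehouse value diverges from the source of record.

    Under the article's definition of "truth", any mismatch means either the
    warehouse load is wrong or the operational source itself needs correction.
    """
    mismatches = []
    for key, source_value in source_rows.items():
        if warehouse_rows.get(key) != source_value:
            mismatches.append(key)
    return mismatches

# Illustrative account balances keyed by a hypothetical customer ID.
source = {"C-104": 1200.50, "C-105": 87.00}
warehouse = {"C-104": 1200.50, "C-105": 90.00}  # C-105 has diverged

print(reconcile(source, warehouse))  # ['C-105']
```

A recurring report of such mismatches gives the competency center a concrete artifact for its stewardship role: evidence of where the warehouse and the system of record disagree, and therefore where correction must happen upstream.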