Data Quality for the Next Decade
Most enterprise data is “adequate” for basic operational needs today. It has been brought to a standard through a long, slow process of selectively upgrading the data intake functions within an organization.
But for the most part, enterprise data does not meet the standard that business in the next decade will require. In the years to come, business will do more than merely ensure that data entry passes basic standards. It will demand true data quality across the enterprise.
This prediction is based in large part on three certainties:
- Information volume is exploding, which is evident not only in the accumulation of data, but also in the information flow into the business through channels and third-party sources;
- It is a real-time business world where opportunities must be realized early, often and accurately; and
- Information is a key business asset no matter what business you’re in; information management is a strong competitive differentiator today.
When organizations develop business models today, timeliness of decisions is almost always a top priority. Latencies that occur when capturing data, analyzing it, making decisions and then taking the requisite action must be eliminated. The appropriate action must flow immediately from the arrival of data in the enterprise. This "new information" must then be weighed against relevant information selected from all the data the enterprise has ever collected, and the appropriate predefined action chosen. When that action involves automation, it will be triggered as the data arrives.
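The flow described above (new data arrives, is weighed against history, and immediately triggers a predefined action) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed design: the `Order` record, the in-memory `history` dictionary, the "more than 3x the historical mean" rule and both action functions are hypothetical stand-ins for an enterprise's real event source, historical store and business rules.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Order:
    # Hypothetical incoming event.
    customer_id: str
    amount: float


# Hypothetical predefined actions.
def flag_for_review(order: Order) -> str:
    return f"review:{order.customer_id}"


def auto_approve(order: Order) -> str:
    return f"approve:{order.customer_id}"


def on_order_arrival(order: Order, history: dict[str, list[float]],
                     threshold: float = 3.0) -> str:
    """Choose and run a predefined action as soon as an order arrives.

    The new order is weighed against the customer's past order amounts;
    an amount above `threshold` times the historical mean is flagged.
    """
    past = history.get(order.customer_id, [])
    if past and order.amount > threshold * mean(past):
        action = flag_for_review
    else:
        action = auto_approve
    # Record the new event so later decisions see it as history.
    history.setdefault(order.customer_id, []).append(order.amount)
    return action(order)


# Usage with a hypothetical in-memory history store.
history = {"C1": [100.0, 120.0, 80.0]}
print(on_order_arrival(Order("C1", 500.0), history))  # review:C1
```

In a real system the handler would be attached to a message queue or database trigger so that it fires as data lands; the point here is only that the decision is made at arrival time, with no separate batch analysis step.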
Having correct information available at all times is essential to conducting business without latency or extended analysis. To move toward real-time analytics, systems and processes must be engineered with analytics and data quality support built in where the data resides, rather than bolted on as external applications.
As we move to an event-driven, process-oriented world, quality data remains fundamental to success. To achieve high data quality, a data model is crucial to understanding the structure and meaning of information.
Data Quality Defined
As I see it, data quality is not the absence of defects; it is the absence of intolerable defects. Every enterprise has defects, and we all know that they lead to real, measurable negative business impact. The side effects of poor data quality could include poor customer service, improper stocking, ineffective campaigns or missed opportunities for expansion. We can safely say that proper data quality management is a value proposition that will ultimately fall short of perfection. But we can also say that managing data quality well will provide more value than it costs.
All data quality defects fall into 11 broad buckets:
1. A lack of integrity of reference between data values across the model
2. Entities without unique identification
3. The quantity of linked relationships (cardinality) in the data does not meet expectations
4. Unmet requirements for field values based on other values (subtype/supertype)
5. Unreasonable values
6. Attributes that are used for multiple meanings
7. Inconsistent formatting
8. Incorrect data
9. Missing data
11. Data that falls outside of its intended codification
By using these categories alongside the articulated business interests, the data quality of a system, or an entire enterprise, can be measured, and specific actions can be taken or scheduled for improvement. Avoid the temptation to correct things with a simple data cleanup, because that only patches the bad data until the next wave of poor data comes through. Instead, consider where poor-quality data comes from and use that knowledge to guide your improvements.
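As an illustration of that kind of measurement, the Python sketch below counts defects in five of the categories listed above (integrity of reference, unique identification, unreasonable values, inconsistent formatting and missing data) and then keeps only the counts that exceed a per-category tolerance, in the spirit of "the absence of intolerable defects." Everything in it is hypothetical: the sample records, the five-digit ZIP rule, the 0 to 120 age range and the tolerance values stand in for rules that would come from the organization's own data model and business priorities.

```python
import re
from collections import Counter

# Hypothetical sample data: customers, and orders that reference them.
customers = [
    {"id": 1, "name": "Acme", "zip": "12345", "age": 34},
    {"id": 2, "name": "Globex", "zip": "1234", "age": 212},
    {"id": 2, "name": "Initech", "zip": "54321", "age": None},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},
]

ZIP_FORMAT = re.compile(r"^\d{5}$")  # assumed format rule: five-digit ZIP


def count_defects(customers: list[dict], orders: list[dict]) -> Counter:
    defects = Counter()
    ids = [c["id"] for c in customers]
    # Entities without unique identification: each extra occurrence of an id.
    defects["non_unique_id"] = len(ids) - len(set(ids))
    # Integrity of reference: orders pointing at customers that do not exist.
    known = set(ids)
    defects["broken_reference"] = sum(
        1 for o in orders if o["customer_id"] not in known)
    for c in customers:
        if c["age"] is None:
            defects["missing_data"] += 1
        elif not 0 <= c["age"] <= 120:  # assumed reasonableness range
            defects["unreasonable_value"] += 1
        if not ZIP_FORMAT.match(c["zip"]):
            defects["inconsistent_format"] += 1
    return defects


# Hypothetical tolerances: how many defects per category the business accepts.
TOLERANCE = {"non_unique_id": 0, "broken_reference": 0, "missing_data": 1,
             "unreasonable_value": 0, "inconsistent_format": 1}


def intolerable(defects: Counter, tolerance: dict[str, int]) -> dict[str, int]:
    """Keep only the categories whose defect count exceeds its tolerance."""
    return {k: v for k, v in defects.items() if v > tolerance.get(k, 0)}


print(intolerable(count_defects(customers, orders), TOLERANCE))
# {'non_unique_id': 1, 'broken_reference': 1, 'unreasonable_value': 1}
```

Running checks like these on a schedule, rather than once before a cleanup, turns the defect counts into a trend that can be traced back to the systems and entry points that produce them.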
The Causes of Poor Data Quality
In our haste to build an operational system, we focus on optimizing transaction throughput and often treat the quality of the data as an afterthought. Anything that slows the data entry process is not even considered, yet from an enterprise perspective, the entered data is where the value arises.
Information now serves dozens more uses than it did decades ago. A single data entry can be used in dozens of applications, whether it is physically replicated or not. The value of an entry to the system that first receives it is estimated to be only 10 percent of its overall value to downstream systems and the enterprise as a whole. That makes it puzzling to find that most current data quality solutions still focus on data cleanup rather than on quality at the point of entry.
Data quality requires cooperation and incentives that reward good data entry on the basis of quality as well as quantity measures. Often, a bit of oversight and cross-departmental cooperation can greatly improve data quality at the point of entry. Managers can lead these improvements, but the work they delegate must be grounded in a data modeling discipline that reaches across the many systems involved in the data entry process.
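A minimal Python sketch of quality at the point of entry: each record is validated before it is stored, and per-operator counts of submitted and accepted entries make it possible to reward quality alongside quantity. The field names, the `VALID_STATES` code list, the `validate` and `submit` functions and the acceptance-rate metric are all hypothetical; real rules would be derived from the enterprise data model rather than hard-coded.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical code list that the "state" field must come from.
VALID_STATES = {"CA", "NY", "TX"}


@dataclass
class EntryStats:
    """Per-operator counts, so quality can be rewarded alongside quantity."""
    submitted: int = 0
    accepted: int = 0

    @property
    def quality_rate(self) -> float:
        return self.accepted / self.submitted if self.submitted else 0.0


def validate(record: dict) -> list[str]:
    """Return reasons to reject the record; an empty list means it is acceptable."""
    errors = []
    if not record.get("customer_name"):
        errors.append("missing customer_name")
    if record.get("state") not in VALID_STATES:
        errors.append("state outside code list")
    qty = record.get("quantity")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(qty, int) or isinstance(qty, bool) or qty <= 0:
        errors.append("quantity must be a positive integer")
    return errors


stats: dict[str, EntryStats] = defaultdict(EntryStats)


def submit(operator: str, record: dict, store: list) -> list[str]:
    """Validate at the point of entry; only clean records reach the store."""
    errors = validate(record)
    s = stats[operator]
    s.submitted += 1
    if not errors:
        s.accepted += 1
        store.append(record)
    return errors


store: list[dict] = []
submit("ann", {"customer_name": "Acme", "state": "CA", "quantity": 3}, store)
print(submit("ann", {"customer_name": "Globex", "state": "ZZ", "quantity": 2}, store))
# ['state outside code list']
print(stats["ann"].quality_rate)  # 0.5
```

Rejecting the record with explicit reasons, instead of silently accepting it for later cleanup, gives the person entering the data immediate feedback, and the acceptance rate gives managers a quality measure to set beside raw entry volume.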
Success in the next decade calls for planning (or projecting as best we can) what the organization will look like (markets, size, geography, etc.) and developing commensurate plans for the journey. These plans are lost without a hyper-focus on information assets, data modeling, data quality, data architecture and data entry. Whatever anyone in the organization thinks of these disciplines, organizations with a vision of success always attend carefully to data modeling and data quality.
This is an edited excerpt from William’s white paper: “Improving Data Quality Through Data Modeling.”