My previous post posited that data quality has a rotating frame of reference that cannot be ignored, meaning that data quality standards need to be relative to the frame of reference of each specific business use for data across the enterprise. The post, and the commentary it received, rekindled the data quality debate between real-world alignment and fitness for the purpose of use.
In his post Omni-purpose Data Quality, Henrik Liliendahl Sørensen posited that “if you have several different business objectives using the same data you will usually discover that aligning with the real world fulfills all the needs.”
For example, by aligning with the real world, you can validate a postal address irrespective of the frames of reference provided by the requirements to make it fit for the purpose of different uses. A billing use, which needs to mail an outstanding invoice to a customer for payment, has to have a complete and accurate postal address. By contrast, a marketing use that is segmenting customers by geography could accomplish its business objective even with a postal address that is invalid because it is incomplete (e.g., a U.S. address with only state or zip code). As Henrik rightly pointed out, postal address data quality maintained using real-world alignment would fulfill both of these business objectives.
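As a minimal sketch of the billing and marketing example, the same postal address can be checked against each frame of reference separately. The field names, the sample records, and both functions are hypothetical, and the checks only test whether fields are present; they do not verify an address against the real world.

```python
# Hypothetical address records; the field names are assumptions for illustration.
full_address = {"street": "123 Main St", "city": "Des Moines", "state": "IA", "zip": "50309"}
partial_address = {"street": "", "city": "", "state": "IA", "zip": ""}


def fit_for_billing(address):
    """Billing mails an invoice, so every address field must be present."""
    return all(address.get(field) for field in ("street", "city", "state", "zip"))


def fit_for_marketing(address):
    """Geographic segmentation only needs a state or a zip code."""
    return bool(address.get("state") or address.get("zip"))


for label, address in (("full", full_address), ("partial", partial_address)):
    print(label, "billing:", fit_for_billing(address), "marketing:", fit_for_marketing(address))
```

The complete address passes both checks, while the state-only address fails the billing check and still passes the marketing one.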
Out of Frame, Out of Context
My concern is that when we ignore data quality frames of reference and declare that all data has to be aligned with the real world so that it would be fit for the purpose of any possible use, then we are disconnecting the data from any and all business context.
If in this example billing were switched to electronic payment, then the validity of the postal address would become a lower data quality priority, and may be superseded by the business need for a valid email address to fulfill the billing business objective. Of course, as Henrik noted, “data quality needs for a specific business objective also changes over time. As a valid address may be irrelevant for invoicing if either the mail carrier gets it there anyway or we invoice electronically, having a valid address and addressee suddenly becomes fit for the purpose of use if the invoice is not paid and we have to chase the debt.”
But without always verifying data quality frames of reference, a lot of enterprise data is maintained, perhaps even in a pristine state of perfect data quality, whether or not it is being used. I have been on many projects where the business reaction to a data assessment that revealed serious data quality issues (e.g., high rates of invalid postal addresses) was: “Who cares? No one uses that source system when they need those data elements.” This is why a data quality metric independent of any frame of reference, which sadly describes most data quality metrics displayed in dashboards and reports, is often ignored by business users.
Is there One Frame to rule them all?
Via the Data Quality Professionals Group on LinkedIn, Emma Fortnum commented that “there is much data that is common across departments that truly is enterprise data, and therefore it is the enterprise’s frame of reference that needs to be considered.” Emma also wisely noted that a significant challenge is that each department’s “custodians and stewards concentrate on their own requirements, failing to appreciate that others have a stake in what they do.”
The enterprise needs to understand all of the data stakeholders and how they are using data to support specific business activities. An enterprise frame of reference should be a superset of all frames of reference for data. Each frame of reference specifies the data quality requirements for a specific business use. In many cases, there may not be conflicting requirements, meaning that all frames of reference are in agreement regarding data quality. However, this should never simply be assumed.
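One way to picture an enterprise frame as a superset is to treat each frame of reference as the set of data elements a business use requires, and the enterprise frame as their union. This is only a sketch under that simplifying assumption: the frame names, the field names, and both helper functions are hypothetical, and real requirements would carry thresholds and rules rather than bare field names.

```python
# Hypothetical frames: each maps a business use to the fields it requires.
frames = {
    "billing": {"street", "city", "state", "zip"},
    "marketing": {"state"},
    "e_billing": {"email"},
}


def enterprise_frame(frames):
    """Union of every frame's required fields, so it is a superset of each frame."""
    required = set()
    for fields in frames.values():
        required |= fields
    return required


def stakeholders(frames, field):
    """List the business uses that depend on a given field."""
    return sorted(use for use, fields in frames.items() if field in fields)


print(sorted(enterprise_frame(frames)))
print(stakeholders(frames, "state"))
```

Keeping the individual frames alongside the union preserves which stakeholders depend on each data element, which the union alone would lose.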
Get Framed for Data Quality
Frames of reference communicate the requirements of all data users, allowing impact analysis to be performed, and stronger business cases to be built for data quality improvements. Frames of reference also enable recasting metrics. Returning to the example, the billing frame of reference has a high threshold for postal address data quality, making any invalid postal address a data quality issue for them. When marketing views the postal address data quality metric, they need it recast within their own frame. Viewed through the billing frame, or a default enterprise frame, the metric could show marketing a data quality issue that they may not in fact have.
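Recasting a metric can be sketched as scoring the same records against the rule of whichever frame is viewing them. The sample records, the rule definitions, and the `recast_metric` function below are all assumptions made for illustration, and the rules again test only for the presence of fields.

```python
# Hypothetical sample of postal address records.
addresses = [
    {"street": "123 Main St", "city": "Des Moines", "state": "IA", "zip": "50309"},
    {"street": "", "city": "", "state": "IA", "zip": ""},
    {"street": "", "city": "", "state": "", "zip": "02134"},
    {"street": "", "city": "", "state": "", "zip": ""},
]

# Assumed pass/fail rule for each frame of reference.
frame_rules = {
    "billing": lambda a: all(a[f] for f in ("street", "city", "state", "zip")),
    "marketing": lambda a: bool(a["state"] or a["zip"]),
}


def recast_metric(records, frame):
    """Share of records that meet the given frame's rule."""
    rule = frame_rules[frame]
    passing = sum(1 for record in records if rule(record))
    return passing / len(records)


for frame in frame_rules:
    print(frame, recast_metric(addresses, frame))
```

On this sample, one of four records is fit for billing and three of four are fit for marketing, so the same data yields a different score in each frame.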
The bottom line is that even when real-world alignment makes data fit for the purpose of every use, you still need to keep track of, and track changes in, each use. To keep the data supporting all your business objectives in context, get framed for data quality.
This blog was originally posted at OCDQblog.com. Published with permission.