If you regularly read this column, I suspect you are someone who is involved in improving data quality within your company. You have probably had business partners or customers tell you that your data is bad. On some occasions, they have almost certainly used words that are considerably worse than "bad." These are interesting interactions, but there is absolutely nothing constructive that you can do as a result of them. The description “bad data,” or something more emotional, is simply too general to be valuable. It’s as if you went to your doctor to tell her that you have a terrible headache, but you could not be specific about the nature of the pain, its location, its frequency or any other symptoms that would be useful for a diagnosis.



Three Steps to Solve Data Problems

Solving the bad data problem requires:

  • Clearly defining the nature of the problems your business partners/customers are experiencing,
  • Establishing priorities to tackle the most strategically important issues first, and
  • Implementing an improvement plan with appropriate metrics and communications to your business partners/customers.

All three of these steps (defining the problems, setting priorities for remediation and communicating your successes) can only be completed successfully in partnership with your business customers.
Begin with the process of defining the nature of the problems. Your business end users or customers need data to complete tasks. They tend to be focused on their tasks and not on the characteristics of the data they employ. They don’t want to, or need to, understand the particulars of how the data is managed and why it may not be what they need, when they need it. Your users simply want good quality data. To assess the root causes of the bad data problem, you will need to know much more precisely what customers mean by bad data. It helps to introduce a shared vocabulary for talking about data quality issues. Timeliness, accuracy, completeness and consistency are one set of quality dimensions that can be useful.

  • Timeliness: Currency of data elements.
  • Accuracy: Attributes of the entity (object) are correctly represented.
  • Completeness: Breadth (number of entities) and depth (number of fields defined and populated).
  • Consistency: Identity, definitions, hierarchies, standards and metrics are the same within and across databases.
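
As a rough illustration, three of these dimensions can be expressed as simple ratios. The sketch below is only one possible way to do so. The `CustomerRecord` layout, its field names, the 90-day freshness window and the sample data are all assumptions made for the example. Accuracy is left out because measuring it requires a trusted reference source to compare against.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerRecord:
    # Hypothetical record layout; the field names are illustrative only.
    name: str
    address: Optional[str]
    employee_count: Optional[int]
    last_verified: date


def completeness(records, fields):
    """Share of the listed fields that are populated across all records."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(1 for r in records for f in fields if getattr(r, f) is not None)
    return filled / total


def timeliness(records, as_of, max_age_days):
    """Share of records verified no more than max_age_days before as_of."""
    if not records:
        return 0.0
    fresh = sum(1 for r in records if (as_of - r.last_verified).days <= max_age_days)
    return fresh / len(records)


def consistency(db_a, db_b):
    """Share of keys present in both databases whose values agree."""
    shared = db_a.keys() & db_b.keys()
    if not shared:
        return 0.0
    return sum(1 for k in shared if db_a[k] == db_b[k]) / len(shared)


# Made-up sample data for the example.
records = [
    CustomerRecord("Acme", "1 Main St", 120, date(2024, 1, 10)),
    CustomerRecord("Bolt", None, None, date(2023, 3, 1)),
]
crm = {"Acme": "1 Main St", "Bolt": "9 Elm Rd"}
billing = {"Acme": "1 Main St", "Bolt": "7 Oak Ave"}

print(completeness(records, ["address", "employee_count"]))
print(timeliness(records, as_of=date(2024, 2, 1), max_age_days=90))
print(consistency(crm, billing))
```

Keeping the scores separate makes it easier to tell a stale address (timeliness) from a missing one (completeness) when a customer simply reports bad data.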

If your business end users understand these four data quality dimensions, you may discover, for example, that the inaccurate entity identity data your customers are complaining about (“This isn’t the right address. Can’t you guys do anything right?”) is actually a timeliness problem: your extraction and update cycles are too slow. Another common example is a lack of market opportunity data for small businesses (“This is such bad data. Where are the employee numbers for these small businesses?”), which is a completeness problem that can probably never be solved because no one has the data.
Once you and your business customers have a common vocabulary that permits you to disaggregate the problems into quality components, both you and your business end users will have a much clearer picture of what could be fixed. The next step is to determine what will be fixed. No enterprise has the resources to create perfect data for all data types in all databases. You and your customers need to agree on priorities for attacking specific data problems. This is an effort that involves looking at data types (entity identity data, market opportunity data, relationship data, transaction/role data, etc.) and at the quality dimensions (timeliness, accuracy, completeness and consistency). In many cases, your highest-priority initiatives will involve enhancing only one or two quality dimensions for one or two data types. The choices made in this step should be re-evaluated regularly as your customers’ strategic priorities change.
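
One way to make this prioritization concrete is to score each combination of data type and quality dimension and then rank the results. The following is a minimal sketch, not a prescribed method: the `Candidate` class, the 1-to-5 scales, the impact-to-effort ratio and the sample scores are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    data_type: str        # e.g. "entity identity"
    dimension: str        # timeliness, accuracy, completeness or consistency
    business_impact: int  # 1 (low) to 5 (high), agreed with business customers
    effort: int           # 1 (low) to 5 (high), estimated by the data team


def prioritize(candidates, top_n=2):
    """Return the top_n candidates by impact-to-effort ratio, highest first."""
    ranked = sorted(
        candidates,
        key=lambda c: c.business_impact / c.effort,
        reverse=True,
    )
    return ranked[:top_n]


# Made-up scores for the example.
candidates = [
    Candidate("entity identity", "timeliness", business_impact=5, effort=2),
    Candidate("market opportunity", "completeness", business_impact=4, effort=5),
    Candidate("relationship", "consistency", business_impact=3, effort=2),
    Candidate("transaction/role", "accuracy", business_impact=2, effort=4),
]

for c in prioritize(candidates):
    print(f"{c.data_type}: {c.dimension}")
```

Because the impact scores come from the business side, rerunning the ranking whenever strategic priorities change is a simple way to keep the choices under regular review.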

Perhaps the most important component in the process of solving the bad data problem is regular communication between you and your business end users or customers. Once you both speak the same quality language and agree on which issues should be tackled first, you need to report regularly on progress against goals. When business end users can see that progress is being made, they are often remarkably patient. There is an old proverb: Half a loaf is a feast for a starving man. Once you have explained the dimensions of quality and your customers have agreed with you on near-term quality initiatives, they will understand that perfect data (the whole loaf) isn’t available. They will also understand that you are working to deliver the data they need for their strategic goals, the tasty and fulfilling half loaf.
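
Reporting progress against goals can be as simple as stating how much of the gap between a starting point and an agreed target has been closed. The sketch below shows one hedged way to produce such a status line. The function names, the metric name and the numbers are hypothetical, and it assumes each quality metric is tracked as a fraction between 0 and 1.

```python
def gap_closed(baseline, current, target):
    """Fraction of the distance from baseline to target covered so far."""
    if target == baseline:
        raise ValueError("target must differ from baseline")
    return (current - baseline) / (target - baseline)


def progress_line(name, baseline, current, target):
    """One plain-language status line for a report to business customers."""
    pct = round(gap_closed(baseline, current, target) * 100)
    return (
        f"{name}: {current:.0%} now (was {baseline:.0%}, goal {target:.0%}), "
        f"{pct}% of the way to goal"
    )


# Made-up figures for the example.
print(progress_line("Address timeliness", baseline=0.60, current=0.75, target=0.90))
```

A line like this reports the half loaf honestly: it shows where the data started, where it stands and how far it still has to go.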

Key takeaway: Most of your business customers don’t actually understand data quality, but they do know when they cannot use data to perform necessary tasks. You need to educate them about how to describe their quality symptoms, gain their agreement about quality improvement plans and communicate regularly on the successes of those plans.
