When is a data quality issue not a data quality issue? Part II
When helping my clients implement a data quality issue management process, I always come across resistance to implementing it. Even when it is up and running, some stakeholders come up with interesting reasons why they don't need to use it. So I decided to research this in the context of others' similar experiences, and I asked for feedback from some data quality related LinkedIn groups on “when a data quality issue is not a data quality issue”.
I asked my assistant Liselle to help collate the responses as I have been very busy lately, and her analysis and summary of the feedback were so good that I asked her to write the blog herself. The reason I trusted her to do this is that she is much more than my assistant, being a very experienced data professional herself (you can find out more about Liselle on the Partners and Associates page). So over to Liselle…
Thank you Nicola!
On analysing the responses, some of the key messages were:
- Providing a definition for a data quality issue
- Defining business boundaries
- Recognising the source of the data
- Using data quality tools
Let’s delve deeper into these topics.
A data quality issue can be defined as a matter that causes the high quality of the data to be in dispute.
Data quality is concerned with the accuracy and completeness of the data, among other key factors, and the data needs to be fit for its intended uses. So a data quality issue would be anything that compromises a business’s ability to effectively operate, plan or make decisions.
In providing this definition, it should start to become clear that the definition of a data quality issue does not vary. It should be shared across the whole organization to support the identification of issues, but it would not be specific to your organization. You may have to prioritize your issues, but they must all be identified as such.
What do you consider your boundaries to be? Does it matter if you are obtaining data from an external source? If you are using data within your systems, then any quality issues in that data, wherever it was generated, are still data quality issues that need to be recorded and addressed. The data owner will need to be notified and the root cause of the issue determined and rectified.
If you want your organization to make good decisions, operate effectively and be able to plan for the future, then the source of the data should not matter. If you are using it and there is an issue, that issue affects you. If the data comes from outside your organization you may not be able to resolve the issue, but at the very least the consumers of that data need to be aware of its shortcomings so that they can allow for them.
In defining the business boundaries, the identification of your data owners, and therefore the implementation of data governance, supports how your data quality issues are processed and solved. Implementing a data governance framework is vital to the long-term, sustainable improvement of data quality, as it provides the mechanism to identify the root cause of issues and changes the culture to a more proactive management of data quality.
In recognizing the source of the data, it may be determined that although you have a data owner who is internally responsible for the data, accountability for its quality lies outside of the organization. The issue still needs to be recorded and addressed, and having relevant data governance policies would help determine what your next steps should be.
Although data quality tools are useful, if the right infrastructure is not in place then the tools will not be able to identify data quality issues effectively or support finding solutions. If you are not in a position to use data quality tools effectively, having data owners and data governance in place means the right decisions can still be made about how best to handle data quality issues, and a more cohesive approach can be taken to widespread data cleansing, for example by agreeing default values to be used instead.
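To make the default values idea a little more concrete, here is a minimal sketch in Python. It assumes the data owners have agreed a small set of defaults; the field names and default values are purely illustrative and not taken from any real system. It also keeps track of which fields were defaulted, so that consumers of the data can be told about its shortcomings.

```python
# Hypothetical defaults agreed with the data owners (illustrative names only).
AGREED_DEFAULTS = {"country": "UNKNOWN", "segment": "UNCLASSIFIED"}


def apply_defaults(record, defaults=AGREED_DEFAULTS):
    """Return a cleansed copy of the record and the list of fields that were defaulted."""
    cleansed = dict(record)  # leave the original record untouched
    defaulted = []
    for field_name, default in defaults.items():
        value = cleansed.get(field_name)
        # Treat missing values and blank strings as gaps to be filled.
        if value is None or (isinstance(value, str) and not value.strip()):
            cleansed[field_name] = default
            defaulted.append(field_name)
    return cleansed, defaulted


cleansed, defaulted = apply_defaults({"name": "Acme", "country": "", "segment": "Retail"})
print(cleansed)   # the record with the agreed default filled in
print(defaulted)  # the fields consumers should be warned about
```

Returning the list of defaulted fields alongside the record is a deliberate choice here: a default value hides a gap rather than fixing it, so the gap should still be visible to whoever uses the data.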
So there is never a situation where a data quality issue is not a data quality issue. It may not have been identified, but it can still be impacting the quality of your data. Having a data governance framework is one of the first steps to support the resolution of issues. You should also have a process to log, investigate and action data quality issues. In situations where the data owner lies outside of the organisation, you need to be sure that consumers are aware of the shortcomings so that they can allow for them.
And remember that when you convince your stakeholders to tell you their data quality issues, you will need to log them in a central place. You can click here to download a free data quality issue log template.
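If a spreadsheet template is not an option, a central log can start out very simply. The sketch below is one possible shape for it in Python, following the log, investigate and action steps described above. The class names, field names and status values are assumptions made for illustration; they are not the columns of the downloadable template.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DataQualityIssue:
    """One entry in a hypothetical data quality issue log."""
    issue_id: int
    description: str
    data_owner: str                    # who is accountable for the data
    source: str = "internal"           # "internal" or "external"
    status: str = "logged"             # logged -> investigating -> actioned
    root_cause: Optional[str] = None
    raised_on: date = field(default_factory=date.today)


class IssueLog:
    """A single, central place to record issues (in memory, for illustration)."""

    def __init__(self):
        self._issues = []

    def log(self, description, data_owner, source="internal"):
        issue = DataQualityIssue(len(self._issues) + 1, description, data_owner, source)
        self._issues.append(issue)
        return issue

    def _get(self, issue_id):
        for issue in self._issues:
            if issue.issue_id == issue_id:
                return issue
        raise KeyError(f"No issue with id {issue_id}")

    def investigate(self, issue_id, root_cause):
        issue = self._get(issue_id)
        issue.root_cause = root_cause
        issue.status = "investigating"
        return issue

    def action(self, issue_id):
        issue = self._get(issue_id)
        issue.status = "actioned"
        return issue

    def open_issues(self):
        # Everything not yet actioned, including externally sourced issues
        # that consumers need to be made aware of.
        return [i for i in self._issues if i.status != "actioned"]


issue_log = IssueLog()
external = issue_log.log("Missing postcodes in supplier feed", "Supplier", source="external")
internal = issue_log.log("Duplicate customer records", "Sales Operations")
issue_log.investigate(internal.issue_id, "No uniqueness check on import")
issue_log.action(internal.issue_id)
print([i.description for i in issue_log.open_issues()])
```

Note that the externally sourced issue stays on the open list even though it cannot be fixed internally, which keeps it visible to the people using that data.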
(This post originally appeared on Nicola Askham's blog, which can be viewed here).