How to determine the quality of data
Organizations generate a deluge of data every moment. Effective analysis and decision-making depend on that data, so ensuring the availability of quality data is imperative to achieving the desired results.
Quality of data is closely related to the concept of ‘fitness for purpose.’ Data that is fit for one purpose may not be fit for another, so the same set of data may be of adequate quality for one use and suffer a quality deficit in another.
As an example, the sales figures quoted in a management report may be adequate for that reporting, but they may not be accurate enough to be recorded in the company’s account books. This makes it difficult to define ‘quality of information’ in absolute terms. ‘Fitness for purpose’ determines whether the data is of adequate quality, so the answer differs with the situation.
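The sales-figure example can be made concrete with a small sketch. It assumes, purely for illustration, that each purpose tolerates a different relative error between a quoted figure and the ledger value. The purpose names, the tolerance values, and the figures themselves are hypothetical and not drawn from any standard.

```python
# Hypothetical tolerances: the maximum relative error each purpose accepts.
TOLERANCE_BY_PURPOSE = {
    "management_report": 0.01,  # assumed: within 1% is adequate for reporting
    "account_books": 0.0,       # assumed: the books require an exact match
}


def fit_for_purpose(reported: float, actual: float, purpose: str) -> bool:
    """Check whether a reported figure is close enough to the actual value."""
    if actual == 0:
        # Relative error is undefined for zero; fall back to an exact match.
        return reported == 0
    relative_error = abs(reported - actual) / abs(actual)
    return relative_error <= TOLERANCE_BY_PURPOSE[purpose]


reported_sales = 1_250_000.0  # illustrative rounded figure quoted in a report
actual_sales = 1_248_730.0    # illustrative figure from the ledger

# The same figure passes for one purpose and fails for the other.
print(fit_for_purpose(reported_sales, actual_sales, "management_report"))
print(fit_for_purpose(reported_sales, actual_sales, "account_books"))
```

The figure itself never changes between the two calls; only the purpose, and therefore the acceptable error, does.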
The good news is that a common set of parameters can be used to define data quality.
COBIT 4.1 introduced the concept of data criteria, and the updated versions of COBIT included this concept as part of the ‘data enabler’ goals. This set of parameters, or criteria, can be used to determine the quality of data.
The parameters that define quality include:
- Confidentiality of information – This includes access to data based on confidentiality or secrecy.
- Integrity of information – This includes accuracy, completeness and authorized changes.
- Availability of information – This includes controlled availability, retrieval and archival abilities.
- Efficiency – This includes the process of generating and collecting the data and the effort involved.
- Effectiveness – This includes utility of the data.
- Reliability – This includes believability, reputation and trust associated with the data.
- Compliance – This includes the quality, legal and other requirements with which the data must comply.
These parameters apply across the different phases of the data life cycle. Quality can be viewed as a state or attribute of the data, and that state is influenced by factors that come into play at different points in the life cycle. Integrity, for example, must be assured at the time the data is generated. Building it into the data later would prove extremely challenging.
Similarly, if the methods adopted for the collection or generation of data are not reliable or compliant, the data quality would suffer significantly.
Sustaining quality requires that an organization implement the necessary assurance processes. Some of these can be automated, technology-based controls, while others require human oversight.
Aspects related to the confidentiality, integrity and availability of data can largely be programmed, and thus automated. Decisions related to the reliability and compliance of data may require human intelligence and judgment. Efficiency and effectiveness may require expert advice or audits.
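As a minimal sketch of the kind of check that can be automated, the following tests two of the integrity aspects listed earlier, completeness and accuracy. It assumes records arrive as dictionaries. The field names, the `REQUIRED_FIELDS` set and the rule that an amount must be a non-negative number are hypothetical, chosen only for illustration.

```python
# Hypothetical schema: fields every sales record is assumed to carry.
REQUIRED_FIELDS = {"order_id", "amount", "currency"}


def integrity_issues(record: dict) -> list:
    """Return a list of integrity problems found in one record."""
    issues = []

    # Completeness: every required field must be present and non-empty.
    for field in sorted(REQUIRED_FIELDS):
        if record.get(field) in (None, ""):
            issues.append(f"missing {field}")

    # Accuracy (illustrative rule): an amount, if present, must be a
    # non-negative number.
    amount = record.get("amount")
    if amount not in (None, ""):
        if isinstance(amount, bool) or not isinstance(amount, (int, float)):
            issues.append("amount is not numeric")
        elif amount < 0:
            issues.append("amount is negative")

    return issues


records = [
    {"order_id": "A1", "amount": 120.5, "currency": "USD"},
    {"order_id": "A2", "amount": -5, "currency": "USD"},
    {"order_id": "", "amount": 40, "currency": None},
]

for rec in records:
    print(rec.get("order_id") or "<no id>", integrity_issues(rec))
```

A check like this can run every time data is generated or collected. Whether a flagged record is still reliable enough to use remains the kind of decision that calls for human review.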
Quality of data is more than data accuracy. Multiple factors determine the quality of data. Quality is achieved not by accident, but by design. End users, domain experts and technology experts all have important roles to play in ensuring data quality.