© 2019 SourceMedia. All rights reserved.

How to determine the quality of data

Organizations are generating a deluge of data every moment. Effective analysis and decision-making depend on that data, so ensuring the availability of quality data is imperative to achieving the desired results.

Data quality is closely tied to the concept of ‘fitness for purpose’: data that is fit for one purpose may not be fit for another. The same data set may therefore be of adequate quality for one specific use yet fall short when applied to a different one.

For example, the sales figures quoted in a management report may be adequate for that report, yet not accurate enough to be recorded in the company’s account books. This makes it difficult to define ‘quality of information’ in absolute terms. Fitness for purpose determines whether data is of adequate quality, so the answer differs from one situation to the next.
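
To make the sales-figure example concrete, the following minimal Python sketch checks one rounded figure against two purposes. The `Purpose` class, the `fit_for_purpose` function and both tolerance values are hypothetical, chosen only for illustration: a management report is assumed to tolerate an error of up to 1,000, while the account books are assumed to require accuracy to the cent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Purpose:
    """A hypothetical use of the data and the error it can tolerate."""
    name: str
    max_abs_error: float  # largest acceptable deviation from the true value


def fit_for_purpose(reported: float, actual: float, purpose: Purpose) -> bool:
    """Return True if the reported figure is close enough to the actual one for this purpose."""
    return abs(reported - actual) <= purpose.max_abs_error


# Illustrative numbers only: a sales total rounded to the nearest thousand.
actual_sales = 1_234_567.89
reported_sales = round(actual_sales, -3)  # 1235000.0

# Hypothetical tolerances for the two uses described in the article.
management_report = Purpose("management report", max_abs_error=1_000.0)
account_books = Purpose("account books", max_abs_error=0.005)  # accurate to the cent

for purpose in (management_report, account_books):
    print(purpose.name, fit_for_purpose(reported_sales, actual_sales, purpose))
```

The same value passes the first check and fails the second: the data has not changed, only the purpose it is measured against.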

The good news is that a common set of parameters can be used to define data quality.

COBIT 4.1 introduced the concept of information criteria, and later versions of COBIT incorporated this concept into the quality goals of the ‘information’ enabler. This set of parameters, or criteria, can be used to determine the quality of data.

The parameters that define quality include:

  1. Confidentiality of information – This includes restricting access to data according to its confidentiality or secrecy.
  2. Integrity of information – This includes accuracy, completeness and ensuring that only authorized changes are made.
  3. Availability of information – This includes controlled availability and the ability to retrieve and archive data.
  4. Efficiency – This includes the processes of generating and collecting data and the effort involved.
  5. Effectiveness – This includes the usefulness of the data for its intended purpose.
  6. Reliability – This includes the believability, reputation and trust associated with the data.
  7. Compliance – This includes adherence to the quality, legal and other requirements that apply.

These parameters apply quite effectively across the different phases of the data life cycle. Quality can be viewed as a state or attribute of the data, and that state is influenced by factors that come into play throughout the life cycle. Assuring integrity, for instance, requires that it be established at the time the data is generated; building it into the data later would prove extremely difficult.
Similarly, if the methods used to collect or generate data are not reliable or compliant, data quality will suffer significantly.

Sustaining quality requires that an organization implement the necessary assurance processes. Some of these can be automated, technology-based controls, while others require human oversight.

Aspects related to the confidentiality, integrity and availability of data can largely be programmed, and thus automated. Decisions related to the reliability and compliance of data may require human judgment. Efficiency and effectiveness may require expert advice or audits.
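
As an illustration of how the integrity criterion can be programmed, here is a minimal Python sketch of rule-based record checks. The field names (`order_id`, `customer_id`, `amount`), the `integrity_issues` function and the rules themselves are hypothetical assumptions; real checks would follow the organization's own data definitions. Confidentiality and availability are usually enforced through access controls and infrastructure rather than record-level code like this.

```python
from typing import Any

# Hypothetical record layout, for illustration only.
REQUIRED_FIELDS = ("order_id", "customer_id", "amount")


def integrity_issues(record: dict[str, Any]) -> list[str]:
    """Return the integrity problems found in one record (an empty list if none)."""
    issues = []
    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            issues.append(f"missing {field}")
    # Accuracy (a simple plausibility rule): amounts must be non-negative numbers.
    amount = record.get("amount")
    if amount not in (None, ""):
        if not isinstance(amount, (int, float)) or amount < 0:
            issues.append("invalid amount")
    return issues


records = [
    {"order_id": "A1", "customer_id": "C9", "amount": 120.0},
    {"order_id": "A2", "customer_id": "", "amount": -5},
]
for record in records:
    print(record["order_id"], integrity_issues(record))
```

Checks like these can run automatically at the point where data is generated or loaded, which is where the article argues integrity must be established. Judging whether a source is reliable or a use is compliant still falls to people.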

Data quality is more than accuracy. Multiple factors determine the quality of data. Quality is achieved not by accident, but by design. End users, domain experts and technology experts all have important roles to play in ensuring data quality.