The Meaning of Data Quality

  • October 01 2002, 1:00am EDT

Philosophers, marketing executives, linguists and scientists have struggled with the distinctions between data, information and knowledge for decades, if not centuries. The suggestion here is to take a practical approach to defining these distinctions, but to do so in a way that preserves consistency with both logic and experience. We know that lack of data quality costs money – misdirected mail is returned, effort is wasted, rework is incurred, sales and customers are lost and inventory outages occur. Quality implies differences, differences imply distinctions of value, and distinctions of value imply market value. Market value, in turn, implies dollar value. Like so many things, information quality is a bootstrap operation requiring iteration, a process of learning from one's mistakes and a commitment to business results.

When stated out of context, "data quality" is a misnomer. Data in itself is meaningless; data is what is given – it is basic raw material. Whether the content is unstructured or structured, it is data. Data itself is worthless. It is what you do with the data that has value. Data is the content, and when it is structured in such a way as to reduce uncertainty, it has value as information. Thus, data plus structure produces information. Information provides differences and distinctions that reduce uncertainty.

A simple example is that the attribute of gender tells us something about a customer. If I am confident that the customer is either male or female but I am not sure which one, then I have not reduced my uncertainty one bit. I do not have any more information than when I started. Whereas if I have the distinction male/female and, literally, the bit of information that the customer is male, then I will plan on selling him a tie rather than a dress. The data without the structure is meaningless; the structure without the data is empty. The structure – the simple male/female distinction – is not information in itself. The application of the structure to the data yields information and provides a reduction in uncertainty.
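The "bit of information" in this example can be made concrete with a little arithmetic. The sketch below is an illustration rather than anything taken from the article: it assumes the two values are equally likely before the attribute is recorded, and it uses Shannon entropy as the measure of uncertainty. The function name entropy_bits is hypothetical.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

# Assumption: before the attribute is filled in, male and female are equally likely.
uncertainty_before = entropy_bits([0.5, 0.5])

# After the value "male" is recorded, only one outcome remains possible.
uncertainty_after = entropy_bits([1.0, 0.0])

# The reduction in uncertainty is the information gained from the attribute.
information_gained = uncertainty_before - uncertainty_after
print(information_gained)
```

Under that equal-likelihood assumption, the structure alone (knowing only that the value is one of two) leaves the full uncertainty in place, while structure plus the recorded value removes it.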

Figure 1: From Data to Information

A working definition of information, and of how to transform dumb data into quality information, is depicted in Figure 1. As the attributes of the data are structured according to a defined process for transforming the data along the three high-level dimensions of objectivity, usability and trustworthiness, the information quality improves in precisely those dimensions. In particular, information = objective (data) + usable (data) + trustworthy (data). Knowledge is not on the same continuum as data and information. The commitment needed to reach knowledge might be represented in the figure as a point in one of the quadrants or as a circle encompassing the entire diagram. Knowledge = commitment (information).

From a business perspective, knowledge is qualitatively different from information. There is a gap separating information, no matter how high its quality, from knowledge. The "best available information" never results in knowledge without something additional. That something is commitment – commitment to goals relevant to the business enterprise such as customer service, launching a new product or attaining operational excellence. (Knowledge = commitment (information).)

Data, information and knowledge are overlapping categories that describe different aspects of the world of business. They are different ways of describing the same phenomena. One person's data is another's information and vice versa. Yet the distinctions are valid or they would not exist in the first place. Data is what is given – subjective, uncertain and unclear in its use or interpretation. Add structure to data in the interest of reducing uncertainty and the result is information. (Information = structure (data).)

Information is built out of data by applying structure, categories and processes – including data models, functional transformations (ETL), queries and representation – in a process that generates increasing objectivity, usability and certainty. Each of these dimensions is further decomposed. Objectivity includes aspects such as accuracy, existence, causality, consistency, timeliness, completeness, unambiguousness and precision. Usability includes ease of interpretation, availability and security. Trustworthiness includes credibility, believability and the accumulated lessons of experience.

Start by employing data profiling to build an inventory of data assets and evaluate the state of information quality within the enterprise on a system-by-system basis but from an enterprise perspective. Be prepared for "roll-up-the-sleeves" hard work. This is likely to be both a top-down and bottom-up task because the impact on information quality of relations between systems can only be evaluated by including both sides of the interface.

Thus, information quality is improved. However, regardless of how much it is improved and how certain it is, information is still not knowledge. To get knowledge from information, something else – a commitment to a business decision – must be added.
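As a rough illustration of the data profiling step recommended above, the following sketch computes a few basic measures per column: completeness (one of the objectivity aspects listed earlier), the number of distinct values and the most frequent values. It is a minimal sketch under stated assumptions, not a description of any particular profiling tool. The profile function, the column names and the sample customer records are all hypothetical.

```python
from collections import Counter

def profile(records, columns):
    """Return simple per-column quality measures for a list of dict records."""
    total = len(records)
    report = {}
    for col in columns:
        values = [r.get(col) for r in records]
        # Treat None and the empty string as missing values.
        present = [v for v in values if v not in (None, "")]
        report[col] = {
            "completeness": len(present) / total if total else 0.0,
            "distinct": len(set(present)),
            "top_values": Counter(present).most_common(3),
        }
    return report

# Hypothetical customer records, invented for this example.
customers = [
    {"id": 1, "gender": "M", "postal_code": "60601"},
    {"id": 2, "gender": "F", "postal_code": ""},
    {"id": 3, "gender": "m", "postal_code": "60601"},
    {"id": 4, "gender": None, "postal_code": "6061"},
]

report = profile(customers, ["id", "gender", "postal_code"])
for column, stats in report.items():
    print(column, stats)
```

In this made-up sample, the gender column shows three distinct values where the male/female distinction allows only two, which is the kind of inconsistency a profiling pass is meant to surface before the data is relied on as information.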
