What exactly is data quality? The most obvious answer is that data quality represents the validity of the information. But validity alone doesn't tell the whole story.

Dimensions of Data Quality

Take a look at the following dimensions as described by the U.S. Government Accountability Office.

  1. Accuracy. The extent to which the data is free from significant error.
  2. Validity. The extent to which the data adequately represents actual performance.
  3. Completeness. The extent to which enough of the required data elements are collected from a sufficient portion of the target population or sample.
  4. Consistency. The extent to which data is collected using the same procedures and definitions across collectors and times.
  5. Timeliness. Whether data about recent performance is available when needed to improve program management and report to the business.
  6. Ease of use. How readily intended users can access data, aided by clear data definitions, user-friendly software and easily used access procedures.

Remember, the metadata business model comprises two sets of customers: the producers of metadata information and the consumers. Who owns responsibility for these six dimensions of data quality? I will argue that the metadata services group completely owns numbers four, five and six, with influence over number three. Ultimately, the responsibility for accuracy, validity and completeness belongs to those who produce the metadata information. As the broker of knowledge, we must focus on the other elements in order to deliver business value. If everyone focuses on their roles and responsibilities and ensures high performance, then the entire effort will be successful.
I believe that poor data quality is a symptom of the problem, not the problem itself. If bad data gets into the repository, then our data edits and business processes are broken. It is the metadata services group's responsibility to ensure that bad data never gets into the system in the first place. It's not that organizations are opposed to managing data quality; rather, quality is the price of entry. You can't get information into the repository unless you sign off on its quality. The system shouldn't allow you to bring in logical and physical models, data transformations and database definitions unless they all match. If they don't, you simply return the data to the originator and have them correct it. Don't worry; after a few of these "return to sender" information loads, they will get the idea. Hey, the post office does this, so why can't we?
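
To make the "return to sender" gate concrete, here is a minimal sketch in Python. The submission layout, field names and functions are hypothetical illustrations, not any particular repository product's API; the only point is that a load with mismatched models never makes it past the front door.

    # Minimal sketch of a "return to sender" gate at repository load time.
    # The dictionary layout and names below are illustrative assumptions.

    def declared_elements(model: dict) -> set:
        """Collect the element names declared in a model artifact."""
        return set(model.get("elements", []))

    def find_mismatches(logical: dict, physical: dict, database: dict) -> list:
        """Return a list of discrepancies; an empty list means the load may proceed."""
        problems = []
        if declared_elements(logical) != declared_elements(physical):
            problems.append("logical and physical models do not declare the same elements")
        if declared_elements(physical) != declared_elements(database):
            problems.append("physical model and database definition do not match")
        return problems

    def load_metadata(submission: dict) -> bool:
        """Accept the load only when every cross-check passes; otherwise return it to the originator."""
        problems = find_mismatches(
            submission["logical_model"],
            submission["physical_model"],
            submission["database_definition"],
        )
        if problems:
            print("Returned to sender:", "; ".join(problems))
            return False
        print("Load accepted.")
        return True

In practice the cross-checks would also cover data transformations and other artifacts, but the pattern is the same: validate at the point of entry, where repair is cheapest.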

The key to doing a good job is getting it right the first time and having a plan in place for the times when things don't go as planned. I shudder to even say anything about dealing with variances, because a variance shouldn't be the norm. Systems that control data quality should be efficient and responsive in order to protect the customer experience. These systems must ensure data quality on the front end, where the expense of repair is lowest. The advantage of having such systems in place is the elimination of the variation that inevitably comes with data management. The key is deciding what to automate and whom to empower.

Quality is no longer job one; it is assumed and expected. What happens when data quality is no longer special? What happens when the repository accurately reflects the data environment? Maybe poor data quality was acceptable in 1987, but 20 years later, we expect more out of our systems and technology environment. Should we strive for a Six Sigma accuracy rate of 99.99966 percent? Yes - a resounding yes would be more like it. Unfortunately, our technology, processes and architectures are working against us, but that will change over time.

Another thing about focusing on data quality is that it isn't very motivational. Well, maybe during the early stages, but the reality is that you will never reach perfection in data quality. Data ages, systems age and the objects they represent age, which means that any snapshot you take of metadata is inherently wrong. In a catalog of 100,000 assets with an average of 20 metadata fields, even a 99.99 percent accuracy rating leaves roughly 200 errors. Not to mention, you don't know which metadata elements are in error at any given time without a statistical audit. How are you going to motivate your team when you reach 99.99 percent? "Come on team, we can do it! Another 0.01 percent is all we need." Data quality is not your P&L (profit and loss); data quality is the price of entry into the business of data management.
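
The arithmetic behind that figure is simple enough to check; the few lines below restate the back-of-the-envelope calculation from the paragraph above.

    # Back-of-the-envelope error count for the catalog described above.
    assets = 100_000          # metadata assets in the catalog
    fields_per_asset = 20     # average metadata fields per asset
    accuracy = 0.9999         # a 99.99 percent accuracy rating

    elements = assets * fields_per_asset          # 2,000,000 metadata elements
    expected_errors = elements * (1 - accuracy)   # roughly 200 elements wrong at any time
    print(f"{expected_errors:,.0f} expected errors out of {elements:,} elements")

And that count only tells you how many elements are wrong, not which ones; finding them still takes a statistical audit.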
