The most commonly cited example is the Stroop Test, which compares the time needed to name colors when they are printed in an ink color that matches their name (e.g., green, yellow, red, blue, brown, purple) with the time needed to name the same colors when they are printed in an ink color that does not match their name (e.g., blue, red, purple, green, brown, yellow). Naming the ink color takes longer, and is more prone to errors, when it does not match the name of the color.
The Stroop Test, where colors do not match their names, reminds me of the relationship between metadata and data quality: if I view the ink color as the metadata and the name of the color as the data, then understanding data likewise takes longer, and is more prone to errors, when the metadata does not match the data, or when the metadata is ambiguous.
In the Stroop Test, poor metadata (ink color) obfuscates good data (the name of the color), but data quality issues can also arise when good metadata is undermined by poor data (e.g., data entry errors such as an email address being entered into a postal address field). And, of course, even when the entered data matches the metadata (or automatic data-to-metadata matching is enabled by drop-down boxes), more insidious data quality issues can be caused by the complex challenge of data accuracy.
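The metadata-versus-data mismatch described above is exactly what simple field validation catches. Here is a minimal sketch of the idea; the field names, patterns, and function are illustrative assumptions, not part of the original post:

```python
import re

# Hypothetical field-level patterns: each field's metadata implies what
# its values should look like. (Patterns simplified for illustration.)
FIELD_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "postal_code": re.compile(r"^\d{5}(-\d{4})?$"),  # US ZIP, as an example
}

def metadata_mismatches(record):
    """Return the fields whose values fail the pattern their metadata implies."""
    return [
        field
        for field, value in record.items()
        if field in FIELD_PATTERNS and not FIELD_PATTERNS[field].match(value)
    ]

# An email address mistakenly entered into the postal code field is flagged:
record = {"email": "jane@example.com", "postal_code": "jane@example.com"}
print(metadata_mismatches(record))  # → ['postal_code']
```

Note that this only checks that data is *plausible* for its metadata; as the next point argues, a value can pass such a check and still be inaccurate.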
Additionally, the point-of-view paradox can make data quality debates about fitness for the purpose of use even more colorful than the Stroop Test, as when data that one user sees as red and green, another user sees as crimson and chartreuse.
But hopefully we can all agree that good data quality begins with good metadata, because better metadata makes data better.
This post originally appeared at OCDQ Blog.