for Information Management Blogs
AUG 2, 2011 9:18am ET

Are You Turning Ugly Data Into Cute Information?

Sometimes the ways of the data force are difficult to understand precisely because they are difficult to see.

Daragh O Brien and I were discussing this recently on Twitter, where tweets about data quality and information quality form the midi-chlorians of the data force. Share disturbances you’ve felt in the data force using the #UglyData and #CuteInfo hashtags.

Presentation Quality

Perhaps one of the most common examples of the difference between data and information is the presentation layer created for business users. In her fantastic book “Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information,” Danette McGilvray defines Presentation Quality as “a measure of how information is presented to, and collected from, those who utilize it. Format and appearance support appropriate use of the information.”

Tom Redman emphasizes that the two most important points in the data lifecycle are when data is created and when data is used.

I describe the connection between those two points as the Data-Information Bridge. By passing over this bridge, data becomes the information used to make the business decisions that drive the tactical and strategic initiatives of the organization. Some of the most important activities of enterprise data management actually occur on the Data-Information Bridge, where preventing critical disconnects between data creation and data usage is essential to the success of the organization’s business activities.

Defect prevention and data cleansing are two of the required disciplines of an enterprise-wide data quality program. Defect prevention is focused on the moment of data creation, attempting to enforce better controls to prevent poor data quality at the source. Data cleansing can either be used to compensate for a lack of defect prevention, or it can be included in the processing that prepares data for a specific use (i.e., transforms data into information fit for the purpose of a specific business use).

The Dark Side of Data Cleansing

In a previous post, I explained that although most organizations acknowledge the importance of data quality, they don’t believe that data quality issues occur very often because the information made available to end users in dashboards and reports often passes through many processes that cleanse or otherwise sanitize the data before it reaches them.

ETL processes that extract source data for a data warehouse load will often perform basic data quality checks. However, a fairly standard practice for “resolving” a data quality issue is to substitute either a missing or default value (e.g., a date stored in a text field in the source, which cannot be converted into a valid date value, is loaded with either a NULL value or the processing date).
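The substitution described above can be sketched in a few lines of Python. This is only an illustration, not the code of any particular ETL tool: the function names, the sample values, and the assumption that valid source dates arrive as YYYY-MM-DD text are all hypothetical. The first loader hides the defect behind the processing date; the second loads the same NULL but keeps a flag so the defect can still be counted.

```python
from datetime import date, datetime
from typing import Optional, Tuple


def parse_source_date(text: str) -> Optional[date]:
    """Return a date for YYYY-MM-DD text, or None if it cannot be converted."""
    try:
        return datetime.strptime(text.strip(), "%Y-%m-%d").date()
    except ValueError:
        return None


def load_date_silently(text: str, processing_date: date) -> date:
    """The 'dark side' pattern: an unparseable date becomes the processing date."""
    parsed = parse_source_date(text)
    return parsed if parsed is not None else processing_date


def load_date_with_flag(text: str) -> Tuple[Optional[date], bool]:
    """Load NULL (None) for an unparseable date, but also record that a defect occurred."""
    parsed = parse_source_date(text)
    return parsed, parsed is None


# Hypothetical source values: one valid date and three that cannot be converted.
source_values = ["2011-08-02", "02/08/11", "n/a", "2011-02-30"]
processing_date = date(2011, 8, 3)

silent = [load_date_silently(v, processing_date) for v in source_values]
flagged = [load_date_with_flag(v) for v in source_values]
defect_count = sum(1 for _, is_defect in flagged if is_defect)

print(silent)        # every value looks like a valid date
print(defect_count)  # number of source values that could not be converted
```

In the silent version, nothing in the warehouse distinguishes a real date from a substituted one, which is exactly why downstream users rarely see the issue.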

When postal address validation software generates a valid mailing address, it often does so by removing what it considers to be “extraneous” information from input address fields, which may include valid data accidentally entered in the wrong field, or that was lacking its own input field (e.g., e-mail address in an input address field deleted from the output valid mailing address).

And some reporting processes intentionally filter out “bad records” or eliminate “outlier values.” This happens most frequently when preparing highly summarized reports, especially those intended for executive management. These are just a few examples of the Dark Side of Data Cleansing, which can turn Ugly Data into Cute Information.
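As a rough sketch of the reporting case, the following Python snippet filters "outlier values" before computing a summary figure. The order amounts, the plausibility range, and the report layout are hypothetical values chosen for illustration. The point is only that the summarized number looks clean either way, so reporting how many records were excluded is one way to keep the ugly data visible.

```python
from statistics import mean

# Hypothetical order amounts, including two values a report might treat as outliers.
order_amounts = [120.0, 95.0, 110.0, 105.0, -9999.0, 100.0, 1000000.0]


def is_plausible(amount: float, low: float = 0.0, high: float = 10_000.0) -> bool:
    """Assumed business rule: amounts outside [low, high] are treated as bad records."""
    return low <= amount <= high


kept = [a for a in order_amounts if is_plausible(a)]
dropped = [a for a in order_amounts if not is_plausible(a)]

# The "cute" figure an executive summary would show.
cute_average = mean(kept)

# The same figure, reported together with what was filtered out to produce it.
report = {
    "average": cute_average,
    "records_used": len(kept),
    "records_excluded": len(dropped),
}
print(report)
```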

Has Your Data Quality Turned to the Dark Side?

Like truth, beauty and singing ability, data quality is in the eyes of the beholder, or since data quality is most commonly defined as fitness for the purpose of use, we could say that data quality is in the eyes of the user. But how do users know if data is truly fit for their purpose, or if they are simply being presented with information that is aesthetically pleasing for their purpose?

Has your data quality turned to the dark side by turning ugly data into cute information?

This blog originally appeared at OCDQblog.com.

Comments (2)
Beauty is sensed by the eyes, music sensed by the ears and both are appreciated (or not) by the mind.

Truth and data are not sensed but perceived or experienced by the mind. Humans have no sensors for data. Data has no physical, chemical or biological characteristics to be sensed. Data isn't ugly, cute, beautiful or hideous. That is why data quality is such an amorphous concept. Yet we keep trying to sense data quality using tools and techniques from entity domains that have physical properties, like cars and toasters.

How do we measure truth?

One often suggested quality characteristic for data is its representation of reality. How do you measure this quality characteristic? What is reality? Does a customer name present the reality of a customer?

Converting ugly data to cute data or making data presentable is more of an artistic than a scientific process, so perhaps we should focus on the color palette we use to represent the data rather than attempting to "clean" the data. Present the data as is, with all its flaws and defects for all to see. Only then can we begin to "sense" how people experience that data.

Posted by Richard O | Tuesday, August 02 2011 at 2:32PM ET
I don't agree with Keep it simple. I agree with you, Jim. I have what I call the Good, the Bad and the Ugly data, and you have marked the ugly side correctly. These groupings help define what the problem is: bad data isn't really defective, it's just been tainted by that ETL process with ugly (defective) data, and the really ugly data poisons everything in its path. There is a behaviour and a pattern to it ... let's search deeper.
Posted by jennifer o | Friday, October 14 2011 at 5:02PM ET