JUN 20, 2008 10:08am ET

The Ins and Outs of Imperfect Data

For those who make their living working with and leveraging structured data in enterprise applications, the stone-cold reality of today’s landscape is this: corporate data is inherently imperfect.

This may not seem like a revelation to those in the trenches who understand the nature of enterprise databases and how data is actually collected into these applications. Yet organizations of all sizes are hurt by imperfect, duplicated and inaccurate data in the business-critical decisions they make every day, without understanding the harm this data does. In an ideal world, all the structured data that companies use to operate and make critical decisions would be perfect. But in the real world, it just doesn’t work that way.

Companies that understand the inherent imperfection of structured corporate data take many different approaches and devote significant resources to cleaning and standardizing it. While some companies succeed in significantly improving the quality of their data, it is practically impossible to reach a point where all structured data is perfect and stays perfect as it is used and updated.

The simple fact is that imperfect data needs to be made usable despite its inherent imperfections. This in turn creates opportunities to use data to benefit the business, without spending valuable time and resources throwing “solutions” at the problem that simply don’t work. Ultimately, companies that are able to achieve this will realize more value from working with imperfect data, as the associated risks are mitigated and costs are reduced.

The Root of the Problem

Before addressing data imperfection, it’s important to understand where it comes from in order to design a solution that works.

In the past, before corporate data was stored and managed by databases, organizations such as enterprises, hospitals and government agencies had departments dedicated to managing paper-based records and files - let’s call it “the old-fashioned way.”

When dealing with these files, humans were on hand to file and manage all of the data within individual records. They also were able to recognize the natural variations and nuances that occurred within the data. For instance, if one record had a patient listed as Stefanos Damianakis and another had written Stafano Damianekis, a person filing the data would interpret this inconsistency and determine whether the two were referring to the same person.

Fast forward to 2008 and the landscape looks completely different. In the modern corporate environment, structured data is managed entirely via database applications, which by design only recognize exact matches. This can be enormously efficient because humans can’t match the speed of a computer no matter how quickly they’re rifling through information. But, with this speed comes a significant limitation - only exact matches are possible.
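To make the limitation concrete, here is a minimal sketch using the two spellings from the filing example above. It is an illustration only, not a method this article prescribes: the helper names exact_match and similarity are hypothetical, and Python's standard difflib module stands in for whatever approximate comparison a real system might use.

```python
from difflib import SequenceMatcher

def exact_match(a: str, b: str) -> bool:
    # What a plain equality lookup does: any difference at all is a miss.
    return a == b

def similarity(a: str, b: str) -> float:
    # Rough character-level similarity between 0.0 and 1.0 (assumed
    # stand-in for the judgment a human filing clerk would apply).
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

name_a = "Stefanos Damianakis"
name_b = "Stafano Damianekis"

print("exact match:", exact_match(name_a, name_b))
print("similarity: ", round(similarity(name_a, name_b), 2))
```

The equality test reports no match, while the similarity score comes out high enough that a person would want to take a second look at the two records.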

As another example, consider a hospital that maintains a database of patients, many of whom have visited multiple times. On one of these visits, the hospital employee charged with looking up patient data is unable to find the person’s name in the database, because he or she accidentally misspelled the name in the search. As a result, the employee creates another entry in the database, resulting in a duplicate record that lacks vital information about the patient’s history. What happens when the patient then returns for a critical procedure, but the duplicate record is used to provide the doctor with his or her information? What if the person is allergic to a particular anesthetic? Suddenly imperfect data goes from a simple business issue to a literally life-or-death situation.
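The duplicate-record scenario can be sketched in a few lines. This is a hedged illustration under stated assumptions, not a description of any real hospital system: the function find_or_flag, the 0.8 cutoff and the sample registry (including the made-up name "Maria Lopez") are all hypothetical, and difflib.get_close_matches from the Python standard library is used as a simple approximate lookup.

```python
from difflib import get_close_matches

def find_or_flag(name, registry, cutoff=0.8):
    """Classify a lookup as 'exact', 'possible' or 'new'.

    'possible' means near matches exist and a person should review
    them before a new record is created.
    """
    if name in registry:
        return "exact", [name]
    # Approximate lookup: return up to three names scoring above the cutoff.
    candidates = get_close_matches(name, registry, n=3, cutoff=cutoff)
    if candidates:
        return "possible", candidates
    return "new", []

# Hypothetical registry of existing patient names.
registry = ["Stefanos Damianakis", "Maria Lopez"]

status, matches = find_or_flag("Stafano Damianekis", registry)
print(status, matches)
```

With an exact-match-only lookup, the misspelled search would fall straight through to creating a new record. Here the misspelling is flagged as a possible match for review instead, which is the step the employee in the example never gets a chance to take.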

When it comes to recognizing and dealing with inconsistencies and errors in structured data sources, the previous example is where traditional rules-based systems break down. How is the software going to function correctly despite the differences in enterprise data? The speed delivered by a computer program may seem desirable at the outset of implementation, but it will create more problems for the organization in the long run.
