Continue in 2 seconds

Would you recommend deleting inconsistent data in our sources?

  • Sid Adelman, Chuck Kelley, Clay Rehm, Adrienne Tannenbaum
  • November 09 2006, 1:00am EST

Q: Some of the data in our sources is inconsistent. Would you recommend deleting it, and would the deletion be best done in the ETL (extract, transform and load) process?

Sid Adelman's Answer:

You need to find out what the data owner wants to do with this inconsistent data. The data owner may want it available even though it is inconsistent.

Chuck Kelley's Answer:

It depends on whether you need that data or not. If the data is not needed, then you can delete it. However, if it is not needed, why is it in your data structures? I would, therefore, either fix the data in the source system and allow the data to move into the data structures or delete the row in the source system and have that drive the "deletion" from the data structures.

Clay Rehm's Answer:

I would not delete the source data. I would involve your business users to identify and correct the data. If this is not possible, or time does not permit this to be done, then your last resort would be to remove the data only after it has been backed up and documented, and this information and direction has been approved by your users and management.

Adrienne Tannenbaum's Answer:

Here's the first question: How do you know which data is "right"? That will totally determine what to do with the data that is "wrong."

Here's the second question: Does the "right" data have "right" data associated with it (logically and physically - such as a purchasing record for the "right" customer)? That also determines your answer.

Here's the third question (which I put my bets on): Does the "right data" sometimes have "right data" associated with it and sometimes have "wrong data" associated with it? Are the answers inconsistent across the sources? I think your answer is probably "yes."

If this is the case, then you must set up a "master" or "correct" view. Your ETL process would work on converting all of the "wrong data" to this view, if possible - you determine what specific fields must be correct within an incoming record, and then the ETL process can evaluate and determine what is "wrong" or "not worth it."

In general, any data that is too "wrong" for inclusion or for translation is flagged and kept aside. Someone is usually responsible for dealing with this data, and makes the determination whether to correct or delete.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access