Q: Some of the data in our sources is inconsistent. Would you recommend deleting it, and would the deletion be best done in the ETL (extract, transform and load) process?
Sid Adelman's Answer:
You need to find out what the data owner wants to do with this inconsistent data. The data owner may want it available even though it is inconsistent.
Chuck Kelley's Answer:
It depends on whether you need that data or not. If the data is not needed, then you can delete it. However, if it is not needed, why is it in your data structures? I would, therefore, either fix the data in the source system and allow the data to move into the data structures or delete the row in the source system and have that drive the "deletion" from the data structures.
Clay Rehm's Answer:
I would not delete the source data. I would involve your business users to identify and correct the data. If this is not possible, or time does not permit this to be done, then your last resort would be to remove the data only after it has been backed up and documented, and this information and direction has been approved by your users and management.
Adrienne Tannenbaum's Answer:
Here's the first question: How do you know which data is "right"? That will totally determine what to do with the data that is "wrong."
Here's the second question: Does the "right" data have "right" data associated with it (logically and physically - such as a purchasing record for the "right" customer)? That also determines your answer.
Here's the third question (which I put my bets on): Does the "right data" sometimes have "right data" associated with it and sometimes have "wrong data" associated with it? Are the answers inconsistent across the sources? I think your answer is probably "yes."
If this is the case, then you must set up a "master" or "correct" view. Your ETL process would work on converting all of the "wrong data" to this view, if possible - you determine what specific fields must be correct within an incoming record, and then the ETL process can evaluate and determine what is "wrong" or "not worth it."
In general, any data that is too "wrong" for inclusion or for translation is flagged and kept aside. Someone is usually responsible for dealing with this data, and makes the determination whether to correct or delete.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access