We can have a staging area between extract and transform or we can have staging area between transform and load. I would like to know three to four advantages and disadvantages of each.
Larissa Moss’ Answer: This is not an either/or situation. How you perform ETL staging depends on a number of things.
Your data quality. If your data is dirty, you may have to read additional source files/databases to perform your edit checks and your data correction process.
Les Barbusinski’s Answer: You need both. The first staging area (i.e., between the extract and transformation processes) provides the following benefits:
The second staging area (i.e., between the transformation and load processes) provides the following benefits:
Hope this helps.
Clay Rehm’s Answer: My answer would be "it depends." I don’t think there is any right or wrong answer or approach I think it depends on your organizations business rules and practices that would dictate how data should be staged and how long it should be backed up/archived.
Scott Howard’s Answer: You should consider that a staging area exists between each layer of your ETL architecture. The output of one layer is input to the subsequent layer. For example, the output of transform is obvious the input to load. However, you’ll need to consider the costs of physically instantiating that data as compared to directly piping data from one process to the next if possible. UNIX pipes and POSIX batch pipes come to mind and should be utilized if practical.
Now don’t limit yourself to only the extraction (E), transformation (T) and load (L) layers. I run an ETL workshop that quickly establishes that ETL alone as an architecture is inadequate. You need to further identify unique process layers within the traditional ETL architecture that offer a much more detailed and fine grain mapping of all the talks that need to be performed in sequence to accomplish your ETL task. For example, one of my customers identified source extract, pre-formating for consistency and data harmony, filtering, intelligent merge, delta detection, cleansing, standard transformation, build and finally load as unique service layers. Only after identifying these tasks can you effectively implement consistent and integrated ETL processes regardless of your ETL tool and tool sets. Consider now that you may need to manage seven staging areas if this highly recommended approach is embraced.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access