Q: We have almost completed the first iteration of our data warehouse base facts and dimensions. Our ETL developers use templates to assure consistency and accuracy in their code, and we have a very good QA process in place. It has been suggested that we should convert to an ETL tool instead of continuing on our current path. Given the amount of time and work that has already been invested into what seems to be a reliable process, is this a reasonable thing to do at this point in the project?
Majid Abai's Answer:
To buy or to build, that's the question! Considering the fact that you are almost done with the first phase of your DW, and that you have implemented methods and processes to assure data consistency, one would suggest to continue with the current plan. However, I suggest that you also consider the following conditions prior to making a decision:
- How many developers are working on the ETL tool now? How many development hours will be spent until the completion of the project?
- How many future versions will the project have? And how much work has to be done to the ETL process to modify it?
- How much work is needed to maintain the system after the original development project (all phases) are completed?
- Is this the only DW project for the organization or will there be other projects?
- If you decide to choose an Off-the-shelf ETL tool, could all current ETL developers take on other projects that are on-hold due to this DW project?
- Are there any SOX compliance directives in your organization that suggest utilization of tools as opposed to built-in-house applications?
- Has your organization allocated the capital budget within this fiscal year to purchase an ETL tool?
- What is the urgency of business users on the DW? Are they willing to wait for the process of selection, implementation, and customization of a tool now that they are so close to receiving their first version of a DW?
Prior to making a decision, please consider above conditions. Also, please consider that in some cases, built-in-house applications turn out to be maintenance nightmares due to the amount of time and resources needed to maintain, modify and upgrade them. In addition, ETL tool companies have tens if not hundreds of developers implementing the cumulative needs of their respective user communities. On the other hand, ETL tools are neither cheap to buy nor easy to implement and lack the customization level provided by a built-in-house application. Considering all the above factors, you should be able to analyze the two paths and choose the right approach for your organization.
Sid Adelman's Answer:
I wonder, who suggested the conversion? If your ETL process (without the ETL tool) is working and you don't anticipate many changes, leave it alone. There's plenty of other work that needs doing.
Tom Haughey's Answer:
Many organizations successfully use what I call "roll your own" (RYO) code to do ETL. This can be a very effective solution in many situations. The advantages of ETL tools are as follows:
Retention of metadata. This is a major point because analytical applications are highly dependent on proper understanding of metadata.
Ease of use. Because most ETL tools are GUI based and have repositories, they have increased ease of use and ease of modification.
Built-ins. They have built in objects to handle recurring tasks such as aggregation, so these do not need to be coded and recoded.
Skill: Because of the above factors, the skill level requirements for ETL tools are less than with SQL.
Support. There is a large experience base to fall back on. This includes customer experiences and vendor support.
There is no such thing as a free lunch, so ETL tools have some disadvantages, among them:
- Cost. They are costly and have large technology and space requirements.
- Complexity. ETL tools may sometimes have difficulty with very complex transformation logic, as well as with complex staging requirements.
- Performance. Because they are generic, and many of them are interpretive, there can sometimes be performance issues over SQL, for transformations.
If you did not have all this code and other templates in place, I would recommend going to the ETL tool However, in my opinion, as long as you handle metadata efficiently in your current environment, I would consider sticking with your current RYO solution.This point needs emphasis. Good metadata (in whatever form it takes) is essential to the usability of a data warehouse. As long as this is provided, I see no immediate benefit in going to an ETL tool. However, consider also your future plans and the growth plans for the DW. If you anticipate a large growth in BI applications and in data sources, then conversion to an ETL too would be a sound investment.
Chuck Kelley's Answer:
I generally lean toward the ETL tools mainly for two reasons: 1) cost of maintenance (and new development) and 2) metadata. There will be a learning curve in using specific tools, but after the curve, I have found that development picks up. This is true for most new technologies. The biggest reason is metadata. Most ETL tools have metadata that can be used. When developers code stuff, there still is the time factor (and quality) of getting them to document (including changes over time) and then tying them together. Most ETL tools have that as a built-in process.









