Q: We have almost completed the first iteration of our data warehouse base facts and dimensions. Our ETL developers use templates to assure consistency and accuracy in their code, and we have a very good QA process in place. It has been suggested that we should convert to an ETL tool instead of continuing on our current path. Given the amount of time and work that has already been invested into what seems to be a reliable process, is this a reasonable thing to do at this point in the project?
Majid Abai's Answer:
To buy or to build, that's the question! Considering the fact that you are almost done with the first phase of your DW, and that you have implemented methods and processes to assure data consistency, one would suggest to continue with the current plan. However, I suggest that you also consider the following conditions prior to making a decision:
- How many developers are working on the ETL tool now? How many development hours will be spent until the completion of the project?
- How many future versions will the project have? And how much work has to be done to the ETL process to modify it?
- How much work is needed to maintain the system after the original development project (all phases) are completed?
- Is this the only DW project for the organization or will there be other projects?
- If you decide to choose an Off-the-shelf ETL tool, could all current ETL developers take on other projects that are on-hold due to this DW project?
- Are there any SOX compliance directives in your organization that suggest utilization of tools as opposed to built-in-house applications?
- Has your organization allocated the capital budget within this fiscal year to purchase an ETL tool?
- What is the urgency of business users on the DW? Are they willing to wait for the process of selection, implementation, and customization of a tool now that they are so close to receiving their first version of a DW?
Prior to making a decision, please consider above conditions. Also, please consider that in some cases, built-in-house applications turn out to be maintenance nightmares due to the amount of time and resources needed to maintain, modify and upgrade them. In addition, ETL tool companies have tens if not hundreds of developers implementing the cumulative needs of their respective user communities. On the other hand, ETL tools are neither cheap to buy nor easy to implement and lack the customization level provided by a built-in-house application. Considering all the above factors, you should be able to analyze the two paths and choose the right approach for your organization.
Sid Adelman's Answer:
I wonder, who suggested the conversion? If your ETL process (without the ETL tool) is working and you don't anticipate many changes, leave it alone. There's plenty of other work that needs doing.
Tom Haughey's Answer:
Many organizations successfully use what I call "roll your own" (RYO) code to do ETL. This can be a very effective solution in many situations. The advantages of ETL tools are as follows:
Retention of metadata. This is a major point because analytical applications are highly dependent on proper understanding of metadata.
Ease of use. Because most ETL tools are GUI based and have repositories, they have increased ease of use and ease of modification.
Built-ins. They have built in objects to handle recurring tasks such as aggregation, so these do not need to be coded and recoded.
Skill: Because of the above factors, the skill level requirements for ETL tools are less than with SQL.
Support. There is a large experience base to fall back on. This includes customer experiences and vendor support.
There is no such thing as a free lunch, so ETL tools have some disadvantages, among them:
- Cost. They are costly and have large technology and space requirements.
- Complexity. ETL tools may sometimes have difficulty with very complex transformation logic, as well as with complex staging requirements.
- Performance. Because they are generic, and many of them are interpretive, there can sometimes be performance issues over SQL, for transformations.
If you did not have all this code and other templates in place, I would recommend going to the ETL tool However, in my opinion, as long as you handle metadata efficiently in your current environment, I would consider sticking with your current RYO solution.This point needs emphasis. Good metadata (in whatever form it takes) is essential to the usability of a data warehouse. As long as this is provided, I see no immediate benefit in going to an ETL tool. However, consider also your future plans and the growth plans for the DW. If you anticipate a large growth in BI applications and in data sources, then conversion to an ETL too would be a sound investment.
Chuck Kelley's Answer:
I generally lean toward the ETL tools mainly for two reasons: 1) cost of maintenance (and new development) and 2) metadata. There will be a learning curve in using specific tools, but after the curve, I have found that development picks up. This is true for most new technologies. The biggest reason is metadata. Most ETL tools have metadata that can be used. When developers code stuff, there still is the time factor (and quality) of getting them to document (including changes over time) and then tying them together. Most ETL tools have that as a built-in process.
I realize that there will be a (major) outlay of capital to purchase a top quality tool, but I have found in the long run that the tool provides us a more robust development and metadata environment.
Evan Levy's Answer:
Well, if by templates you're referring to your ETL developers having reusable code templates (COBOL, Perl, etc.) then there are some benefits to leveraging an ETL tool. It's good to hear that the development team has already embraced the value of code development rigor. It inevitably ensures higher quality code, fewer defects, and provides an environment for new developers to get up to speed quickly.
I think you'll find a few additional benefits with the adoption of an ETL tool. We've seen quite a few of our traditional coding clients find significant productivity increases due to the benefits that the ETL tool's integrated development environment brings (code testing, data profiling, debugging, etc. )
The tools also enable developers to respond to other situations more effectively due to the functionality enhancements included within the ETL toolset.
- Impact assessment associated with source data changes. If used correctly, the metadata features allow developers to quickly determine the impact of source system data changes. The metadata system can assist the developer to quickly quantify the number of occurrences of a particular host data value - something that's nearly impossible with your coding templates.
- Data profiling to analyze the source data. The data profiling function can trace and track the actual data value occurrences passing through the ETL code. This helps ensure that the ETL code can respond to any potential data value occurrence . Your developers would have to write a significant amount of additional code if tracing or profiling data became a requirement.
Clay Rehm Answer:
This question comes up quite a bit. The first step is to find out why an ETL tool was suggested. Have the person or persons provide pros and cons of replacing your current process with a new ETL tool. Additionally, was a specific ETL suggested? Does someone have experience with a specific ETL tool and do they want to introduce it into your company?
Is the programming language you used to develop the ETL process being used on other development efforts as well? Meaning, do you have enough people in your shop with the necessary experience on this current development language to appropriately support it?
The biggest reason an ETL tool is suggested is for the meta data capabilities it provides. However, someone is required to key this data in, such as the business rules for derived fields. This forces you to track the data in a tool rather than it being stored in someone's head.
If you purchase an ETL tool, you will need staff to be trained in it and to maintain it, just like for any other development tool. Purchasing an ETL does not get you away from that.
Another reason an ETF is suggested may be ease of use or performance. Most ETL tools are very visual and make the straight mapping of source to target much easier to create and maintain, and the business rules are more prevalent.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access