Evaluating ETL Tools
Information Management Special Reports, July 2002
Extraction, transformation and load (ETL) tools play a major role in the formation and maintenance of the enterprise warehouse. They capture most, if not all, of the business rules and validation criteria required for transformation into a meta repository; they organize the ETL process; and they control the flow of data out of and into the various data stores. Evaluating and choosing an ETL tool should be one of the first things you do when you consider building a data warehouse.
Where to Start
Advertisement
Sample Background. A high percentage of the warehouse development effort is spent on the ETL process. The process of taking transactional data designed for operational systems and putting it into a format useful for querying, reporting and mining is complicated. It involves the application of complex business rules to resolve data inconsistencies, ambiguities and redundancies. More often than desirable, enterprises developed systems using custom coding techniques. The effort for building and implementing integrated programs can become a technical and organizational odyssey. Some of the problems with this approach include:
- Inconsistent application of the business rules and constraints
- Incomplete, inconsistent and outdated documentation and meta data
- Increase in defects
- Longer test cycles to insure accuracy
- Increased time to make changes
ETL tools help to alleviate these problems by providing a consistent, easy-to-use solution to this recurring situation. ETL vendors offer integrated toolsets specifically designed to provide consistent and accurate transformations that are well documented, take little time to enable and increase the speed to market.
Sample Business Drivers. While the existing method may be capable of performing the necessary ETL functions, it is usually lacking in one or more of theses areas: a high performance interface to the data warehouse, hand- coded applications created to populate the target tables and limited meta data capture. This leads to the inability to assess the impact of a requested change.
Choosing an ETL tool can provide consistent meta data, reduce training, increase expertise and provide efficient management. All of which increases product quality, time to market, customer satisfaction and reduces total cost.
The savings in human resources assigned to the ETL tasks should offset the cost of the ETL tool.
Sample Project Objectives include: recommending an ETL product set that provides construction and maintenance benefits that outweigh the cost of the toolset; defining the total cost of ownership over at least three years; and defining the implementation plan.
Tool Architecture
Determine what types of warehouse structures you will be supporting with the ETL tool. Are they ROLAP, MOLAP, HOLAP, an ODS, a combination or all of them? Next, survey the possible source systems and take an inventory of the different data stores. Are they housed in databases; for example, DB2, Oracle, SQL Server, or Sybase? Are they in ODBC sources?
It’s a good idea to draw a model of this ETL environment so you can visualize what it’s going to look like. Figure 1 represents an example of this kind of model.

Figure 1: ETL Environment
Request for Information
Once you have the model defined you should make a list of all the criteria an ETL tool should satisfy to support your warehouse environment. After you have completed and verified the criteria, create a request for information (RFI) document that contains the following topics:
- Instructions for response
- Project or environment description
- Evaluation criteria
- Contact information
Since there are so many ETL vendors, narrow the list by comparing your needs with the product features. The Data Warehousing Information Center has a Web site with a list of vendors located at http://www.dwinfocenter.org/clean .html.
Approach
Send the RFI to a selected set of vendors. Give them three weeks to respond. Compile the results and build the evaluation matrix. Interview the vendors to clarify your understanding and refine the matrix. Choose one or two vendors to evaluate empirically. Conduct the evaluation based on critical needs. Publish the results and recommendations. Implement the software – a project in itself not covered here.
The information presented below is a sample RFI.
Sample Instructions for Response:
- Immediately notify
that a response is forthcoming - Submit questions regarding the RFI to
- Send response in electronic format to
Sample Project Description
Build a large (2 terabyte) data warehouse in phases based on prioritized business requirements on subject areas. The design is based on ROLAP and is intended to service analytical queries submitted by business users as well as service reporting needs, modeling and eventually support data mining.
The ETL tool is responsible for transforming data from the source systems and external sources into the data warehouse and data marts. See Figure 1 for additional information.
The evaluation should address the following:
- Description of present and future situation (multi-year ETL usage plan and major application system acquisition plan).
- The evaluation criteria for the proof of concept (see Figure 2).
- Evaluation schedule.
- Service level agreement (SLA) requirements.
- Quality of service (QoS) requirements.
- Timeline for the evaluation.
- Vendor technical support.
- Cultural expectations.
Page 1 of 3.






