Q:

How important it is to have a separate ETL tool when our target RDBMS supplies ETL capabilities, e.g., PL/SQL in Oracle and DTS in MS SQLServer?

A:

Sid Adelman’s Answer: The question to ask is what are the benefits of an ETL tool over what is natively supplied by the RDBMS vendor. You need to decide if the greater productivity in creating and maintaining the ETL process, the cleansing provided, the administrative aspects including verification of the ETL process completing correctly and other functions are worth the money and time required to learn the ETL tool. Les Barbusinski’s Answer: The short answer is yes. Oracle can scale to handle a multiterabyte data warehouse from both an ETL and OLAP performance perspective. Obviously, there are trade offs to optimizing database performance for ETL vs. for OLAP queries. Some of the things to consider in maintaining a VLDB Oracle databases are:

  • Avoid logging if possible (use EMC BCV mirroring for backup/recovery)
  • Always partition large tables (>1 Million rows)
  • Truncate and reload a partition rather than performing mass DELETEs
  • Avoid updates if possible (especially on tables with bitmap indexes)
  • Use SQL*LDR rather than programmatic INSERTs whenever possible
  • Use “materialized views” whenever possible to build aggregates and distributed data mart tables

Michael Jennings’s Answer: The ETL tool provides methods in meta data capture/integration, maintenance and documentation. From a maintenance standpoint, the ETL tool comes with a GUI interface for the developer that provides editing, prompting, syntax checking, DBMS access, security and project organization (directory/file/component) typically from a single access point. The ETL tool will typically come with some form of meta data repository structure to capture source, target and mapping meta data (technical, business, operational). Meta data capture and integration with other tool sets (e.g., reporting) is an area where the ETL vendors still have many opportunities for growth. Finally, most ETL tools on the market have various means of documenting the ETL project and processes in a graphical form and through prebuilt reporting templates.

Chuck Kelley’s Answer: I don’t consider PL/SQL an ETL tool. While many data warehouses use PL/SQL to do the ETL capabilities, it does not provide any meta data and works only with Oracle (with the exception of using database links, etc. which is not trivial and can have performance problems). DTS provides the functionality required of an ETL tool (as does it quite well) and supply meta data. DTS is a separate tool that is packaged with SQL Server. If it provides all the functionality that you need, then use it. The most important thing when choosing an ETL (IMO) is that it provides the functionality that you need to be successful in the building of your data warehouse. So before you start looking, please make a list of the needed features and then match the product(s) to your list.

Larissa Moss’ Answer: The T in ETL stands for a lot more than just translating data type and length from source to target and translating a few codes. It is not enough to address only the technical data conversion rules, the ETL tool must also have the capability to handle the data domain rules and the data integrity rules. Therefore, your ETL tool evaluation process must be driven not only by your data conversion requirements but by your data domain and data integrity requirements as well. Data domain rules address transformations for:

  • Missing data values.
  • Default values, e.g. 0, 999, FF, blank.
  • Intelligent "dummy" values, which are specific default (or dummy) values that actually have a meaning, e.g., Social Security Number of 888- 88-8888 being used to indicate that the person is a non-resident alien.
  • Logic embedded in a data value, e.g., using lower-valued ZIP codes (postal codes) to indicate a state on the east coast, such as 07456 in New Jersey; and higher- valued ZIP codes to indicate a state on the west coast, such as 91024 in California.
  • Cryptic and overused data content, e.g., the values "A, B, C, D" of a data element define type of customer, while the values "E, F, G, H" of the same data element define type of promotion, and the values "I, J, K, L" define type of location.
  • Multipurpose data elements, i.e., programmatically and purposely redefined data content; the most obvious example being the "redefines" clause in COBOL statements.
  • Multiple data elements embedded in or concatenated across, or wrapped around free-form text fields, e.g., Address Lines 1 through 5 containing name and address data elements: Address Line 1: Brokovicz, Meyers, and Co ;Address Line 2 hen, Attorneys at Law; Address Line 3 200 E. George Washington ;Address Line 4 Boulevard, Huntsville; Address Line 5 OR 97589

Data integrity rules address transformations for:

  • Contradicting data content between two or more dependent data elements, e.g., Boston, CA (instead of MA).
  • Business rule violation, e.g., date of birth = 05/02/85 and date of death for the same person = 11/09/71.
  • Reused primary key (same key value used for multiple object instances ), e.g., two employees with the same employee number.
  • No unique primary key (multiple key values for the same object instance), e.g., one customer with multiple customer numbers.
  • Objects without their dependent parent object, e.g., job assignment points to employee 3321, but there is no employee 3321 in the employee database.

Depending on how many of these data domain and data integrity violations your source data exhibits, and depending on the capabilities of your current ETL tools, you should be able to determine whether or not you need to buy another separate ETL tool.
Clay Rehm’s Answer: If the ETL tool that is provided in your Oracle or Microsoft SQL Server suite is providing the functionality you need and desire, then there is no reason to have yet another ETL tool in your toolbox. Some key functionality traits would include integrated meta data and the ability to make calls into other programming languages and tools.

However you will need to investigate the strengths and weaknesses of your current tools versus any new tool so you can rest at night and prove to others that you did indeed pick the right tool based on the information you had at the time.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access