Question: My question is twofold, but related to the use of tools as opposed to code. We have a small business intelligence (BI) team, and the members are very technical and very good analysts and developers. They do not believe that an extract, transform and load (ETL) or a data modeling tool makes sense in our environment. Could you please provide your views on the advantages and/or disadvantages of the use of a tool for ETL and data modeling versus code to provide the same functional capabilities?


Tom Haughey’s Answer: This is really three different questions, so I will address each in turn.


1. Do you need an ETL tool or can you write your own ETL?


If an organization already has a large body of transformation programs written in COBOL or some other language, then I would stick with that strategy for now. If you have such a body of code, you probably have either procedures in place or a code generator that simplifies code creation. Many organizations successfully use what I call “roll your own” (RYO) code to do ETL. This can be a very effective solution in many situations. The advantages of ETL tools are:

  • Retention of metadata. This is a major point because analytical applications are highly dependent on proper understanding of metadata.
  • Ease of use. Because most ETL tools are GUI based and have repositories, they have increased ease of use and ease of modification.
  • Built-ins. They have built in objects to handle recurring tasks such as aggregation. So, these do not need to be coded and recoded.
  • Skill. Because of the above factors, the skill level required for ETL tools is lower than for hand-coded SQL.
  • Support. There is a large experience base to fall back on. This includes customer experiences and vendor support.
  • Auditing. ETL tools assist with auditing because of their repositories and their ability to preserve versions.

If you do not have an existing body of ETL code and other templates in place, I would recommend going to an ETL tool. However, in my opinion, as long as you handle metadata efficiently in your current environment, I would consider sticking with your current RYO solution. This point needs emphasis. Good metadata (in whatever form it takes) is essential to the usability of a data warehouse. As long as this is provided, I see no immediate benefit in moving to an ETL tool. However, also consider your future plans and the growth plans for the DW. If you anticipate large growth in BI applications and in data sources, then conversion to an ETL tool would be a sound investment and should be added to your DW strategy.
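For concreteness, a minimal sketch of what RYO ETL looks like in a general-purpose language. All names here (region, amount, the sample rows) are invented for illustration and are not from the question; a real pipeline would read from source systems rather than an inline string.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample source data (invented for illustration).
SOURCE = """region,amount
East,100
West,250
East,50
West,25
"""

def extract(text):
    # Extract: parse the raw source into dict rows.
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    # Transform: clean values and aggregate totals by region --
    # the kind of recurring task an ETL tool provides as a built-in.
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"].strip()] += float(row["amount"])
    return [{"region": r, "total": t} for r, t in sorted(totals.items())]

def load(rows, target):
    # Load: append transformed rows to the target (a list here,
    # standing in for a warehouse table).
    target.extend(rows)

warehouse = []
load(transform(extract(SOURCE)), warehouse)
print(warehouse)
# [{'region': 'East', 'total': 150.0}, {'region': 'West', 'total': 275.0}]
```

The code itself is simple; the hidden cost, as the answer notes, is that the metadata (which sources feed which targets, what each transformation does) lives only in the code unless you document it separately.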


2. Do you need a data modeling tool in general?

  • There is no question about it. If you intend to be a serious data management organization, or especially a world-class one, there is no other way. I’m not saying which tool, which type of tool, or which features you need, but you absolutely need one; without it you cannot maturely manage data or database design. A smaller organization may survive well with a simpler tool like Visio; a larger, more mature organization needs a tool like PowerDesigner, ERwin, ER/Studio, Visible or others.

3. Do you need a data modeling tool for ETL?

  • It depends on how you plan to do staging and transformation in your ETL. It is always useful to model any data structure, whether it is a permanent data store or a temporary data structure. Some ETL staging structures have special purposes in that they are not used for transaction processing or reporting but for the transformation and preparation of data. It is often acceptable to use various file structures rather than relational tables. There may be a good deal of inevitable redundancy and variability to these file structures, for example when the data is being sorted and aggregated. Consequently, data modeling may not seem like a natural choice. But I stick to my earlier point, it is always useful to model any data structure.
  • However, some structures really should be modeled, such as those used in referential integrity (RI) checking, shaping history, generating aggregates and managing surrogate keys. These should be modeled because there are significant relationships within these structures, and between them and the output tables.
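To make the surrogate-key case concrete, here is a hypothetical sketch (the class and key names are invented, not from the answer) of the kind of lookup structure a staging step maintains: each natural (business) key maps to a stable integer surrogate key, and unseen keys get the next value. The relationship between this lookup and the output tables is exactly why such structures are worth modeling.

```python
# Hypothetical staging-step helper, for illustration only.
class SurrogateKeyMap:
    def __init__(self):
        self._map = {}   # natural key -> surrogate key
        self._next = 1   # next surrogate key to assign

    def key_for(self, natural_key):
        # Return the existing surrogate key, or assign a new one.
        if natural_key not in self._map:
            self._map[natural_key] = self._next
            self._next += 1
        return self._map[natural_key]

keys = SurrogateKeyMap()
rows = ["CUST-A", "CUST-B", "CUST-A"]
print([keys.key_for(k) for k in rows])  # [1, 2, 1]
```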

Sid Adelman’s Answer: Data modeling tools are inexpensive and writing code to provide the same function is pointless. You have too much work to do to try to duplicate this capability.


With ETL tools, it depends on the complexity of your cleansing and transformations and on the volatility of expected changes in the source and the target. If your ETL process is, as Larissa Moss puts it, just “suck and plunk,” you don’t need an ETL tool. If you plan to make significant transformations, aggregations and cleansing, or if your sources are constantly changing and you are making serious changes to your data warehouse (DW) database, you will want an ETL tool. Some ETL tools can also help with performance when you are loading very large tables.


Chuck Kelley’s Answer: Most likely they are young and/or do not do much documentation. Almost every site I go into (or have been to) suffers from the same view (ETL more than data modeling). Ask them why they are using SQL or C or Java, or any other language for that matter. Why shouldn’t they just write it in machine code? The answer is that while in the short term it might take longer to develop, maintenance and documentation will be easier going forward. Unless they want to be maintenance programmers for the rest of their lives, I think you should get them to start using a tool.

