Claudia would like to thank Joyce Norris-Montanari for her assistance in writing this month's colum.
If you are embarking on your first data warehouse project, at some point you will choose whether to purchase or grow your own extraction, transformation and load (ETL) software. The right decision depends on your situation.
ETL products are getting cheaper and easier to use every day. In addition, they are becoming more complex, robust and plentiful. The truth is that they can dramatically reduce the time required for maintenance and upkeep of the ETL environment. So, why would anyone grow his or her own?
Although data warehousing is not new, the success of your maiden project is not at all guaranteed. Even if you have all your ducks in a row, your project may still be constrained by a number of factors which may limit your ability to purchase off- the-shelf ETL software.
For one, you may be doing a proof of concept. If you are, then you won't have time for the overhead of evaluating, selecting, purchasing, installing and learning an ETL package. What's more, you probably have a programming staff that is already familiar with your legacy database and operating environment.
More importantly, you may not have the budget to purchase your ETL software. The first iteration will usually include capital budget items that will survive to support subsequent phases. Many times, the initial costs of the data warehouse hardware and software leave little funding available for nice-to-haves such as packaged ETL software.
If you are unable to buy an ETL product for whatever reason, consider these points when building your own:
Many operational source systems include unload utilities. Make an inventory of what you have available and evaluate them for ETL use.
You can often utilize legacy database logs to control change data capture.
Many database systems include load utilities. Use them where you can.
- In all cases, make sure your programmers document the transformation rules employed during the first iteration (more than just COBOL code!).
Later, after a successful first iteration, your custom ETL environment will probably grow in complexity. Be sure to monitor the amount of time consumed by ongoing ETL maintenance and enhancements. These measurements will be a great help when you need to justify the ETL tool purchase.
In preparation for a conversion to a packaged ETL tool, familiarize yourself with the latest selection of products, get some references from other organizations that are running that tool in an environment similar to yours and determine whether the tool will save you enough time in the long run to justify the purchase.
Consider having the vendor do a proof of concept for you. The proof of concept should have a very narrow scope (three complex algorithms), should last about three days and should be free.
If you are certain that a purchase is justified, consider these points when evolving into full use of your new tool:
Take the introductory class for the product and work through any tutorial.
- Become very familiar with any debugging and logging features offered by the product.
Read the developer's guide and any other documents offered by the software vendor.
Use the product to deliver information to the new data marts to a relational database management system for a star schema, a multidimensional database management system, a statistical subset (flat file) of data or a proprietary exploration database.
Use caution when using the scheduling system, if available, to call existing programs or procedures. This may not be the most efficient way to launch the ETL programs they may run faster in their native environment.
Look for any areas in the current processes where the new ETL product can improve efficiency. For instance, start eliminating the staging area in the relational database management system and start using the file structures within the product.
Use enhancements such as loading files into memory for lookup or decode tables.
Get a good understanding of the meta data within the product and identify uses for administrative, technical and business users.
Look for reusability of modules within the product.
Take advantage of the many built-in transformation routines within the product.
Convert the remaining delivery of data marts into the product. Older data marts, that will eventually go away, may best be left alone until they are no longer needed. At that point, you can delete the programming modules and database structures that make up the data mart and delivery process.
Take any advanced classes offered on the product.
- Take advantage of any parallelism features offered by the product.
Slowly convert the hand-coded acquisition programs to take full advantage of the features of the ETL product. As changes are required, convert these programs and file structures to work within the ETL product.
In summary, neither building nor buying is better by default. If you have to build first, do not forget the benefits of buying. Your ability to react quickly to changes in your corporate information factory (CIF) environment will improve your success; and packaged ETL tools, the plumbing of the CIF, often enhance your reaction time. Finally, if you are planning to buy an ETL tool sometime in the future, the longer you delay the purchase, the harder it may be to justify it and the more difficult the conversion.
Joyce Norris-Montanari is director of Technology Services for Intelligent Solutions, Inc. She may be contacted at JMontanari@IntelSols.com.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access