Q:

I would like to know how best to test the warehouse in ETL if we are not using any tools for ETL (everything has been written in programs)? If we are using ETL tools like Informatica, how do you draw up test cases for the ETL process?

A:

Sid Adelman’s Answer: Whether or not you are using an ETL tool, you need to test and validate that the data has been created correctly. There are a few areas that can and should be validated. If your source data had a million rows, and if you kept them all, the data warehouse should also have a million rows. If the source system has $3,684,679.96 in sales for Region 6, the data warehouse should have that exact dollar amount. If the source system has 85,832 customers in Region 3, the data warehouse should have that exact number of customers. This testing and validation is not a one-time activity. It needs to be performed every time the ETL process is run. Some of the ETL tools have these validation capabilities.

David Marco’s Answer: You draw up test cases for a custom ETL process in the exact same way you do for a tool-driven ETL process. A good test states the beginnings state of the data/process, the processing that should occur and then the end state of the data/process. Make sure to construct test cases for fatal problem errors and for duplicate data.

Clay Rehm’s Answer: Testing has absolutely nothing to do with what tool you use. I am sure you will get differences of opinion here, but the fact of the matter is that no matter what tool you used to load the database, you need to be able to test and validate the data loaded. Period. An ETL tool is an ETL tool. Testing is such an overlooked and underrated function of the data warehouse that books could be written on “data warehouse testing – preventing failure”. Also, ETL means Extract, Transform and Load. This means you can write ETL in any programming language – COBOL, SAS, VB, C, UNIX Shell script, BAL, etc. You get the point. It is that some languages or applications are better than others in terms of ease of use and the other functionality it provides. The biggest benefit of an “official” ETL tool is that it automatically creates meta data. Somewhat cryptic meta data, but it is meta data and it is stored in a relational database management system. Drawing up the test cases requires a test strategy, a test plan, test scripts, testers (people who are committed to the project and who will actually test), documentation, and oh, more documentation. I noticed you did not specify what type of testing you want to perform. Are you going to development test, unit test, system test, integration test, acceptance test, regression test, performance test, etc??? Most data warehouse projects do not include time in their project plans/schedules for the time needed to iteratively test and retest and get the darn thing right. Development testing is made up of unit testing and the combination of units. Unit testing examines individual modules, subroutines or procedures. After the units have been tested individually, they are combined or integrated into larger and larger objects until the complete function has been built. System testing determines if a system satisfies its requirements and if applications within a system operate effectively as a whole. System testing verifies the functional, performance, interface, security and other system requirements in a simulated production environment. This includes system acceptance testing of third-party software and includes performance and regression testing. Acceptance testing verifies that a completed system meets the original business objectives and that the client finds the production version acceptable for use by the intended audience. End to end testing assesses the extent to which the interrelated data warehouse systems support a core business process or activity in an operational environment. This testing covers the complete data warehouse and ensures that any data that is passed anywhere in the system is passed with appropriate values.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access