Q: Please send me info about how to test a data warehouse - beginning with the act of defining requirements up to the point of design. Do I need testing software or can it be achieved by means of test cases? What kinds of test cases would you recommend?

Sid Adelman's Answer:

You obviously need to test your extract, transform and load (ETL) process as you develop it, and you need to, not really test, but verify the results each time the ETL process runs and the ETL tools have tie-outs that will verify that the database loaded correctly. In addition, the folks who run the queries and reports should always validate their results. Make sure you have a validation module in your user training course.

Clay Rehm's Answer:

To perform testing correctly and thoroughly takes some time and effort. You can certainly find some testing templates on the internet but this does not guarantee success.

The best test cases come from your detailed requirements documents created during the planning and analysis phases (inception and elaboration in the RUP world). Each and every requirement that is documented must be measurable, achievable and testable. Each requirement must be assigned to someone who wants it, who will implement it and who will test and measure it.

Test cases will be determined by the nature of the project. The best test cases are developed jointly between the business users and the developers on the project.

Anne Marie Smith's Answer:

Generally, testing is testing, regardless of the system to be tested. The steps of testing apply:

  1. Requirements Definitions
  2. Risk Analysis - what can break and how and why
  3. Test Methods and Techniques
  4. Test Plans and Procedures
  5. Test Cases - use cases and scenarios
  6. Data Testing
  7. Verification and Validation
  8. Reviews and Walkthroughs
  9. Test Execution

The main difference in testing a data warehouse (DW) is that the test cases revolve around queries and analytical/decision support scenarios. These cases should be written with the types of queries that representative users plan to perform, the types of scenarios that the users plan to use the DW to analyze and the various tools that make the DW "work" (ETL, reporting, querying, etc.). Don't forget to test both the initial loads of the DW and the updating, since updating is a primary activity in a DW, and don't forget to test as many different kinds of queries and scenarios that your user community can envision performing.

Adrienne Tannenbaum's Answer:

Testing a data warehouse is a complicated process which should be treated as a project unto itself. When you consider the inputs, processes, and outputs of a source-to-target data warehouse, it becomes clear exactly how much testing is required.

When we test, we take sample data all the way through the developed architecture. Of course, the test data should be representative of all possible source inputs, including "none or null," contradictory, redundant, duplicated, etc. During each step of the way (extracting from each source, merging with that of other sources, translating and processing, loading into the warehouse, queries/retrievals/reporting from the warehouse, etc, etc, etc., expected inputs and outputs should be verified. And of course, each source represents its own path throughout this architecture and must also be tested as an independent piece.

We have come across some very entertaining issues when performing our testing. For example, we found data in one source with reversed years (such as1989 instead of 1998) which had not been retrieved because of a data warehouse time scope - but was used in the actual source for its own application based reporting due to the lack of good data edits ... and I could go on and on .... So don't be surprised if you find errors in source applications and databases while you test your data warehouse - everyone is usually grateful for this!

Data warehouse testing is not something you need testing software for, as long as you think about what happens to your data from source to target - from not only one full path, but also some combined paths, and the fully integrated path.