What is the difference between data warehouse testing and conventional testing? In data warehouse testing what type of testing should we use (sanity, load, UAT)?

Evan Levy's Answer: Actually, traditional application testing doesn't look at both acceptance testing of an application and the data content. A data warehouse project must actually ensure that acceptance testing is executing against the data (loading, value accuracy, transformation, completeness, etc.) as well as the application functionality (reporting accuracy, display completeness, response time, etc.).

Joe Oates' Answer: Testing for data warehouse is quite different from testing the development of OLTP systems. The main areas of testing for OLTP include testing user input for valid data type, edge values, etc.

Testing for data warehouse, on the other hand, cannot and should not duplicate all of the error checks done in the source system. Even though there are some data quality improvements, such as making sure postal codes are associated with the correct city and state that are practical to do, data warehouse implementations must pretty much take in what the OLTP system has produced.

Testing for data warehouse falls into three general categories. These are testing for ETL, testing that reports and other artifacts in the data warehouse provide correct answers and lastly that the performance of all the data warehouse components is acceptable

Here are some main areas of testing that should be done for the ETL process:

  • Making sure that all the records in the source system that should be brought into the data warehouse actually are extracted into the data warehouse: no more, no less.
  • Making sure that all of the components of the ETL process complete successfully
  • All of the extracted source data is correctly transformed into dimension tables and fact tables
  • All of the extracted and transformed data is successfully loaded into the data warehouse

Once the extracted and transformed data is loaded into the data warehouse, testing should be done to ensure that all scripts, reports and aggregations produce correct results from the detailed data in the data warehouse.
Once the above has been validated by testing, the next step is to concentrate on performance testing. If the ETL process takes longer than the allowable time window, then it must be tuned to meet the time window allowed.

If reports take an unacceptably long time to run, use of aggregation or other tuning techniques will be necessary to bring them to an acceptable level of performance.

In testing for data warehouse, it is critical to make sure that things are done properly and get the right answer first. Only then should you try to optimize performance. Trying to do both at the same time is usually not successful.

The explanations here are by no means exhaustive but should provide a good overview of the differences between testing in a data warehouse environment versus a conventional OLTP environment.