Are there sample documents on what to test in a data warehouse environment? What are some test strategies for data warehouse testing?


Sid Adelman’s Answer: Your primary testing needs to be on the ETL process. You must be sure that all the rows you intended to load were loaded. You need to be sure the transformations were properly executed and that the cleansing was completed successfully. This means you need metrics on these processes. Include tie-outs (numbers, dollars, etc.) in verifying the ETL process and this must be done, not just in the testing phase, but each time the ETL process is run.

You also need to thoroughly test and document all the queries and reports that go into the query and report library. Include a testing module in your user training so they will at least look at their results to verify if they are reasonable.

Chuck Kelley’s Answer: I like to test three things. First, my transformations carry a $Input, $Output, $Reject, RowsInput, RowsOutput, RowsRejected and they are logged in my job_run table. If those numbers don’t match there may be something wrong with the transformation process. In the case of Rows, it may well be the case that RowsOutput + RowsRejected <> RowsInput. You can have some grouping of rows into a single output row. If you don’t have groupings, then they should match. In the case of $s, it should add up. Next, I compare what is in the data warehouse ($s and rows) to what occurred in the transformation process. Lastly, I like to compare the source system values to the data warehouse. The first two I call validation and the last one I call reconciliation. These two things should be done with every run.

Now that leaves you with a process to deal with the rejected rows. They could be rejected because of bad data or because they don’t meet the criteria to enter the data warehouse. Without knowing the application, I don’t think I can give much more of an answer.

Clay Rehm’s Answer: Please refer to my Web site at www.rehmtech.com for testing strategies.

Joe Oates’ Answer: Though I do not know of any non-proprietary data warehouse test plans that are available, gantthead.com does have some generic test plans and advice available.

The key areas that must be tested are the ETL programs, SQL queries, presentation tool queries and programs, stored procedures or queries that produce aggregate or summary tables. You should have a non-changing set of source test data as well as data warehouse test data that you or other team members can be assured of reproducible results.

The ETL testing should follow principles of any testing including boundary tests, format tests, valid contents tests, etc. It is a good idea to include invalid data for all of the just mentioned types of tests.

The SQL queries can be validated by importing data that has been loaded by the ETL process into a spreadsheet program to validate calculations and totals. Presentation tool queries and reports can be validated by hand writing SQL queries or using a spreadsheet program to validate the presentation tool output.

The key to testing the data warehouse software is to know the data that will be tested and what the answers are supposed to be.

One of the techniques that many teams find very successful is to have people for all of the areas of project get together (be sure to include knowledgeable users) and suggest specific things that can be wrong and write them on a board. Rank them as being highly damaging, moderately damaging and not very damaging. Write test scripts for the highly damaging items first, followed by the moderately damaging and not very damaging. Use this to organize how you write the test plans.

Testing cannot guarantee that there will never be any errors. There are just too many combinations and permutations that it is not practical to test each one. However, by ranking the types of errors as suggested above, you will avoid wasting time on creating test scripts and test scenarios for less important possibilities and not having time to create test scripts and test scenarios for possibilities in which errors could significantly diminish or destroy the value of the data warehouse to the users.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access