I'm looking for some input on building and maintaining the data warehouse development and testing environment. This environment will be used for development, unit testing, some integration testing, maybe some stress testing. So far, our warehouse has been small enough to get away with using a copy of the production database for testing, but as we've been growing, we've been running up against space problems. What I'm looking for, in particular, is:
Les Barbusinskis Answer: Generally speaking, your development/testing environment should be a statistically significant subset of your production environment plus any new/modified entities that will be in the next release of your data warehouse. Achieving this ideal is never easy. Unit testing is usually performed against test tables created within the developers own schema and tablespace. This much is simple. The hard part is creating a representative environment for integration and/or stress testing. There are tools on the market that can generate test data, but (at least in my experience) theyre rarely able to synthesize complex inter-table, inter-row or intra-row data relationships and/or business rules. Hence, testing with synthesized data can mask a lot of logic errors. The best method Ive seen is to do full extracts of small tables (< 100,000 rows) and sample extracts of the larger tables (e.g., extracting every 5th or 10th row). These extracts would then be loaded into the test database in RI (referential integrity) sequence, with FK constraint errors being discarded. Once initialized, the test database is copied to a baseline backup. Developers coordinate between themselves when using the test database (updates are allowed). Periodically, the test database is restored from the baseline for regression testing or for general clean up. In addition, the baseline version is periodically restored, and the test database is altered to add/modify and populate tables for the next release of the data warehouse. This is then backed up to a new baseline. The test database can occasionally be augmented with additional data from production to perform limited stress testing (i.e., focused on a subset of tables).
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access