An effective QA includes testing of all three stages, as shown in Figure 4. The data is also compared across stages to ensure its accuracy and consistency.
Figure 4: Sample intermediate stages in an ELT system
Relationships in output data models: As stated in the previous sections of this article, the main advantage of ELT processing is the easy maintenance of individual stages due to the clear segregation of each stage from the other. Let us consider a scenario where a customer’s loan information is stored in a source table and a record consists of the customer’s first name, last name, address, document name, type, loan type and the loan amount. The target ELT system would require tables on the customer, loan and the document information. After the transition from the ETL to ELT system, the following data relationships would need to be validated to ensure that they are still intact after the transition:
- Customer table to loan table.
- Loan to document table.
- Document to customer table.
- Static table testing of relationship codes.
Backup database: The master and duplicate databases would need to be validated for consistency in terms of database structure, framework and data between them.
Process validation: This entailstesting the data loading and scheduling of jobs by checking the error logs and output of the scripts used for loading. The analysis of error logs is done to report any inconsistencies (e.g., an actual error occurrence not reported in the error log or vice versa). The ELT jobs are then triggered in sequence to test the ELT data warehouse system behavior.
End-to-end validation of the ELT system: All the stages (i.e., the source, input data model, reporting model, reporting view and report flow) are to be tested using predefined input data to ensure that it moves through all stages in the expected way, generating the desired output. This end-to-end validation of the ELT system is essential for detecting inherent defects due to the interdependency of code/data (if any) between all the stages.
ELT isn’t a solution to all the problems of data warehouse systems. In some specific cases, a hybrid solution of both ETL and ELT methodologies would be the appropriate model to adopt, depending on business requirements. To illustrate this, consider a situation where a data warehouse has multiple source systems. The data from these sources can be brought to a common staging area through the ETL process and then specific ELT processes can be further applied to generate user-specific reports.
However, having an ETL, ELT or ETLT as a data warehousing solution is not enough. A clear understanding of the vision of the data warehouse output from an end user’s perspective is essential for a successful data warehouse testing validation. It’s also important to comprehend the business context and clearly define the input data, their interrelationships and the associated processes in the data warehouse system to generate correct and relevant reports for the business/end user.
Sundaresa Subramanian has 11 years of experience in software testing, digital signal processing research and development. He is a Senior Project Manager associated with Financial Services testing practice at Infosys Limited. He has a diversified experience in project management, client interfacing and R&D and has strategized and program managed several functional and data warehousing QA engagements. û ûHe has authored and presented papers on various topics at national and international conferences on the topics of risk identification, data warehouse test strategizing, mobile QA, specialized QA services and project management. He can be reached at ssubramanian_gv@infosys.com.













Very interesting article. I would like to offer another perspective. I've encountered many organizations that have pulled the transformations (ETL) out of their ETL layer or tool and pushed the transformations into the warehouse (ELT) because the ETL layer could not perform or scale to the needs of the business (e.g., meet their SLA for instance).
Many tools/vendors even "promote" this with so called push down "optimization" capabilities to generate the SQL that creates the ELT or ETLT. What's needed is a high performance, scalable tool that can intelligently and automatically scale to the requirements and data volume required.
I also think this is why many organizations are also looking at Hadoop. Very common use cases for Hadoop is ETL. There are many articles from both organizations and the Hadoop vendors talking about ETL uses in Hadoop. The issue here is not necessarily one of scale, but the maturity (or really the immaturity) of both Hadoop and MapReduce skill sets.
Thanks again for a very interesting article.
--Keith Kohl