JAN 18, 2013 9:24am ET

Related Links

BI Reference Architecture that Actually Works
6 Trends Guiding Financial Customer Data

Web Seminars

IBM & Teradata Compared: A Total Cost of Ownership Study
May 22, 2013
What Is Data Science? You Might Be Surprised!
June 3, 2013
AARP: Embracing Dynamic, Agile Analytics Platforms for Big Data
June 5, 2013
Feature

ETL to ELT Conversion Testing in Data Warehouse Engagements

Print
Reprints
Email

An effective QA includes testing of all three stages, as shown in Figure 4. The data is also compared across stages to ensure its accuracy and consistency.

Figure 4: Sample intermediate stages in an ELT system

Relationships in output data models: As stated in the previous sections of this article, the main advantage of ELT processing is the easy maintenance of individual stages due to the clear segregation of each stage from the other. Let us consider a scenario where a customer’s loan information is stored in a source table and a record consists of the customer’s first name, last name, address, document name, type, loan type and the loan amount. The target ELT system would require tables on the customer, loan and the document information. After the transition from the ETL to ELT system, the following data relationships would need to be validated to ensure that they are still intact after the transition:

  • Customer table to loan table.
  • Loan to document table.
  • Document to customer table.
  • Static table testing of relationship codes.

Backup database: The master and duplicate databases would need to be validated for consistency in terms of database structure, framework and data between them.

Process validation: This entailstesting the data loading and scheduling of jobs by checking the error logs and output of the scripts used for loading. The analysis of error logs is done to report any inconsistencies (e.g., an actual error occurrence not reported in the error log or vice versa). The ELT jobs are then triggered in sequence to test the ELT data warehouse system behavior.

End-to-end validation of the ELT system: All the stages (i.e., the source, input data model, reporting model, reporting view and report flow) are to be tested using predefined input data to ensure that it moves through all stages in the expected way, generating the desired output. This end-to-end validation of the ELT system is essential for detecting inherent defects due to the interdependency of code/data (if any) between all the stages.

ELT isn’t a solution to all the problems of data warehouse systems. In some specific cases, a hybrid solution of both ETL and ELT methodologies would be the appropriate model to adopt, depending on business requirements. To illustrate this,  consider a situation where a data warehouse has multiple source systems. The data from these sources can be brought to a common staging area through the ETL process and then specific ELT processes can be further applied to generate user-specific reports.

However, having an ETL, ELT or ETLT as a data warehousing solution is not enough. A clear understanding of the vision of the data warehouse output from an end user’s perspective is essential for a successful data warehouse testing validation. It’s also important to comprehend the business context and clearly define the input data, their interrelationships and the associated processes in the data warehouse system to generate correct and relevant reports for the business/end user.

Sundaresa Subramanian has 11 years of experience in software testing, digital signal processing research and development. He is a Senior Project Manager associated with Financial Services testing practice at Infosys Limited. He has a diversified experience in project management, client interfacing and R&D and has strategized and program managed several functional and data warehousing QA engagements. û ûHe has authored and presented papers on various topics at national and international conferences on the topics of risk identification, data warehouse test strategizing, mobile QA, specialized QA services and project management. He can be reached at ssubramanian_gv@infosys.com.

Advertisement

Comments (1)
Hi Sundaresa-

Very interesting article. I would like to offer another perspective. I've encountered many organizations that have pulled the transformations (ETL) out of their ETL layer or tool and pushed the transformations into the warehouse (ELT) because the ETL layer could not perform or scale to the needs of the business (e.g., meet their SLA for instance).

Many tools/vendors even "promote" this with so called push down "optimization" capabilities to generate the SQL that creates the ELT or ETLT. What's needed is a high performance, scalable tool that can intelligently and automatically scale to the requirements and data volume required.

I also think this is why many organizations are also looking at Hadoop. Very common use cases for Hadoop is ETL. There are many articles from both organizations and the Hadoop vendors talking about ETL uses in Hadoop. The issue here is not necessarily one of scale, but the maturity (or really the immaturity) of both Hadoop and MapReduce skill sets.

Thanks again for a very interesting article.

--Keith Kohl

Posted by Keith K | Saturday, January 19 2013 at 12:03PM ET
Add Your Comments:
You must be registered to post a comment.
Not Registered?
You must be registered to post a comment. Click here to register.
Already registered? Log in here
Please note you must now log in with your email address and password.

Where do young IT professionals (30 and under) obtain information to aid with daily role responsibilities and career development?

Trade publication websites 14%
Social media 23%
Vendor websites 4%
Vendor/community forums 7%
Newsletters 1%
Trade conferences/meetups 2%
RSS feeds 6%
Web search 44%

 

Twitter
Facebook
LinkedIn
Login  |  My Account  |  White Papers  |  Web Seminars  |  Events |  Newsletters |  eBooks
FOLLOW US
Please note you must now log in with your email address and password.