In the recent past, the architectural design of data warehouses was predominantly ETL (extract, transform and load). However, with emerging technologies taking center stage, the architectural vision of data warehouse systems has gone through a radical shift. This shift led to a new data warehousing process, as the previous data models were not equipped to handle the exponential increase in user data. Businesses were also often unable to meet critical data-related business needs because they could not incorporate changes into their large, single data repositories. In turn, that translated into longer waiting periods for reports requested by business users. Eventually, organizations ended up dedicating resources specifically to "extract data," performing additional and often redundant analytics and reporting activities outside the data warehouse itself.
Thus, several manually intensive and expensive processes had to be established to fill the gaps created by the limitations of the architectural design of ETL-based data warehousing systems. The strategic solution was the implementation of, and transition to, the ELT (extract, load and transform) data warehouse design.
Importance of Data Warehouses in Banking and Financial Services
Banking and financial services involve numerous data-intensive processes, from withdrawing money from an ATM to suggesting the most lucrative investment portfolio to a high-value client. This makes data warehouse testing extremely critical in this domain. Figure 1 shows some examples of scenarios that involve data warehouse processes in specific financial services domains.
Figure 1: Data warehouse system dependability in financial services sub domains
ETL, Focusing on Desired Data Output
The ETL data warehouse technology is primarily based on the report requested by the business user. The approach is top-down. First, a decision is made regarding the desired output data (i.e., the format and content of the report demanded by the user), and then the required data is extracted, transformed and loaded, as seen in Figure 2. With the focus on output, the input data needs to be processed to remove any data anomalies, ensuring high-quality data for transformation and loading.
Figure 2: The standard ETL process
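The extract-transform-load sequence described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation; the record fields, cleansing rules and in-memory "warehouse" are assumptions made for the example.

```python
# Minimal ETL sketch: data is cleansed and reshaped *before* it reaches the
# warehouse, so only report-ready records are loaded. All names are illustrative.

def extract(rows):
    """Pull raw records from a source system."""
    return list(rows)

def transform(rows):
    """Clean and reshape records up front, removing anomalies before loading."""
    cleaned = []
    for row in rows:
        if row.get("amount") is None:  # drop anomalous records here, pre-load
            continue
        cleaned.append({
            "account": row["account"].strip().upper(),
            "amount": round(float(row["amount"]), 2),
        })
    return cleaned

def load(warehouse, rows):
    """Append the already-transformed records to the warehouse table."""
    warehouse.extend(rows)
    return warehouse

source = [{"account": " acc1 ", "amount": "10.5"},
          {"account": "acc2", "amount": None}]      # anomaly: missing amount
warehouse = load([], transform(extract(source)))
```

Note that the anomalous record never reaches the warehouse: if the report's shape later changes, the transform (and potentially the extract) must be redesigned, which is exactly the limitation discussed below.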
When data warehouse systems were initially conceptualized, ETL-based systems met all business needs, as the focus was on designing the output first. Moreover, only data relevant to the processing needs was required. However, these systems showed their limitations when future expansion needs arose: re-modifying the output data would require a complete redesign of, and changes to, the input architecture of the data warehouse itself. This would be a very expensive and cumbersome process for an organization to achieve with the traditional ETL process.
Data Warehouse Technology ELT
The concept of ELT data warehouse technology came into existence because of the high business demand for error-free, high-quality input data in data warehouse systems. Here the approach is bottom-up. The data is initially extracted to a staging area, where business-rule and integrity checks are performed.
With ELT, the load and transformation processes are decoupled from each other, as shown in Figure 3. Hence, the development is split into smaller, manageable chunks, making project management, change management and the development of data warehouse systems much simpler.
ELT data warehouse systems overcome the shortcomings of ETL systems, paving the way for this transition across industries, especially in banking and financial services.
Figure 3: The ELT process
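For contrast with the ETL flow, the ELT sequence can be sketched the same way. Again this is only an illustration under assumed names: raw data lands in a staging area untouched, and the business-rule and integrity checks run afterwards, decoupled from the load.

```python
# Minimal ELT sketch: raw records are loaded into a staging area as-is,
# and transformation happens inside the warehouse, decoupled from the load.
# All names are illustrative.

def extract(rows):
    """Pull raw records from a source system."""
    return list(rows)

def load_to_staging(staging, rows):
    """Load raw records unchanged; no cleansing happens on the way in."""
    staging.extend(rows)
    return staging

def transform_in_warehouse(staging):
    """Apply business rules and integrity checks after loading."""
    return [
        {"account": r["account"].strip().upper(), "amount": float(r["amount"])}
        for r in staging
        if r.get("amount") is not None   # integrity check runs in the staging area
    ]

source = [{"account": " acc1 ", "amount": "10.5"},
          {"account": "acc2", "amount": None}]      # anomaly: missing amount
staging = load_to_staging([], extract(source))      # both rows are kept here
report_ready = transform_in_warehouse(staging)      # checks applied post-load
```

Because the anomalous record survives in staging, a new report only needs a new transform over the staged data, with no change to the extract or load steps.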
Achieving Thorough ETL to ELT Conversion Testing
If an organization decides to transition from ETL to ELT, considerable planning and strategy are required from a QA perspective to ensure the conversion is thorough and successful. In any typical re-engineering project, especially one that includes an ETL-to-ELT conversion, the following validation points need to be examined (note that the ETL process is referred to as the source and the ELT process as the target wherever applicable):
Structure of the target table: The ELT output table structure is usually not normalized, to keep the input data easily accessible. Processes and columns need to be mapped to ensure consistency between the ETL source systems and the ELT target systems.
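A column-mapping check of this kind might be sketched as follows; the table and column names here are invented for illustration and do not come from any particular system.

```python
# Hypothetical QA check: verify every mapped source column exists in the ETL
# source table and lands on a real column in the ELT target table.

def check_column_mapping(source_cols, target_cols, mapping):
    """Return mapped pairs whose source or target column does not exist."""
    return [(s, t) for s, t in mapping.items()
            if s not in source_cols or t not in target_cols]

source_cols = {"acct_no", "txn_amt", "txn_dt"}
target_cols = {"account_number", "transaction_amount"}   # no date column
mapping = {"acct_no": "account_number",
           "txn_amt": "transaction_amount",
           "txn_dt": "transaction_date"}                 # broken mapping

issues = check_column_mapping(source_cols, target_cols, mapping)
```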
Data type of the target table: The structural compatibility between the source and target databases usually needs validation. Accordingly, test cases are designed to ensure data quality in the target database. Lastly, the output format of the data is validated to confirm that no data values have been truncated.
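A truncation check along these lines can be sketched in Python. The column name and length limit are hypothetical; the idea is simply to flag target values that are cut-off prefixes of their source counterparts.

```python
# Hypothetical QA check: detect source values that were truncated when loaded
# into a fixed-width target column (e.g. a VARCHAR(21) in the target database).

def find_truncations(source_rows, target_rows, column, max_len):
    """Return (source, target) pairs where the target value is a truncated
    prefix of the source value at the column's length limit."""
    issues = []
    for src, tgt in zip(source_rows, target_rows):
        s, t = str(src[column]), str(tgt[column])
        if s != t and t == s[:max_len]:
            issues.append((src[column], tgt[column]))
    return issues

source_rows = [{"branch_name": "Central Business District"}]
target_rows = [{"branch_name": "Central Business Dist"}]   # 21-char target column

issues = find_truncations(source_rows, target_rows, "branch_name", 21)
```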
Intermediate phases/models: Usually, the loading or staging area is divided into intermediate models, including:
- Input data model, which contains data that is clean and completely decoupled from end users.
- Reporting data mart, which contains data that is useful for generating reports for the end user.
- Reporting view, which contains formatted data in a predetermined template.
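The flow through these three intermediate models can be sketched as below. The record fields and the report template are assumptions made for illustration; only the input-model → data-mart → reporting-view layering comes from the text above.

```python
# Sketch of the three staging-area models: an input data model (clean,
# decoupled from end users), a reporting data mart (report-relevant columns
# only), and a reporting view (data formatted per a predetermined template).

def build_reporting_mart(input_model, report_columns):
    """Project only the columns needed to generate end-user reports."""
    return [{c: row[c] for c in report_columns} for row in input_model]

def build_reporting_view(mart, template):
    """Format each mart record according to a predetermined template."""
    return [template.format(**row) for row in mart]

# Clean input data; internal_flag never reaches the end user.
input_model = [{"account": "ACC1", "amount": 10.5, "internal_flag": "N"}]

mart = build_reporting_mart(input_model, ["account", "amount"])
view = build_reporting_view(mart, "{account}: {amount:.2f}")
```

QA for this phase typically asserts that internal-only columns are absent from the mart and that the view matches the agreed report template.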