Technology Components of a Scalable Architecture
Mike would like to thank Brian Swarbrick of Palladium Group for his contribution to this column.
To achieve information empowerment, today's business intelligence (BI) applications must be scalable: they must continue to function efficiently as new constraints are imposed, such as increased data volumes and changing user requirements. Fully addressing scalability requires considering both the infrastructure (technology) and the architecture (design of the end-to-end solution).
A key driver for selecting the appropriate technical infrastructure is the defined service level agreement (SLA). SLAs typically specify when information is released, what performance the application must deliver and what availability is expected (i.e., how the system responds in the event of a software or hardware issue). Scalable BI technology architectures must support growing data volumes and user populations while maintaining the required service levels.
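Availability expectations are easier to compare once they are expressed as a downtime budget. The minimal sketch below converts an availability percentage into the hours of downtime it permits per year; the function name and the example targets are hypothetical and not taken from any particular SLA.

```python
# Hypothetical helper for reasoning about SLA availability targets.
HOURS_PER_YEAR = 365 * 24  # ignores leap years for simplicity


def allowed_downtime_hours(availability_pct: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


# Illustrative targets only; each extra "nine" cuts the budget tenfold.
for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {allowed_downtime_hours(target):.2f} hours/year")
```

Each additional "nine" shrinks the budget by a factor of ten, which is why stricter availability targets tend to push the architecture toward redundant, horizontally scaled components.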
Technology components for the BI infrastructure primarily consist of hardware (servers), software (integration middleware, reporting/analytics software and data management software) and storage. For the overall infrastructure to scale, each of these components must be able to scale individually. By implementing the right scalable technology, an organization can minimize its initial investment and then invest incrementally, expanding that technology as demand increases over time.
Scalability is typically achieved through either vertical or horizontal scaling of hardware. Vertical scalability is achieved by adding computing resources (memory and processors) to an existing system to handle increased workloads. Horizontal scalability is achieved by adding more servers to share the existing workload and to absorb increasing workloads. Horizontal scalability also enables load balancing and failover, the ability to keep the system running when a piece of hardware fails. Horizontal and/or vertical scaling can be applied to each hardware and software component of the BI infrastructure, provided that the software is designed to support those methods of hardware scaling.
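Load balancing and failover across horizontally scaled servers can be sketched with a simple round-robin scheme. In this minimal illustration, the `RoundRobinBalancer` class and the server names are hypothetical; production load balancers add health probes, session affinity and weighting.

```python
class RoundRobinBalancer:
    """Hypothetical balancer: rotates requests across servers, skipping failed ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # position of the next server to try

    def add_server(self, server):
        """Horizontal scaling: a new server joins the rotation."""
        self.servers.append(server)
        self.healthy.add(server)

    def mark_down(self, server):
        """Failover: a failed server is skipped until it is marked up again."""
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        """Return the next healthy server, trying each server at most once."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["bi-app-1", "bi-app-2", "bi-app-3"])
lb.mark_down("bi-app-2")  # simulate a hardware failure
print([lb.next_server() for _ in range(4)])
```

With `bi-app-2` marked down, requests alternate between the two surviving servers, so users keep working while the failed machine is repaired.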
Traditional extract, transform and load (ETL) development often relied on proprietary hand-coded solutions or on database code functionality. Implementing enterprise solutions today requires robustly engineered ETL tools that can handle increasing data volumes within shrinking load windows. According to Forrester Research, "Most ETL vendors have focused on scalability as a top requirement."[1] Leading tools support scalability by processing data in a multithreaded manner, by performing parallel processing against large data sets and by distributing processing across multiple machines.
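The parallel-processing idea can be illustrated by splitting the input into chunks and transforming the chunks concurrently. This is a minimal sketch, assuming a hypothetical `transform` step and record layout; commercial ETL tools implement this inside their own engines.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(row):
    """Hypothetical transform step: normalize a customer name and compute a total."""
    return {
        "customer": row["customer"].strip().upper(),
        "total": row["qty"] * row["unit_price"],
    }


def transform_chunk(chunk):
    return [transform(row) for row in chunk]


def chunked(rows, size):
    """Yield consecutive slices of `rows`, each at most `size` long."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def parallel_transform(rows, workers=4, chunk_size=1000):
    """Transform chunks concurrently; executor.map keeps results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(transform_chunk, chunked(rows, chunk_size))
        return [row for chunk in results for row in chunk]


sample = [
    {"customer": " acme ", "qty": 2, "unit_price": 5.0},
    {"customer": "globex", "qty": 1, "unit_price": 3.5},
]
print(parallel_transform(sample, workers=2, chunk_size=1))
```

Threads suit I/O-bound steps such as reading from and writing to databases; for CPU-heavy transforms in Python, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface across processes.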
The data management platform must be designed to support efficient loading and retrieval of data. Large BI applications must provide data management strategies that dictate how data is managed over time. Database applications must be designed to support frequent data administration activities within tight load windows. Database vendors offer data partitioning and parallel querying to support scalability. Data partitioning segregates data to support parallel query execution and efficient data administration. ETL processes take advantage of the database partitioning features through efficient load process design. Parallel query provides support for faster data retrieval when used in conjunction with well-designed partition strategies. Robust database solutions support both horizontal and vertical hardware and software scalability configurations that provide high-availability solutions.
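How range partitioning helps both administration and querying can be shown with a small in-memory model. In this sketch, the `MonthPartitionedTable` class and its column names are hypothetical; a real database declares partitions in the table definition and its optimizer performs the pruning and parallel scans.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from datetime import date


class MonthPartitionedTable:
    """Hypothetical fact table range-partitioned by (year, month) of sale_date."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def load(self, rows):
        # Each row lands in exactly one partition, so loads can target partitions independently.
        for row in rows:
            key = (row["sale_date"].year, row["sale_date"].month)
            self.partitions[key].append(row)

    def drop_partition(self, year, month):
        # Purging or archiving a month is one operation instead of row-by-row deletes.
        self.partitions.pop((year, month), None)

    def sum_amount(self, start, end, workers=4):
        """Sum 'amount' for start <= sale_date <= end.

        Partition pruning: only partitions overlapping the range are scanned,
        and the remaining partitions are scanned concurrently.
        """
        first, last = (start.year, start.month), (end.year, end.month)
        keys = [k for k in self.partitions if first <= k <= last]

        def scan(key):
            return sum(r["amount"] for r in self.partitions[key]
                       if start <= r["sale_date"] <= end)

        with ThreadPoolExecutor(max_workers=workers) as pool:
            return sum(pool.map(scan, keys))


table = MonthPartitionedTable()
table.load([
    {"sale_date": date(2024, 1, 15), "amount": 100},
    {"sale_date": date(2024, 2, 10), "amount": 50},
    {"sale_date": date(2024, 3, 5), "amount": 25},
])
print(table.sum_amount(date(2024, 2, 1), date(2024, 3, 31)))
```

The query for February and March never touches the January partition, and dropping an old month removes it without scanning the rest of the table.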
Reporting applications must scale to support increasing numbers of users and more complex queries over time. Fast response times for data access are important to keep users engaged. Vendors in this space achieve scalability by distributing application workloads across multiple servers to balance usage. Applications should distribute workloads seamlessly across multiple servers and must allow either additional servers to be added to the configuration (horizontal scaling) or resources on existing servers to be increased (vertical scaling).
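Both scaling options for a reporting tier can be modeled with a pool that routes each query to the least-utilized server. This is a minimal sketch: the `ReportServerPool` class, the server names and the notion of "query slots" as a stand-in for server resources are all hypothetical.

```python
class ReportServerPool:
    """Hypothetical pool that routes each query to the least-utilized server."""

    def __init__(self):
        self.capacity = {}  # server -> concurrent query slots (stand-in for CPU/memory)
        self.active = {}    # server -> queries currently running

    def add_server(self, name, capacity):
        """Horizontal scaling: bring another server into the configuration."""
        self.capacity[name] = capacity
        self.active[name] = 0

    def add_capacity(self, name, extra):
        """Vertical scaling: give an existing server more resources."""
        self.capacity[name] += extra

    def dispatch(self):
        """Assign a query to the free server with the lowest active/capacity ratio."""
        candidates = [s for s in self.capacity if self.active[s] < self.capacity[s]]
        if not candidates:
            raise RuntimeError("all report servers are at capacity")
        server = min(candidates, key=lambda s: self.active[s] / self.capacity[s])
        self.active[server] += 1
        return server

    def finish(self, name):
        """Release a slot when a query completes."""
        self.active[name] -= 1


pool = ReportServerPool()
pool.add_server("rpt-1", 2)
pool.add_server("rpt-2", 2)
print([pool.dispatch() for _ in range(4)])
pool.add_server("rpt-3", 1)  # scale out once the first two servers are full
print(pool.dispatch())
```

Routing by utilization rather than strict rotation lets servers of different sizes share the load in proportion to their resources.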
BI applications require flexible and scalable storage solutions. New BI requirements necessitate that more data be stored. Mergers, acquisitions and the implementation of new systems increase the data storage requirements for existing BI applications. New government regulations and requirements for longer data retention require that data be stored for longer periods.
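Growth and longer retention compound each other, which a quick capacity estimate makes visible. The sketch below assumes, hypothetically, that each year's new data is a fixed percentage larger than the previous year's and that nothing is purged within the retention window; the function name and figures are illustrative only.

```python
def retained_storage_gb(first_year_gb, annual_growth, retention_years):
    """Storage needed to hold `retention_years` of history.

    Assumes year N adds first_year_gb * (1 + annual_growth) ** N of new data
    (e.g. annual_growth=0.25 for 25% growth) and no data is purged early.
    """
    if retention_years < 0 or annual_growth <= -1:
        raise ValueError("invalid retention period or growth rate")
    return sum(first_year_gb * (1 + annual_growth) ** year
               for year in range(retention_years))


# Hypothetical comparison: extending retention from 3 to 7 years at 25% growth.
print(round(retained_storage_gb(500, 0.25, 3)))
print(round(retained_storage_gb(500, 0.25, 7)))
```

Because the later years are also the largest, extending retention increases the storage requirement more than proportionally.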
Implementing a storage solution such as a storage area network (SAN) or network-attached storage (NAS) allows organizations to support current data requirements and to scale over time by adding storage. Such solutions let an organization consolidate multiple storage systems into a single solution that is easier to administer, support and scale, and that costs less to run. Today's storage solutions provide applications for the complete management of data storage, with features such as data redundancy to support failover and high-availability processing, central administration and support for disaster recovery efforts.
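Data redundancy for failover can be illustrated with a two-way mirror, where every write goes to both volumes and reads fall back to the surviving copy. This is a minimal sketch; the `MirroredStore` class is hypothetical, and resynchronizing a volume after repair is not modeled.

```python
class MirroredStore:
    """Hypothetical two-way mirror: writes go to every online volume,
    reads are served by any online volume that holds the key."""

    def __init__(self):
        self.volumes = [{}, {}]
        self.online = [True, True]

    def write(self, key, value):
        written = 0
        for i, vol in enumerate(self.volumes):
            if self.online[i]:
                vol[key] = value
                written += 1
        if written == 0:
            raise OSError("no online volumes")

    def read(self, key):
        # Failover: if the first volume is down, the mirror copy is used.
        for i, vol in enumerate(self.volumes):
            if self.online[i] and key in vol:
                return vol[key]
        raise KeyError(key)

    def fail(self, index):
        """Simulate the loss of one volume."""
        self.online[index] = False


store = MirroredStore()
store.write("q1_sales", "aggregated data")
store.fail(0)  # primary volume lost
print(store.read("q1_sales"))
```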
The hardware, ETL, data management, reporting and storage components of a BI architecture must provide scalability either through the ability to scale horizontally and/or vertically or through specific functionalities built into each of those components. For an overall solution to scale, every one of these components must be able to scale independently.
[1] Connie Moore, Philip Russom and Colin Teubner, "How to Evaluate Enterprise ETL," Forrester Research, December 17, 2004.