According to Dr. Richard Hackathorn, creator of the Time-Value Curve, “the value of data is directly proportionate to how fast a business can react to it. In other words, a corporation loses money every time it delays getting information into the hands of decision-makers.”
Real-time BI is crucial for surviving in a competitive market. It is important to understand the new challenges it raises and to develop a solution that handles the requirements and technology hurdles at hand.
Real-Time Business Intelligence
The major goal of real-time BI is reducing the time taken for corrective action or initiative. Real-time BI is designed to control data latency, analysis latency and action latency. Companies must understand that ROI will also depend heavily on the ability of an organization to modify its business practices to take advantage of improved responsiveness in the IT system.
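The three latencies can be made concrete by timestamping one business event at each stage. The following is a minimal sketch, not a prescribed measurement method. The class `EventTimeline`, its field names and the sample times are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class EventTimeline:
    """Timestamps for one business event (all field names are hypothetical)."""
    occurred_at: datetime   # the business event happens in an operational system
    stored_at: datetime     # the data is available in the low-latency store
    analyzed_at: datetime   # the analysis result reaches a decision-maker
    acted_at: datetime      # the corrective action or initiative is taken

    def data_latency(self) -> timedelta:
        return self.stored_at - self.occurred_at

    def analysis_latency(self) -> timedelta:
        return self.analyzed_at - self.stored_at

    def action_latency(self) -> timedelta:
        return self.acted_at - self.analyzed_at

    def total_latency(self) -> timedelta:
        # Time from the event to the response; equals the sum of the three parts.
        return self.acted_at - self.occurred_at


# Made-up sample values, for illustration only.
start = datetime(2024, 1, 1, 9, 0, 0)
timeline = EventTimeline(
    occurred_at=start,
    stored_at=start + timedelta(minutes=5),
    analyzed_at=start + timedelta(minutes=20),
    acted_at=start + timedelta(minutes=50),
)

print("data latency:    ", timeline.data_latency())
print("analysis latency:", timeline.analysis_latency())
print("action latency:  ", timeline.action_latency())
print("total latency:   ", timeline.total_latency())
```

Splitting the total this way shows which stage dominates. In the sample values, most of the delay is action latency, which a faster IT system alone would not reduce.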
A real-time BI system has two main components: real-time data integration and real-time decision-making. The objective of the real-time data integration component is to capture business events from operational systems and integrate them into a low-latency store. This component supports real-time, data-on-demand processing. The real-time decision-making component, on the other hand, supports real-time performance management and real-time predictive analysis. Figure 1 gives an overview of real-time BI architecture.
Challenges of Real-Time BI
BI applications include the activities of decision support, query and reporting, online analytical processing, statistical analysis, forecasting and data mining. Each of these components needs to be designed to operate in a real-time environment, and there can be many challenges in designing such a system. Some major challenges include:
Designing real-time ETL. Traditional ETL tools are batch oriented, wherein the data becomes available as some sort of extract file on a certain schedule, usually nightly, weekly or monthly. Then the system transforms and cleanses the data and loads it into the data warehouse. ETL tools tend to update systems with complete files, not compact amounts of change data. However, for real-time ETL, a continuous flow will be required throughout the day with minimum latency. Real-time operation requires the synchronization of data across multiple layers of an organization and many different sources. Connecting the large and diverse array of data sources to a real-time warehouse is highly complex.
Data modeling for real time. From a data architecture perspective, real-time data warehousing challenges the view of the data warehouse as a system of periodic measurements. It calls instead for a system of more comprehensive and continuous temporal information, i.e., a real-time database model that deals with the temporal nature of data.
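Search, OLAP, and query and reporting. Today's query and OLAP tools were not designed with real-time warehousing in mind and can produce unanticipated results. For example, a report built from several queries may show inconsistent figures if the underlying data changes between them.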
Scalability. To support real-time processing, the system must have a scalable and flexible back-end database environment for loading and administering large amounts of data. The database must also be able to handle mixed workloads, since the tasks used to update low-latency stores will need to run in parallel with real-time decision-making applications. Real-time processing will involve real-time alert reporting through emails or messages. These alerts need to be designed to operate on real-time data feeds.
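As a rough illustration of alerts that operate on a real-time data feed, the sketch below evaluates each reading as it arrives instead of waiting for a batch report. The function `alerts`, the record layout and the threshold are assumptions made for this example. Delivery by email or message is left out, so the sketch only produces the alert text.

```python
from typing import Iterable, Iterator


def alerts(feed: Iterable[dict], threshold: float) -> Iterator[str]:
    """Yield an alert message for every reading above the threshold.

    `feed` can be any iterable, including a generator that blocks until
    the next reading arrives, so alerts are raised as the data flows in.
    """
    for reading in feed:
        if reading["value"] > threshold:
            yield (
                f"ALERT {reading['metric']}: "
                f"{reading['value']} exceeds {threshold}"
            )


# Hypothetical readings standing in for a live feed.
feed = [
    {"metric": "order_backlog", "value": 80},
    {"metric": "order_backlog", "value": 120},
    {"metric": "failed_payments", "value": 101},
]

messages = list(alerts(feed, threshold=100))
for message in messages:
    print(message)
```

Because `alerts` is a generator, it can run alongside the loading process and emit each message as soon as the offending reading is seen.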
Suggested Solutions
An ideal real-time BI tool would address all of the challenges above. Practitioners are actively working on such systems and have proposed many approaches to designing real-time BI. Some of these approaches are outlined here.
Micro batch ETL. A data warehouse can only be considered real-time, or near real-time, when all or part of the data is updated, loaded or refreshed on an intra-day basis, without interrupting user access to the system. The conventional, file-based ETL approach is extremely effective in addressing daily, weekly and monthly batch reporting requirements. Micro batch ETL built on log-based, real-time change data capture technology can provide a nonintrusive means of real-time data acquisition from an operational data source. Figures 2 and 3 give a pictorial presentation of the system.
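The micro batch idea can be sketched as a loop that groups a continuous stream of change records into small batches and loads each batch as soon as it is full. This is a minimal sketch under stated assumptions: the function names, the record layout and the in-memory `warehouse` list are hypothetical, and batches are cut by size only, whereas a production system would typically also flush on a time window.

```python
from typing import Callable, Iterable, Iterator, List


def micro_batches(changes: Iterable[dict], max_size: int) -> Iterator[List[dict]]:
    """Group a stream of change records into batches of at most max_size."""
    batch: List[dict] = []
    for change in changes:
        batch.append(change)
        if len(batch) >= max_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


def run_micro_batch_etl(
    changes: Iterable[dict],
    transform: Callable[[dict], dict],
    load: Callable[[List[dict]], None],
    max_size: int = 3,
) -> int:
    """Transform and load each micro batch; return the number of batches loaded."""
    loaded = 0
    for batch in micro_batches(changes, max_size):
        load([transform(change) for change in batch])
        loaded += 1
    return loaded


# Stand-ins for the target warehouse and a log of what was loaded.
warehouse: List[dict] = []
batch_sizes: List[int] = []


def load(rows: List[dict]) -> None:
    batch_sizes.append(len(rows))
    warehouse.extend(rows)


def transform(change: dict) -> dict:
    # Example cleansing step: derive an integer amount in cents.
    return {**change, "amount_cents": change["amount"] * 100}


changes = [{"id": i, "amount": i * 10} for i in range(1, 8)]  # 7 change records
batches_loaded = run_micro_batch_etl(changes, transform, load, max_size=3)
print(batches_loaded, batch_sizes, len(warehouse))
```

Because each load touches only a few rows, users can keep querying the warehouse while it is refreshed throughout the day.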
Log-based change data capture (CDC) technology captures data changes in the source system as they happen and flows them immediately to the target system, ensuring business information is always reliable and timely. Most database management systems maintain a transaction log that records changes made to the database contents and to metadata. By scanning and interpreting the contents of the database transaction log, one can capture the changes made to the database in a nonintrusive manner. The principal task of the CDC process is to scan the log and write column data and transaction-related information to the CDC change tables. It detects when tables are newly enabled for CDC and automatically includes them in the set of tables that are actively monitored for change entries in the log. Similarly, disabling CDC will also be detected, causing the source table to be removed from the set of tables actively monitored for change data. When processing for a section of the log is finished, the capture process signals the server log truncation logic, which uses this information to identify log entries eligible for truncation.
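The capture process described above can be simulated in a few lines. The sketch below is a toy model and not the API of any particular database. `LogEntry`, `CaptureProcess` and the entry kinds are hypothetical names, the log is assumed to be ordered by sequence number, and the truncation signal is reduced to a single `truncation_point` value.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class LogEntry:
    lsn: int     # log sequence number (position in the transaction log)
    kind: str    # "enable_cdc", "disable_cdc", "insert", "update" or "delete"
    table: str
    columns: dict = field(default_factory=dict)


@dataclass
class CaptureProcess:
    monitored: Set[str] = field(default_factory=set)
    change_tables: Dict[str, List[dict]] = field(default_factory=dict)
    truncation_point: int = 0  # entries up to this lsn are no longer needed

    def scan(self, log: List[LogEntry]) -> None:
        """Scan unprocessed log entries and write changes to change tables."""
        for entry in log:
            if entry.lsn <= self.truncation_point:
                continue  # already processed in an earlier scan
            if entry.kind == "enable_cdc":
                # Newly enabled table: start monitoring it.
                self.monitored.add(entry.table)
                self.change_tables.setdefault(entry.table, [])
            elif entry.kind == "disable_cdc":
                # Disabled table: stop monitoring, keep captured rows.
                self.monitored.discard(entry.table)
            elif entry.table in self.monitored:
                self.change_tables[entry.table].append(
                    {"lsn": entry.lsn, "operation": entry.kind, **entry.columns}
                )
            # This entry is done, so it becomes eligible for truncation.
            self.truncation_point = entry.lsn


log = [
    LogEntry(1, "enable_cdc", "orders"),
    LogEntry(2, "insert", "orders", {"id": 1, "amount": 10}),
    LogEntry(3, "insert", "customers", {"id": 7}),   # not monitored, skipped
    LogEntry(4, "update", "orders", {"id": 1, "amount": 12}),
    LogEntry(5, "disable_cdc", "orders"),
    LogEntry(6, "delete", "orders", {"id": 1}),      # after disable, skipped
]

capture = CaptureProcess()
capture.scan(log)
print(capture.change_tables["orders"])
print("eligible for truncation up to lsn", capture.truncation_point)
```

Since the capture process reads only the log, the source tables themselves are never queried, which is what makes the approach nonintrusive.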









