As I write this column, Informatica has just announced PowerCenterRT, which it claims is the first enterprise data integration software platform for combining real-time data with information stored in a data warehouse. The announcement has already drawn a reaction from Ascential Software and Acta, both of which argue that they already offer real-time data integration platforms. There is no question that real-time data integration is a hot topic. This column explores the real-time processing of data, focusing specifically on the near real-time data warehouse/data store.

Until recently, business intelligence (BI) applications and their underlying data warehouses have been used primarily as strategic decision-making tools, kept separate from the operational applications that manage day-to-day business operations. There is now, however, significant industry momentum toward using BI to drive tactical day-to-day business decisions and operations. Many organizations can no longer run their businesses effectively without business intelligence analytics, and a BI system is now essential to the health and success of the business.

As BI is increasingly used to drive business operations, it is becoming more tightly integrated with operational processing. This integration can be achieved in several ways. One approach is to build a closed-loop decision-making system in which actionable analytics produced by BI applications are used to generate recommended actions (product pricing changes, marketing campaign modifications, etc.) that address specific business issues. In the e-business environment, many companies are looking to extend this closed-loop process to the automatic adjustment of business operations, based on decisions generated by the BI system. In fact, some companies would like this automated closed-loop processing to occur in close to real time.

CIOs often blanch at the mere mention of the term "real time." This is because many of the vendors and analysts jumping on the real-time bandwagon don't fully understand the business requirements behind real-time processing and often leave CIOs with the impression that real time is about performance, rather than about improving the speed of business decision making. Real-time BI enables business users to react rapidly to changing business conditions. It helps reduce the latency between a business event and the time it takes for the business to react to the event. Ideally, this latency would be zero (i.e., the business would be able to react in real time).

From a technology perspective, a business intelligence system for making real-time decisions and taking automated actions consists of three components (see Figure 1):

  1. A data integration engine that captures and transforms operational data, and loads it into a near real-time data store (i.e., a data warehouse whose data currency is close to that of operational systems).
  2. An analysis engine that can provide rapid user access to current actionable analytics from any place and at any time.
  3. A rules-driven decision engine that uses actionable analytics to make business recommendations or to generate and deliver action messages/transactions for processing by operational applications.

Figure 1: Real-Time Decision-Making System
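To make the third component more concrete, here is a minimal Python sketch of a rules-driven decision engine. It is only an illustration under stated assumptions: the `Rule` class, the `decide` function, the metric names and the thresholds are all hypothetical and do not come from any vendor product mentioned in this column.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Everything here (Rule, decide, metric names, thresholds) is hypothetical
# and exists only to illustrate the idea of a rules-driven decision engine.


@dataclass
class Rule:
    name: str
    condition: Callable[[Dict[str, float]], bool]  # test applied to current analytics
    action: str  # action message destined for an operational application


def decide(analytics: Dict[str, float], rules: List[Rule]) -> List[str]:
    """Return the action message of every rule whose condition holds."""
    return [rule.action for rule in rules if rule.condition(analytics)]


RULES = [
    Rule("low_stock", lambda a: a["units_on_hand"] < 100, "REORDER product"),
    Rule("weak_campaign", lambda a: a["campaign_response_rate"] < 0.02, "REVISE campaign"),
]

if __name__ == "__main__":
    current = {"units_on_hand": 40, "campaign_response_rate": 0.05}
    print(decide(current, RULES))
```

In a real deployment the analytics would come from the analysis engine and the action messages would be delivered to operational applications; here both are reduced to plain Python values.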

These components may be used together or independently. It is unlikely, however, that any given application would require all three components. I will now focus on the data integration engine; subsequent columns will discuss analysis and decision engines.

Data warehouses are typically maintained by batch jobs that take periodic nonvolatile snapshots of operational data, and clean, transform and load the data into a warehouse database. To support real-time data integration, the present batch snapshot approach to extracting operational data must be replaced by processes that continuously monitor source systems and capture and transform data changes as they occur, and then load those changes into a data warehouse in as close to real time as possible.
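The difference from batch snapshots can be sketched in a few lines of Python. This is a simplified illustration, not a description of any product: the change-event shape `(operation, key, row)`, the `transform` step and the in-memory dictionary standing in for the warehouse are all assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical change-event shape: (operation, key, row).
ChangeEvent = Tuple[str, int, Dict[str, object]]


def transform(row: Dict[str, object]) -> Dict[str, object]:
    """Stand-in for cleaning and transformation: normalize the 'name' field."""
    cleaned = dict(row)
    if "name" in cleaned:
        cleaned["name"] = str(cleaned["name"]).strip().title()
    return cleaned


def apply_changes(warehouse: Dict[int, Dict[str, object]],
                  events: Iterable[ChangeEvent]) -> None:
    """Apply each captured change to the warehouse as it arrives,
    instead of reloading a full snapshot of the source."""
    for op, key, row in events:
        if op == "delete":
            warehouse.pop(key, None)
        elif op in ("insert", "update"):
            warehouse[key] = transform(row)
        else:
            raise ValueError(f"unknown operation: {op}")


if __name__ == "__main__":
    warehouse: Dict[int, Dict[str, object]] = {}
    apply_changes(warehouse, [
        ("insert", 1, {"name": " alice smith "}),
        ("update", 1, {"name": "alice jones"}),
        ("insert", 2, {"name": "bob"}),
        ("delete", 2, {}),
    ])
    print(warehouse)
```

Only the rows that changed are touched, which is what allows the warehouse to stay close to the currency of the source systems.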

A data integration engine can capture data changes from operational systems by monitoring either data events or application events in the source systems. Data event monitoring involves familiar database technologies such as database triggers, data replication and database recovery log processing. To support application event monitoring, data warehousing vendors have begun building interfaces to messaging and enterprise application integration (EAI) software and adding support for Web services. Examples of products and their support include Ascential Software DataStage XE (IBM WebSphere MQ), Acta ActaWorks (Java JMS), Informatica PowerCenterRT (IBM WebSphere MQ, TIBCO Rendezvous), IBM DB2 Warehouse Manager (IBM WebSphere MQ) and the PeopleSoft Integration Broker (Informatica inbound, and IBM WebSphere, JMS and Web services for outbound messages). These vendors no longer talk about their ETL platforms, but instead market their data integration platforms. Key distinguishing factors when selecting products will be meta data integration, parallel processing, workflow facilities, bidirectional processing (inbound and outbound messages), support for industry standards, scalability, change data capture and the provision of an interface development kit.

Recent announcements from data warehouse vendors show there is significant interest in real-time data integration. This technology promises significant business benefits, but potential users should be aware that it is very much in its infancy and poses some interesting challenges in areas such as data quality: caveat emptor.