Increasingly, we are hearing about real-time analytics. This article defines real-time analytics, examines the impact it will have on extract, transform and load (ETL) tools and discusses the move toward real-time decisioning.
Real-Time Analytics
Ovum defines real-time analytics as: The delivery, analysis and action of the right information by the right person at the right time to support a particular business situation.
Real-time analytics makes the decision-making process an integrated part of the operational business process. It combines access to transactional and decision-support data at the right point in the decision-making cycle to enable faster and more informed actions. This lends itself to specific business processes (such as preventing an account from lapsing or dynamically pricing products) where it is necessary to provide delivery, analysis and action in real time.
Figure 1: The Three-Step Process of Real-Time Analytics
Real-time analytics involves a three-step process, comprising:
- Real-time delivery of information
- Real-time analysis of information
- Real-time action on information
These elements are illustrated graphically in Figure 1.
The Four Confusing Scenarios of Real-Time Analytics
What is often deemed real-time analysis can be broken into four distinct types of analysis, as shown in Figure 2. Only one of these types fits our definition of real-time analytics.
Time-Dependent Analysis: This is standard business intelligence (BI) reporting. Time-dependent analysis involves monitoring standard measures that are reported at time-critical intervals. This type of regular business reporting usually happens on a weekly or monthly basis and focuses on standard and well-understood business measures. However, competitive pressures are forcing organizations to report and act on this information in shorter time frames. While this type of analysis is performed in real time, it is not real time based upon our earlier definition. However, there is a definite trend for organizations to increase the frequency of updates to the data warehouse or analytic application in line with the requirement for time-critical analysis.
Threshold Analysis: This type of analysis is based on the ability to define certain levels of performance measures or thresholds and report on variances from them. The detection of any variance triggers an alert to interested parties, typically via e-mail. This type of analysis, although delivered in real time, requires further analysis to act on the information.
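The threshold mechanism described above can be sketched in a few lines. This is a minimal illustration only; the measure names, threshold values and alert wording are assumptions for the example, not from the article.

```python
# Illustrative threshold definitions: each measure may have a floor, a
# ceiling, or both. Values here are assumptions for the sketch.
THRESHOLDS = {
    "daily_revenue": {"floor": 50_000.0, "ceiling": None},
    "open_support_tickets": {"floor": None, "ceiling": 200.0},
}

def check_thresholds(measures):
    """Return a variance alert for every measure outside its defined band."""
    alerts = []
    for name, value in measures.items():
        band = THRESHOLDS.get(name)
        if band is None:
            continue  # no threshold defined for this measure
        if band["floor"] is not None and value < band["floor"]:
            alerts.append(f"{name}={value} fell below floor {band['floor']}")
        if band["ceiling"] is not None and value > band["ceiling"]:
            alerts.append(f"{name}={value} exceeded ceiling {band['ceiling']}")
    return alerts

# In practice each alert would be pushed to interested parties, e.g. via e-mail.
alerts = check_thresholds({"daily_revenue": 42_000.0, "open_support_tickets": 150.0})
```

Note that, as the article points out, the alert itself is only a notification: a human still has to analyze the variance before acting on it.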
Event-Driven Analysis: This analysis fits the definition of real-time analytics more closely. The key difference here is that the push for analytics comes from the operational system via a business event, rather than a bottom-up push from the analytic platform or data warehouse. Business events such as a key account customer lapsing, inventory reaching a certain level or an update on product delivery status trigger the resulting analysis. Ultimately, the analysis is used to determine the next course of action within a business process. However, while this information flow and analysis is triggered by an operational event, the reporting mechanism often remains separate and distinct from the transactional system triggering the event.
Closed-Loop Analysis: Closed-loop analysis provides the final piece in the jigsaw puzzle for real-time analytics. It takes elements of notification and event-driven analysis a step further by analyzing the effect of a key business event, recommending a course of action and, more importantly, feeding this back into the source operational system. This can either be in the form of an alert or a prompt for the user. Ultimately, to maximize the potential of this analysis and allow the organization to respond to new opportunities and changing demands, it is necessary to perform this analysis in real time.
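The closed-loop pattern can be sketched as an event triggering analysis whose recommendation is fed back to the source system. Everything concrete below (the event fields, the spend rule, the `OperationalSystem` stand-in) is an illustrative assumption, not a prescribed design.

```python
def analyze_event(event, history):
    """Analyze a key-account lapse event and recommend a course of action."""
    if event["type"] == "account_lapsing":
        spend = history.get(event["customer_id"], 0.0)
        if spend > 10_000:
            return {"action": "offer_retention_discount",
                    "customer_id": event["customer_id"]}
        return {"action": "send_renewal_reminder",
                "customer_id": event["customer_id"]}
    return None  # no recommendation for other event types

class OperationalSystem:
    """Stands in for the transactional source system receiving feedback."""
    def __init__(self):
        self.prompts = []

    def push_prompt(self, recommendation):
        # The recommendation surfaces as an alert or prompt for the user.
        self.prompts.append(recommendation)

source = OperationalSystem()
event = {"type": "account_lapsing", "customer_id": "C042"}
recommendation = analyze_event(event, history={"C042": 25_000.0})
if recommendation:
    source.push_prompt(recommendation)  # feeding back closes the loop
```

The distinguishing step is the last line: unlike event-driven analysis, the result does not stop at a report but returns to the operational system that raised the event.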
Figure 2: The Four Confusing Scenarios of Real-Time Analytics
ETL and Real-Time Analytics
A view of the key technology components within a real-time analytics solution does not differ much from that of a traditional BI solution. The framework does, however, require a different approach to the traditional information architecture. While it combines the components of a typical BI solution (data warehousing, BI tools and analytic applications), it adds other infrastructure components, such as enterprise application integration (EAI) solutions. Real-time analytics requires a more heavyweight and robust information infrastructure that can support the pervasive use of information across a business.
Figure 3: The Core Technology Components of a Real-Time Analytics Solution
The key component underpinning the real-time analytics framework is an EAI solution that provides the technology for integrating applications and exchanging business information in a format that each layer within the real-time analytic framework understands.
The Presentation Layer
The presentation layer is responsible for handling the presentation of information. It provides the mechanism for content aggregation and display. In most cases, this involves delivery to a Web browser or enterprise portal, but specific scenarios can be accommodated through a wireless device such as a personal digital assistant (PDA) or cell phone. It is responsible for granting access to the right information to the right person and must, therefore, have personalization capabilities. The information must be actionable; that is, it must not only alert a user of a situation, but also provide actionable information to help progress or resolve the situation.
The Analytic Layer
The analytic layer is responsible for analyzing the data. It comprises the traditional BI tools (query and reporting, OLAP and data mining) and/or analytic applications. Data mining is the key decision-making technology component and will provide the greatest value for real-time analytics solutions. It allows companies to make sense of increasing amounts of data through analysis, segmentation and predictive techniques that help support the decision-maker. It automates part of the decision-making process and provides the necessary intelligence, or actionable information, to enable quicker and more informed decisions.
The Data Integration Layer
The data integration layer is responsible for the data storage and integration component of the real-time analytics infrastructure. It consists of a data warehouse, intermediary data store (otherwise known as an operational data store or ODS) and an ETL tool to populate the data stores. This layer acts as a centralized data hub, but offers no analytic capability. It combines both operational and historical information, and provides a mechanism for cleansing, transforming and storing the data in the most appropriate format. One of the biggest technological requirements is the ability to manage large amounts of data and to extract, transform and load it efficiently in order to provide a holistic view of the business.
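The extract, transform and load cycle this layer performs can be sketched minimally. The source records, cleansing rules and list-based target below are assumptions for illustration; a production ETL tool handles vastly larger volumes with far richer transformation logic.

```python
def extract():
    """Pull raw rows from an operational source (hard-coded for the sketch)."""
    return [
        {"customer": " Alice ", "amount": "120.50", "region": "emea"},
        {"customer": "Bob", "amount": "bad-value", "region": "APAC"},
    ]

def transform(rows):
    """Cleanse and standardize rows, rejecting those that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # data-quality rejection: unparseable amount
        clean.append({
            "customer": row["customer"].strip(),   # trim stray whitespace
            "amount": amount,                      # standardize to a number
            "region": row["region"].upper(),       # standardize casing
        })
    return clean

def load(rows, warehouse):
    """Append transformed rows to the target store (a list stands in)."""
    warehouse.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
```

Even this toy pipeline shows why data quality (the rejected row) and transformation rules are central to the layer's job of presenting a consistent, holistic view.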
Real-time analytics requires a holistic view of an organization in order to support the decision-making process in real time. It combines integration of transactional and decision-support data, which brings new challenges to implementing a solution. The ETL tool vendors are in a prime position to provide some of the data integration capabilities required of a real-time analytic solution. However, many of the ETL tools are not designed for the real-time enterprise.
The tool vendors most likely to address the potential of real-time analytics will be those that adopt clear and well-executed strategies for implementing and providing data integration solutions, such as:
- Focusing on improving the capabilities of the data and application integration. Solutions should offer a flexible technology and an adaptable process for managing the consistent demand for new and changed data flowing from source to target systems.
- Investing in data integration project-management expertise and delivery methodologies, either through partnerships with implementation specialists (such as system integrators) or by building best-practice industry-focused skills.
- Improving the capability for managing and delivering meta data to end users, whether for internal or external purposes. Demand for good meta data support has so far been fairly limited with respect to ETL tools. The value of providing integration at both the semantic and data level will become increasingly apparent in the real-time analytics market. Tools that do not provide full meta data functionality will find it difficult to find an audience.
- Delivering data quality solutions. The increased importance placed on the value of data has, in turn, led to a corresponding requirement to address the problems of data quality. Data quality is set to become a key success factor in real-time analytics.
- Considering the potential opportunities presented by Web services. Although it is an immature technology, the area is generating hype and interest in equal measure.
Challenges and Opportunities
To some extent, the recent popularity of the term "real time" springs from the lack of success that ETL tools met with "trickle feed." There are a number of reasons for this, not least the decidedly uncatchy term "trickle feed" itself.
One of the differentiators of a BI solution over a reporting solution lies in the types of questions that are answered. A BI query is one where multiple rows need to be aggregated in some form to provide an answer to the question. This definition does not rule out the cases where only a single row may be returned, nor the need to be able to drill down; but it does differentiate BI queries from reporting queries, where typically a single row is retrieved. As we know, providing good "real-time" reporting is rather difficult. As such, many organizations have looked to their BI solutions to solve this problem, and this has pushed the need for real-time feeds of data. Yet such feeds are not needed for business intelligence functions, as W. H. (Bill) Inmon states very clearly in his book, Building the Data Warehouse.
Another difficulty with real-time feeds comes from the database management technology. When performing a BI query that searches over a large number of rows, the last thing one wants is for someone to update the table, because the query then has to restart to ensure that the answer is correct. Couple the operational systems to the data warehouse with a real-time feed, and no BI query would ever be answered. Once again, BI queries would be relegated to the middle of the night, with only short reporting queries allowed during core operating hours. This takes us full circle back to one of the reasons we started building data warehouses in the first place: to enable BI queries.
However, we have seen the rise of the operational data store as a solution for the reporting challenge, and a real-time feed to the ODS makes a great deal of sense. Here is where the ETL tools that offer the ability to take information from a message queue or other EAI pipe can provide a great deal of value.
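A real-time feed into an ODS of the kind just described can be sketched as an ETL process draining a message queue. Here Python's `queue.Queue` stands in for the EAI pipe, and a dictionary stands in for the ODS; the message fields are illustrative assumptions.

```python
import queue

def feed_ods(pipe, ods):
    """Drain all pending messages from the pipe and upsert them into the ODS."""
    while True:
        try:
            message = pipe.get_nowait()
        except queue.Empty:
            break  # no more pending messages; wait for the next batch
        # Upsert keyed on the business identifier, so the ODS always
        # reflects the latest known operational state.
        ods[message["order_id"]] = message["status"]

# The operational system publishes transactions onto the EAI pipe...
pipe = queue.Queue()
pipe.put({"order_id": "O-1", "status": "shipped"})
pipe.put({"order_id": "O-2", "status": "pending"})

# ...and the feed keeps the ODS current for reporting queries.
ods = {}
feed_ods(pipe, ods)
```

Because the ODS, not the data warehouse, absorbs these updates, long-running BI queries on the warehouse are insulated from the restart problem described above.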
The single biggest benefit of the convergence of ETL with EAI as real time becomes realized is that we can start to perform closed-loop analysis. This provides the decision-maker with both the information needed to make a decision along with the ability to act upon the decision and make an impact on the operations of the company.