In the history of IT, few things have had a greater long-term impact than the relational database. Today, we are witnessing a major shift in the nature of data and how data is used. But this time, it's not about stored data; it's about real-time information.
The volume of real-time data is increasing exponentially, and it comes from sources as diverse as stock ticker feeds, telecom networks and other communication networks. The nature and volume of this new class of information require new processing capabilities. Compounding this increase in volume is the fact that real-time data is of greatest value at the instant it arrives, with its value decreasing rapidly every second thereafter.
Until recently, there was no off-the-shelf systems software available to capture the full value of streaming real-time data. Yet applications that can process and analyze tens or hundreds of thousands of messages per second with very low latency are already critical in many industries, including financial services, telecommunications, homeland security and the military. These industries must make split-second decisions based on large volumes of fast-moving and highly complex data the instant it becomes available.
Examples of streaming applications include hedge funds that process multiple stock ticker feeds to leverage real-time information and gain a millisecond advantage in finding and executing on arbitrage opportunities; telecommunications companies that monitor networks to instantly determine fraudulent activity or to ensure that they are receiving the revenues they are entitled to; and military/intelligence agencies that monitor the Internet and other communication networks for patterns that may indicate potential terrorist activities before they happen. These types of applications demand an infrastructure that can handle real-time streaming data.
What Do Stream-Processing Applications Require?
Because processing streams of data in real time requires an entirely new paradigm that flies in the face of traditional data management and business intelligence systems, the important issues and questions around this technology are still emerging. Organizations have been struggling to understand stream processing and what solutions can most effectively meet their specific needs.
As IT executives and business decision makers wrestle with the rising tide of real-time information, eight fundamental attributes that characterize this new class of systems software can help shed light.
Keep the Data Moving. To process information in real time, latency must be kept to a minimum. To achieve low latency, a system must be able to process messages without first having to store and retrieve the data.
Figure 1. Stream Processing Engine Processes Data Continuously, with Optional Storage
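To make the idea of processing without a store-and-retrieve step concrete, here is a minimal Python sketch. It is an illustration only, not the design of any particular product: the function name running_vwap and the (price, volume) tick format are assumptions introduced for this example. Each message updates a small amount of running state and produces a result immediately; nothing is written to storage first.

```python
from typing import Iterable, Iterator, Tuple

def running_vwap(ticks: Iterable[Tuple[float, int]]) -> Iterator[float]:
    """Yield the volume-weighted average price after each (price, volume) tick.

    Hypothetical helper: assumes every tick has a positive volume.
    """
    total_value = 0.0
    total_volume = 0
    for price, volume in ticks:
        # Update the running state in place; the tick itself is never stored.
        total_value += price * volume
        total_volume += volume
        yield total_value / total_volume

ticks = [(10.0, 100), (11.0, 100), (12.0, 200)]
results = list(running_vwap(ticks))
print(results)
```

Because the function is a generator, it works equally well on an unbounded feed; the list here only stands in for one.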
Provide a Familiar Framework for Querying the Information. In the world of data management, SQL has emerged as the means to access information's value. However, store and query systems that use standard SQL will not work with real-time information. The solution is to execute SQL operations on streaming data. In streaming applications, some querying mechanism must be used to issue arbitrary queries on moving data and compute real-time analytics. StreamSQL, which extends SQL to account for the vagaries of streaming data, offers a good approach.
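The kind of query a streaming extension to SQL typically adds is an aggregate over a moving window of time. The sketch below shows that idea in plain Python rather than in StreamSQL syntax, which is not reproduced here. The class name SlidingWindowAverage and its parameters are assumptions made for illustration, and the sketch assumes timestamps arrive in order.

```python
from collections import deque

class SlidingWindowAverage:
    """Average of the values seen in the last window_seconds (hypothetical helper)."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._items = deque()  # (timestamp, value) pairs, oldest first
        self._sum = 0.0

    def add(self, timestamp: float, value: float) -> float:
        """Add one reading and return the current windowed average."""
        self._items.append((timestamp, value))
        self._sum += value
        # Evict readings that have aged out of the window.
        cutoff = timestamp - self.window_seconds
        while self._items[0][0] <= cutoff:
            _, old_value = self._items.popleft()
            self._sum -= old_value
        return self._sum / len(self._items)

window = SlidingWindowAverage(window_seconds=10)
averages = [window.add(0, 10.0), window.add(5, 20.0), window.add(12, 30.0)]
print(averages)
```

The query is effectively standing: it is defined once and re-evaluated as each message arrives, rather than being run against a stored table.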
Accept the Reality of Data Imperfection but Provide the Ability to Manage it. In real-time systems, data is typically not stored and is being generated on an ongoing basis. As a result, the infrastructure must make provisions for handling data which is late or delayed, missing, or out of sequence - without impacting the real-time nature of an application by waiting for data or timing out a process.
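One common way to tolerate out-of-sequence and late data is a bounded reordering buffer. The sketch below is one possible approach, not the method of any specific engine; the names reorder, max_delay and late are assumptions for this example. Events are held until the highest timestamp seen so far has moved max_delay past them, then released in order. Anything that arrives later than that is diverted to a side list instead of stalling the stream.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[int, str]  # (event timestamp, payload)

def reorder(events: Iterable[Event], max_delay: int,
            late: List[Event]) -> Iterator[Event]:
    """Emit events in timestamp order, tolerating up to max_delay of disorder."""
    pending: List[Event] = []  # min-heap ordered by timestamp
    max_seen = None
    for event in events:
        ts = event[0]
        max_seen = ts if max_seen is None else max(max_seen, ts)
        watermark = max_seen - max_delay
        if ts < watermark:
            # Too late to place in order: hand it off rather than wait.
            late.append(event)
            continue
        heapq.heappush(pending, event)
        # Release everything that can no longer be overtaken.
        while pending and pending[0][0] <= watermark:
            yield heapq.heappop(pending)
    # End of input: flush whatever is still buffered, in order.
    while pending:
        yield heapq.heappop(pending)

late_events: List[Event] = []
arrivals = [(1, "a"), (3, "c"), (2, "b"), (6, "f"), (4, "d"), (2, "x")]
ordered = list(reorder(arrivals, max_delay=2, late=late_events))
print(ordered, late_events)
```

This sketch is driven purely by event timestamps. A production system would usually add a wall-clock timeout as well, so that a quiet feed does not hold buffered events indefinitely.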
Generate Predictable Outcomes. In a real-time processing infrastructure, the system must process time-series records in a consistent manner to ensure the calculations performed on one time-series record do not interfere with the calculations performed on another.
Integrate Stored and Streaming Data. Stored data is important, has value and can be used in conjunction with real-time information. The preservation of information is almost universally desired, whether it is yesterday's business analytics or control strategies to apply in a specific trading situation. In many situations, events of interest may depend in part on real-time data as well as on historical information.
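As a rough illustration of combining the two kinds of data, the Python sketch below compares incoming prices against stored reference prices. The table name avg_price, the symbols and the 5 percent threshold are all invented for this example. The stored values are loaded into memory once so that no per-message database round trip is added to the streaming path.

```python
import sqlite3

# Stored data: yesterday's average price per symbol (illustrative values).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE avg_price (symbol TEXT PRIMARY KEY, price REAL)")
db.executemany("INSERT INTO avg_price VALUES (?, ?)",
               [("ABC", 100.0), ("XYZ", 50.0)])

# Load the reference data once, before the stream is processed.
baseline = dict(db.execute("SELECT symbol, price FROM avg_price"))

def flag_moves(ticks, threshold=0.05):
    """Yield (symbol, relative change) for ticks that move more than threshold."""
    for symbol, price in ticks:
        reference = baseline.get(symbol)
        if reference is None:
            continue  # no history for this symbol, nothing to compare against
        change = (price - reference) / reference
        if abs(change) > threshold:
            yield symbol, round(change, 4)

alerts = list(flag_moves([("ABC", 101.0), ("XYZ", 56.0), ("NEW", 10.0)]))
print(alerts)
```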
Assure Data Safety and Availability. Virtually all enterprise applications are expected to stay up all the time; if a failure occurs, regardless of the cause (e.g., hardware, operating system, software, application), the application needs to continue operating. This is especially true in real-time environments where the value of the information may only exist for fractions of a second.
Scale Automatically. The volume of real-time information and the ways that it can be used continue to expand. As a result, the application for processing this data must be able to scale quickly and automatically. These applications must also be free of artificial limitations created by system architecture. To do this successfully, it must be possible to split an application over multiple processors or machines without user intervention.
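One widely used way to split a streaming workload across processors or machines is to partition messages by key, so that each worker sees every message for the keys it owns. The sketch below assumes that approach; the function names partition_for and route are hypothetical, and a real platform would assign and rebalance partitions itself rather than leave it to the user.

```python
import zlib
from collections import defaultdict

def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def route(events, num_partitions):
    """Group (key, value) events by the partition that owns their key."""
    partitions = defaultdict(list)
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions

events = [("ABC", 1), ("XYZ", 2), ("ABC", 3), ("XYZ", 4)]
partitions = route(events, num_partitions=4)
for index, items in sorted(partitions.items()):
    print(index, items)
```

Keeping all messages for a key on one worker preserves their relative order, which also supports the predictable-outcomes requirement described above.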
Shield Complexity. The benefits of stream processing are evident, but many of the underlying concepts and much of the technology are complex. For business users and IT professionals, stream processing should be available at a high level of abstraction, without requiring low-level programming or deep specialized knowledge of the physical systems or underlying infrastructure.
Streaming Data Options to Date
With these concepts in mind, here is an overview of the various approaches available or being used to process streaming data in real time. As you will see, until recently, there have been few options available to organizations that rely on streaming data.
Traditional Relational Database Systems: These offer general data management functionality and are designed to work best with structured static data for applications such as online transaction processing or data warehousing. Given the rapid increase in rates and volume of streaming data, and the requirement that it be processed in real time, existing relational databases and the applications that run on them are limited in what they can deliver. The indexing, storage and ad hoc query execution of traditional database systems all introduce latencies that are unacceptable for streaming data applications. As a variation of this approach, new in-memory databases share a similar structure and attributes with traditional relational databases but exist and operate in RAM, offering dramatically faster processing rates than their larger cousins.
Custom Code: Developing proprietary applications has been the most common way to overcome the limitations of traditional databases to support low-latency, high-volume streaming-data applications. The issue with this approach has been that roll-your-own applications are expensive and require highly skilled technical resources and long development cycles in the range of months to man-years. Moreover, these applications typically require significant investments in time and financial resources to modify when business needs change.
Rules Engines: These allow the creation of applications that separate business rules from the underlying code. This enables users to modify the rules as needed. For example, users can alter conditions for an algorithmic trading application without having to rewrite the code. Rules engines have proven to be a useful option, but issues arise as the number and complexity of the rules increase. When the size of the rule set grows, this approach can quickly become unmanageable.
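The separation of rules from code can be pictured with a small Python sketch. It is a toy illustration, not the design of any commercial rules engine; the rule format, the rule names and the function matching_rules are assumptions for this example. The rules are plain data that a user could edit, and the evaluation code never changes when they do.

```python
import operator

# Comparison operators that a rule is allowed to reference by name.
OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}

# Rules are data, not code: they could equally be loaded from a file.
rules = [
    {"name": "price_spike", "field": "price", "op": ">", "value": 100.0},
    {"name": "thin_volume", "field": "volume", "op": "<", "value": 10},
]

def matching_rules(event, rules):
    """Return the names of all rules whose condition holds for the event."""
    return [
        rule["name"]
        for rule in rules
        if rule["field"] in event
        and OPS[rule["op"]](event[rule["field"]], rule["value"])
    ]

fired = matching_rules({"price": 105.0, "volume": 500}, rules)
print(fired)
```

Every event is checked against every rule, which hints at why large rule sets become hard to manage and slow to evaluate.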
A New Approach
Stream processing software is the most recent addition to this field. These software applications are fast, off-the-shelf solutions that can process high-volume streaming data in real time. They fall into two distinct categories, depending on what the organization needs:
Point products: Several technology solutions have emerged to address specific business problems, particularly in financial services, where streaming data applications for algorithmic trading, compliance monitoring and risk management are in high demand. These solutions work well, provided that the application's underlying data model, programming and user interfaces, and analytic capabilities meet the needs of the user exactly and do not require frequent modification.
The main drawback of point products is that they often cannot scale as application requirements grow, and are not able to meet the real-time processing needs of multiple applications across the organization.
Platforms: Stream-processing platforms represent a new class of systems software that enables the rapid creation of applications capable of processing and analyzing large volumes of data within milliseconds of its arrival. These platforms can be used to create applications similar to the point products mentioned earlier but offer greater flexibility and can be used to meet an enterprise's growing real-time data processing requirements.
In many respects, stream processing platforms share similarities with the traditional database systems mentioned earlier; the difference lies in their architecture and the class of information they are designed to process.
What the Future Holds
The volume and diversity of real-time information are going to continue to grow. As microsensor technology enters mainstream use in commercial and government/military markets, the volume and velocity of data will grow with it. Once the cost of microsensor technology gets down to pennies per sensor, every item of material significance could be sensor-tagged and be able to report its state and location in real time. Today the Lojack service does this for cars so they can be tracked in the event of theft; but imagine Lojack for everything else - items that cost as little as a few dollars, such as consumer products, library books and wristbands. This growth in sensor tagging will create a flood of real-time messages that will have to be processed in order to provide value.
When a sensor is triggered, it generates a message that typically requires action to be taken instantly. In these types of applications, real-time demands go up. Data volumes go up. Complexity goes up. It's a very different challenge from the stored data challenge solved by relational databases.
Faster, more effective approaches to stream processing are breaking new ground, enabling organizations to realize powerful new opportunities through the use of these applications to deliver real-time business benefits. As businesspeople begin to evaluate the role of streaming data for their organizations, they should be aware of the full scope of this emerging market.