Data Acceleration: Turning Technology Into Solutions
In September, I described a range of components that are crucial to building a high-speed data architecture, such as big data platforms, complex event processing, ingestion, in-memory databases, cache clusters and appliances. These components cannot function in isolation.
In this article, I outline four fundamental combinations of these components to create solutions that enable data movement, processing and interactivity at high speed.
Think in terms of technology stacks. These stacks share common layers: the points at which data enters and leaves the data architecture. And all are designed to deliver the same outcome: the most effective exploitation of big data for the enterprise.
Stack 1: The big data platform
The building block in this stack is a big data core, with data entering a cluster of computers through a batch or streaming process, and calculations then scheduled on the basis of particular jobs. This solution can work well, but by adding complex event processing to the stack, the processing capabilities of the architecture are further enhanced and the types of insight generated are broader.
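To make the complex event processing idea concrete, the sketch below shows one common pattern such engines apply: flagging an event type that recurs too often within a sliding time window. The function name, event labels, and thresholds are illustrative, not part of any particular product.

```python
from collections import deque

def detect_bursts(events, threshold=3, window=60):
    """Flag any event type seen `threshold` times within `window` seconds.

    `events` is an iterable of (timestamp, event_type) pairs in time order,
    mimicking a stream arriving at a complex event processing engine.
    """
    recent = {}   # event type -> deque of timestamps inside the window
    alerts = []
    for ts, kind in events:
        q = recent.setdefault(kind, deque())
        q.append(ts)
        # Drop timestamps that have slid out of the window.
        while q and ts - q[0] > window:
            q.popleft()
        if len(q) >= threshold:
            alerts.append((ts, kind))
    return alerts

stream = [(0, "login_fail"), (10, "login_fail"), (20, "login_ok"), (30, "login_fail")]
print(detect_bursts(stream))  # [(30, 'login_fail')] - three failures in one window
```

A production engine adds distribution, persistence, and a richer pattern language, but the core mechanic, matching patterns over a moving window as events arrive, is the same.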
Another possibility is to combine a big data core with an in-memory database. This can substantially increase the speed at which the architecture is able to complete even the most complex calculations and processes.
Adding a query engine to the big data core, meanwhile, can give users much more immediate access to the architecture’s capabilities. It is even possible, with certain technologies, to add complex event processing results to the output that the query engine can generate.
Stack 2: The in-memory database cluster
An in-memory database cluster can receive data directly, whether from a bulk transfer or in a constant stream. All processing takes place in-memory and users can query the database directly. On its own, this solution delivers on all three data acceleration challenges, but combined with other technologies even better results are possible.
For example, combining an in-memory database with a big data platform, so that data is first ingested via the platform, enables some pre-processing of information to take place before queries are run.
Alternatively, the stack might consist of an in-memory database and complex event processing, with the ingestion taking place via the latter’s engine. As with the previous alternative, queries are executed in the database, speeding up response time, but some of the processing work has been done in advance.
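The defining trait of this stack, storage and query execution happening entirely in RAM, can be illustrated with SQLite's in-memory mode. The table and figures below are invented for the example; a clustered in-memory database adds replication and scale-out on top of the same principle.

```python
import sqlite3

# ":memory:" keeps the whole database in RAM - no disk round trips.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE trades (symbol TEXT, qty INT, price REAL)")
db.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("ACME", 100, 9.5), ("ACME", 50, 10.0), ("GLOBEX", 200, 4.2)],
)

# Users query the database directly, as the article describes.
total, = db.execute(
    "SELECT SUM(qty * price) FROM trades WHERE symbol = 'ACME'"
).fetchone()
print(total)  # 1450.0
```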
Stack 3: The distributed cache
In the simplest form of the stack, a caching framework sits on top of the data source repository and is connected to an application that retrieves data. The cache has to be set up so as to capture the most appropriate data for each application, and all the processing is done in the application itself.
Adding in components such as a big data platform builds on this simple approach. Data processing moves onto the platform, which is capable of doing the work much more effectively, and the cache then feeds the query results.
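The simple form of this stack follows the cache-aside pattern: the application checks the cache first and falls back to the slower repository on a miss. A minimal sketch, in which `slow_source` is a stand-in for the repository and the dictionary plays the role of the distributed cache:

```python
import time

def slow_source(key):
    """Stand-in for the source repository (e.g., a remote database)."""
    time.sleep(0.05)  # simulated query latency
    return key.upper()

cache = {}  # the caching layer sitting in front of the repository

def get(key):
    # Cache-aside: serve from the cache when possible; otherwise
    # fetch from the source and populate the cache for next time.
    if key not in cache:
        cache[key] = slow_source(key)
    return cache[key]

get("orders")  # first call pays the source's latency
get("orders")  # repeat call is served from memory
```

A real distributed cache spreads this dictionary across a cluster and adds eviction and invalidation policies, which is where the tuning effort mentioned above ("capture the most appropriate data") comes in.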
Stack 4: The appliance
This stack might work as a one-stop-shop approach, with data streaming directly into the appliance, which then completes processing, analytics and calculations before generating query results.
Alternatively, it might also include a big data platform that imports and stores the data; processing is then done inside the platform and the results are transferred to the application, speeding up the generation of insights.
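The one-stop-shop flow described above, streaming ingestion, then processing, then serving query results, can be sketched as three small stages. The stage names and the sample data are purely illustrative:

```python
from collections import defaultdict

def ingest(records):
    """Streaming ingestion: yield records one at a time."""
    yield from records

def process(stream):
    """Processing/analytics stage: running totals per region."""
    totals = defaultdict(float)
    for region, amount in stream:
        totals[region] += amount
    return totals

def query(results, region):
    """Query stage: serve a result computed upstream."""
    return results.get(region, 0.0)

results = process(ingest([("east", 120.0), ("west", 80.0), ("east", 30.0)]))
print(query(results, "east"))  # 150.0
```

In an appliance all three stages run inside one tightly integrated box; in the big data platform variant, the `process` stage moves onto the platform and only the results travel to the application.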
More than the sum of their parts
These stacks are not a definitive list. But they demonstrate how combining complementary components can add value incrementally.
The landscape of solutions that foster data acceleration and enable a successful data supply chain has grown more complex than ever. Executives need to fully understand the technology components available on the market, because each supports data acceleration in unique ways. They also need to recognize that these components deliver maximum value only when they are combined in ways that capitalize on their complementary advantages.
Only then can they decide which configurations best suit their organization’s needs, discuss prospective solutions with vendors, and ultimately achieve returns from their analytics and big data investments.