Real-time analytics has emerged as the dominant data theme for 2016, based on discussions and inquiries at this year’s Strata & Hadoop World conference in San Jose, CA.
Jack Norris, senior vice president, data and applications, at MapR Technologies, spoke with Information Management about what this strong interest means for vendors and clients, and about his surprise that so many organizations have already reached this point in their data analytics journey.
Information Management: What are the most common themes that you heard among conference attendees and how do those themes align with what you expected?
Jack Norris: This was the largest Strata Hadoop conference, with thousands in attendance from a broad range of industries, applications, and business functions. That being said, questions about real-time capabilities were very common. The questions ranged from real-time streaming analytics to NoSQL databases for performing real-time operations, to using Kafka for high-volume stream processing. The questions basically ran the gamut from how to process incoming data in real time to the analytics that drive business decisions.
As organizations adopt big data, they tend to go through a common journey, which is to first use the platform to collect data into a central location. Typically, they’re offloading this data from more expensive legacy platforms, whether that’s a data warehouse, a mainframe, or enterprise storage, and then they move on to pursue a use case.
The use cases they start with usually have a fairly short development process and result in a fast payback. Organizations go through this process to gain experience and generate a quick return. They can continue to expand the number of these types of applications, and then move on to addressing the more mission-critical, real-time applications.
The driver here is to move away from simply reporting and analyzing to integrating analytics into operational flows so that they can respond quickly to incoming data and actually impact business as it’s happening.
The surprise is not that people are concerned about real time, but rather that so many people are at that point. It’s a sign of how fast this market is accelerating, and how critical these real-time features are now that organizations look to pursue their data strategies.
IM: What are the most common data challenges that attendees are facing?
JN: Traditionally, we’ve taken an “application first” approach: you start with the application and determine the data requirements. You then prepare the data into specialized schemas to serve the application. Each of these applications has its own dedicated silo, and the result is a proliferation of silos. The issues around managing data movement, security, and protection across those silos continue to be some of the top challenges faced by organizations.
Converging these data silos into a single platform addresses the major challenges and concerns that organizations have. However, not all platforms are created equal. If you converge on a data platform that’s batch-only or is limited in terms of the higher-end functions that you see in enterprise storage, then that’s a temporary data storage layer at best. So the next set of challenges is around how you ensure that the long-term persistence store has the relevant data protection, disaster recovery, and high availability capabilities so that you can deploy a platform that supports mission-critical applications.
The next set of challenges addresses the theme that we heard at Strata this year: supporting real-time capabilities. Increasingly there is a set of applications and use cases where performing the analytics in real time allows you to make decisions within the window of time that you have available to impact your business as it happens. For example, in online environments, it’s about deciding how to address a customer’s experience while the web page is loading.
Increasingly, it’s not just about real time in terms of the speed of the analytics; it’s real time in terms of the whole cycle, from when data is collected to when business actions are taken. The challenge is how to coordinate that flow of information – how you coordinate the required data streams. This is the final frontier for a converged data platform.
IM: What are the most surprising things that you are hearing from attendees?
JN: I think that the surprising thing is how front and center real time is, and how many attendees felt they were already behind. In fact, this confirms a recent study that Accenture conducted with executives with respect to big data. Their top two concerns were 1) “Our competitors will gain market share at our expense” and 2) “We will not be able to recover and catch up if we delay.”
The other surprising thing is the reaction to the Strata sessions on streams. Application developers and architects today are interested in making their applications as real-time as possible. To make an application respond to events as they happen, developers need a reliable way to move data as it is generated across different systems, one event at a time. In other words, these applications need an event-based streaming system.
There were sessions focused on streaming systems and streaming analytic solutions such as Apache Spark and Storm. What emerged were the advantages of a converged data platform that has a big data-scale streaming system built into it, one that can support global event replication reliably at IoT scale.
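As an editorial illustration of the "one event at a time" idea described in this answer, the publish-subscribe pattern can be sketched in a few lines of Python. This is a toy, in-memory stand-in: the `TinyBroker` class, the `page_views` topic, and the handler names are all hypothetical, and nothing here reflects the API of Kafka, MapR Streams, Spark, or Storm. A real streaming system would add persistence, ordering guarantees, and replication.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class TinyBroker:
    """Hypothetical in-memory broker, for illustration only."""

    def __init__(self) -> None:
        # Maps a topic name to the handlers subscribed to it.
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver one event to every subscriber of the topic; return the delivery count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


def run_demo() -> List[str]:
    """Publish two events to two independent consumers and record what each saw."""
    broker = TinyBroker()
    seen: List[str] = []
    broker.subscribe("page_views", lambda e: seen.append(f"analytics saw {e['url']}"))
    broker.subscribe("page_views", lambda e: seen.append(f"alerting saw {e['url']}"))
    broker.publish("page_views", {"url": "/home"})
    broker.publish("page_views", {"url": "/checkout"})
    return seen
```

The point of the sketch is the decoupling: the producer only knows the topic name, and each consumer reacts to every event as it arrives rather than waiting for a batch.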
IM: What do you view as the top data issues or challenges in 2016?
JN: In 2016, it is all about convergence: how do you handle more data, how do you do it faster, and how do you do it with fewer resources? Companies need to start with the data. The promise of big data is to gather information into a centralized hub or data lake and bring processing to it. When it’s time to get serious, there are two key areas to focus upon.
The first key challenge is achieving data convergence. A converged data platform eliminates separate clusters and enables applications to benefit from all data. In a converged data platform, all data is treated as a first-class citizen – structured, unstructured, data in motion and data at rest. The platform enables diverse applications, including batch and continuous, all in the same platform. By eliminating separate clusters, it also enables applications to benefit from analytics in real time.
The second key challenge is to manage event-based data flows, quickly analyze flowing data, and understand the context. Examples of data flows include web events, machine sensors, and biometric data. Context is derived from understanding long-term trends and patterns as well as incorporating newly arriving data.
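One way to picture "context" as a combination of long-term trends and newly arriving data is a running baseline that each new event is compared against and then folded into. The Python sketch below is an editorial illustration only: the exponentially weighted average, the `alpha` smoothing factor, and the `threshold` value are assumptions chosen for the example, not a method described in the interview.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunningBaseline:
    alpha: float = 0.1       # hypothetical smoothing factor for the long-term trend
    threshold: float = 0.5   # hypothetical relative deviation that counts as unusual
    mean: Optional[float] = None

    def observe(self, value: float) -> bool:
        """Return True if the value deviates from the baseline, then fold it in."""
        if self.mean is None:
            # The first event has no history to compare against.
            self.mean = value
            return False
        unusual = abs(value - self.mean) > self.threshold * abs(self.mean)
        # Update the long-term trend with the newly arrived reading.
        self.mean = (1 - self.alpha) * self.mean + self.alpha * value
        return unusual


def flag_readings(readings: list) -> list:
    """Process sensor readings one at a time and flag the unusual ones."""
    baseline = RunningBaseline()
    return [baseline.observe(r) for r in readings]
```

For example, `flag_readings([100, 102, 98, 300, 101])` flags only the fourth reading, because it is the only one that departs sharply from the trend built up by the earlier events.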
IM: How do these themes and challenges relate to your company’s market strategy this year?
JN: The market is unfolding exactly how we anticipated. We are really well positioned, due to the investments and decisions we made seven years ago, and the continual investments we’ve made since. From day one, our vision has been to build the best data platform in the world. We initially introduced platform innovations for Hadoop, greatly improving its reliability, performance, and ease of use. We invested heavily in our underlying architecture so that we could deliver advanced capabilities that build on the power of our existing platform, while at the same time contributing to and supporting the open source ecosystem as well as industry standards.
We’ve been steadily executing on this vision ever since, converging the core platform services and engines required for modern data-driven applications. Our rich history of data convergence began with our initial launch of the MapR Platform that combined Hadoop, the best of open source innovation, and advanced enterprise storage capabilities. In 2013, we followed up by converging NoSQL data capabilities into the platform. We followed this by pioneering and contributing to Apache Drill.
At the end of last year, we introduced MapR Streams, a global publish-subscribe event-streaming system for big data. MapR Streams connects data producers and consumers worldwide in real time, with unlimited scale, and is a key component of the MapR Converged Data Platform, providing core advantages for real-time applications. The MapR Converged Data Platform addresses the top data management issues organizations face and enables them to gain a competitive advantage and drive their business.