Organizations Confirm Big-Time Need for Real-Time Data
Thousands of attendees at last week’s Strata + Hadoop World conference in San Jose, CA, had a lot on their minds when it comes to data management and data analytics, but one theme that emerged repeatedly was real-time processing.
Several leading vendors that Information Management spoke with at the show confirmed the trend, including Gary Orenstein, chief marketing officer at MemSQL. We asked Orenstein for his take on what is behind this growing interest in real-time data management.
Information Management: What are the common themes you heard among show attendees and how do they align with what you expected?
Gary Orenstein: The biggest theme across the show was a drive towards real-time processing. The stark realization of batch processing limitations coupled with a need to drive real-time results pushes companies to explore new data processing options.
We see this with the rise of message queues, such as Apache Kafka, and the migration away from batch-oriented MapReduce to faster processing with Apache Spark. Companies such as MemSQL and MapR share a vision of achieving real-time processing with their platforms.
This shift aligns with expectations. It is no secret that our appetite for instant information continues unabated across consumers and businesses. We’re seeing how data rises to the occasion to meet demands across industries.
IM: What are the common challenges that organizations complain of?
GO: In the midst of all the technology innovation, data scientists and architects still face many of the same challenges as in the past.
Slow data loading is perhaps one of the most prevalent. With increasing volumes, it becomes harder to ingest and store new data. Legacy data architectures based on disk drives or single-server systems simply cannot meet today’s performance needs.
Slow queries pose a challenge for similar reasons: disk-oriented and single-node systems quickly reach a limit on performance.
The lack of concurrency, otherwise known as multi-tasking, further inhibits what older systems can handle. Ideally you have fewer data systems that can handle more types of workloads and models to simplify infrastructure and reduce costs.
IM: What are the most surprising things you heard?
GO: Most surprising is that Hadoop is no longer Hadoop. This theme emerged a year or two ago, but it was clearer this year as the core elements of Hadoop are chipped away. The Strata + Hadoop World opening keynote included time explaining how Spark is replacing MapReduce. And the emergence of a variety of datastores beyond HDFS signals that the market has additional needs. Specifically, customers are looking to handle fast ingest and retrieval of structured and semi-structured data, which can be better suited to a database than to a file system like HDFS.
IM: How do these themes and challenges relate to your company’s market strategy?
GO: Real-time needs are working their way through nearly every industry. Often under the umbrella of digital transformation, we see needs across the on-demand economy and the Internet of Things.
Companies are pursuing the most important step first, which is harnessing real-time data, often through streaming data pipelines and constructing real-time dashboards. This helps place a finger on the pulse of the business.
The most advanced customers are going further by applying this data to predictive analytics. Using existing machine learning models, including those that operated on batch data, customers are placing predictive modeling into real-time workflows for the most prescient view of their operations.
IM: What do you see as the top issues or challenges regarding data management and analytics today?
GO: The winning strategy for data management and analytics is to keep it simple by striving for fewer systems that can handle multiple workloads. Databases that can handle streaming, transactions, and analytics within a single system are a good example of this.
Staying true to trusted approaches also helps companies overcome data-related struggles. For example, retaining SQL as a data programming language, and doing so with a native relational engine, gives companies immediate access to a large pool of SQL developers, a near-infinite number of enterprise tools that connect via SQL, and an ongoing industry effort to improve this near-universal approach.
Today, customers have a choice of data solutions, and it is clear that they are selecting the ones that deliver peak performance, online scalability, and compatibility with the enterprise data ecosystem. That is what we expected coming into 2016, and Strata only reinforced it.