Top Challenge With Real-time Analytics: Education
Several exhibitors and speakers at the recent Strata & Hadoop World conference in San Jose, CA, agree that data streaming and real-time analytics were among the most talked-about topics. That isn't surprising, says Kostas Tzoumas, CEO of data Artisans, given the knowledge gap that many data pros have on these topics.
Tzoumas spoke with Information Management about what this means, and about the need for speed that many attendees feel as they manage their growing volumes of data. Tzoumas also presented at the conference on the topic "Apache Flink: Streaming done right."
Information Management: What are the most common themes that you heard from conference attendees?
Kostas Tzoumas: It used to be that Strata + Hadoop World focused mainly on Hadoop, but that is clearly no longer the case. First, it is hard to define what Hadoop even means anymore. Second, other major themes are now as visible as Hadoop, such as cloud-based Big Data as a Service applications, IoT, and data streaming.
In particular, this year’s conference was all about data streaming and real-time analytics, with every other presentation mentioning or delving into these themes.
IM: What are the most common data challenges that attendees are facing?
KT: This really depends on the company and the stage of its data management strategy, but common themes include managing the growing complexity of big data and analytics systems while making them easier to use and faster.
As more and more "Lego blocks" are added to the stack, DevOps becomes increasingly difficult, and developers need to be educated about these technologies and how they can reduce this complexity. So there is a growing need to simplify the overall stack.
Some interesting solutions coming from large web companies include microservices and using data streaming as the backbone for moving data. There is a growing need to get results faster to power real-time and on-demand applications.
IM: What are the most surprising things that you heard from attendees?
KT: It was very surprising to see how many companies have built in-house solutions for real-time data processing due to a lack of good open source options in the past—often with mixed results. I think that this will change in the future, with more companies and vendors adopting open source solutions to address the need for real-time, continuous data stream processing.
IM: What do you view as the top data issues or challenges in 2016?
KT: One challenge is education. It is critical that data scientists and businesses alike understand the benefits that a real-time data infrastructure can bring to the business, and that developers are educated on the benefits of this new paradigm. Many companies have embraced it, but some are two steps behind and are only now considering Hadoop. Those organizations may actually be better off sidestepping Hadoop completely and moving directly to a streaming architecture.
IM: How do these challenges relate to your company’s market strategy this year?
KT: data Artisans just raised a Series A funding round, which puts it in a great position to grow the company, manage the explosive growth of the Apache Flink community, and provide a better level of support for production Flink users. We believe that data streaming is the future programming and execution model for the majority of data applications, and Flink is certainly the most advanced stream processor in the open source space.