While data security remains a top concern for nearly every organization, a growing number of firms are taking data analytics to the cloud, according to discussions at the recent Strata & Hadoop World conference in San Jose, CA.
Ali Hodroj, vice president of products and strategy at GigaSpaces, discussed the implications of this trend with Information Management.
Information Management: What are the most common themes that you heard from conference attendees?
Ali Hodroj: The most common themes we heard about revolved around streaming, real-time analytics, and cloud-based analytics solutions.
In terms of streaming, Apache Spark continues its accelerated adoption in the enterprise, graduating beyond batch-oriented or ETL-type workloads to tackle machine learning and high-level data science.
We also saw significant traction by Apache Kafka, expanding its footprint from messaging to actual stream processing use cases. These themes align quite well with our expectations: we saw memory-centric computing (such as Spark, Flink, DataTorrent) as hot technologies during Strata NY 2015.
Now, with the emergence of high-rate data generation applications (mostly in IoT), it’s only natural that the adoption of real-time and streaming analytics continues to grow.
IM: What are the most common challenges that attendees are facing with regard to data management and data analytics?
AH: A lot of attendees are looking into real-time analytics solutions but struggle with the operational complexity of building a real-time streaming data pipeline. For instance, extracting insights to improve customer experience in real-time requires a retailer to combine several technologies (Hadoop, Storm, Kafka).
This complexity of deployment, management, and design slows down innovation towards actionable analytics significantly. While some have solved it by moving their workloads to the cloud, those with on-premise deployments still have a lot of challenges to solve.
IM: What are the most surprising things that you are hearing from attendees?
AH: We saw many companies moving their analytics to the cloud. That was a surprise to us because most of our customers have high security and regulatory imperatives (e.g. financial services, healthcare, and retail), which are still a grey area to many in the cloud. However, the success of Spark in the cloud through companies like Databricks alleviated many of those concerns.
IM: What does your company view as the top issues or challenges with regard to data management and data analytics in 2016?
AH: I think the issues of self-service analytics and data democratization have been key themes across many of our customers. Besides the streaming analytics initiatives, a lot of customers are looking at providing a self-service analytics solution across the enterprise, but struggle with governance, analytics toolkit standardization, and most of all performance. Enterprise users want to be empowered with toolkits that answer questions against live data.
On the one hand, Spark does provide a common abstraction API that has a large community behind it along with visualization toolkits (Zeppelin or Tableau). On the other hand, moving the underlying data from a data warehouse mentality to modern, customized live data marts that operate at scale and with multi-tenancy underneath Spark is still a significant challenge.
We’re focused on helping customers address these issues by providing a converged real-time data platform that combines the power and popularity of Spark APIs with high-performance, low-latency storage operating on top of an in-memory data grid. The upstream tiers provide standard analytics and visualization toolkits for business analysts and data scientists, while the underlying storage leverages a hub-and-spoke in-memory data fabric with a central data lake as well as federated live data marts created ad hoc from it.
IM: How do these themes and challenges relate to your company's market strategy this year?
AH: We’ve been in the real-time and low latency extreme transaction processing world for more than a decade. So, the shift from big data to fast data definitely converges with our core product competencies.
The themes of real-time and streaming, coupled with the challenges of operational complexity and predictable streaming performance, are the main drivers behind one of our product offerings this year: InsightEdge. We’ve taken our experience from financial services, telecommunication, and transportation in terms of high performance computing patterns and applied it to big data frameworks like Apache Spark.
Our goal is to provide users with a simple fast data solution that meets the demand of low latency streaming (in IoT, financial services, retail) while speaking the common language of Apache Spark. But fast analytics without immediate action are useless: users want to innovate by converging analytics, transactions, and streaming to capture insights and gain a competitive edge in real-time. This is our main value proposition at GigaSpaces: connecting analytics to impact in real-time while minimizing complexity.