Think Managing Big Data Is Much Too Complex? Just Wait
The complexity of big data, and the learning curve for data professionals charged with managing it, continue to grow, much to the frustration of many organizations. At the recent Strata + Hadoop World conference in New York, Information Management spoke with Jason Schroedl, BlueData’s vice president of marketing, about the implications.
Information Management: What are the most common themes that you heard from attendees?
Jason Schroedl: Probably the most common theme is that big data today is about much more than just Hadoop. It's about Hadoop and Spark, Kafka, Flink, TensorFlow, NiFi, and others; about NoSQL databases like Cassandra, MemSQL, and MongoDB; about data science tools like R, Python, Anaconda, H2O, Zeppelin, and Jupyter; and about all the BI, ETL, visualization, and analytics applications that were featured at the Strata + Hadoop World event.
This aligns with what we've been seeing with dozens of enterprise customers over the past year, and it came through loud and clear during our meetings with participants at the conference.
The ecosystem continues to evolve and expand at an astounding pace ... and the success or failure of a big data implementation may hinge on how well the organization handles the rapidly changing menagerie of applications and tools that its data scientists, developers, analysts, and engineers want to use.
IM: What are the most common data challenges that attendees are facing?
Schroedl: Finding the right big data expertise and implementing best practices for this continuously evolving and expanding ecosystem of applications and tools are both difficult and time-consuming.
It’s even more difficult due to the rapid pace of big data innovation, as new versions and new tools are constantly being released (and of course, data science teams want to use the latest and greatest). This continues to be one of the biggest challenges for many enterprise organizations: the complexity and learning curve for big data deployments are daunting, and they’re not getting any easier.
IM: What are the most surprising things that you heard from attendees?
Schroedl: One thing that is somewhat surprising is how quickly the public cloud has moved to the forefront in discussions with attendees about their big data initiatives. Here at BlueData, we've seen this coming for a while now -- but public cloud adoption for big data workloads is happening even faster than many industry observers expected. This was evident in many of our conversations with attendees at the event this year.
As we announced this past June, BlueData is extending our support for AWS and other public cloud environments. The fact that we built our software platform on Docker containers provides inherent flexibility and portability across on-premises and public cloud environments. Our ability to support both on-prem and public cloud deployments has been very well received by our customers, and we will continue to accelerate our investment in this area throughout 2016 and into next year.
IM: What does your company view as the top data issues or challenges in 2016?
Schroedl: The inherent complexity of big data deployments remains a top challenge in 2016 -- as mentioned earlier. Ultimately, this is a major barrier to moving these projects from pilot to production and (as cited in a recent Gartner survey) it's holding back the ROI for many big data initiatives in the enterprise.
Our mission here at BlueData is to help simplify and accelerate big data implementations, from initial prototyping to enterprise-wide production deployments. Leveraging Docker containers and our own software innovations, BlueData has introduced a fundamentally new deployment model that can help to deliver faster time-to-value, lower TCO, and increased ROI for big data initiatives.
IM: How do these themes and challenges relate to your company’s market strategy this year?
Schroedl: Here at BlueData, we’re committed to helping our customers keep up with the rapid pace of big data innovation – enabling unparalleled application flexibility and choice to meet their organizations’ unique data science use cases. Our Big-Data-as-a-Service software platform enables data scientists and analysts to spin up self-service clusters within minutes, whether for Hadoop, Spark, Kafka, Cassandra, or other big data frameworks and applications.
Ultimately, we aim to provide this capability whether they deploy their big data environments on-premises, in the public cloud, or both.