Despite all the attention focused on data management, many firms find it difficult to access data from different parts of the organization in a quick and efficient manner. Information Management spoke with Josh Klahr, vice president of product management at AtScale, at the recent Strata & Hadoop World conference, about the implications.
Information Management: What are the most common themes you hear among participants?
Josh Klahr: As expected, there was a lot of talk about Hadoop and Spark. Our research (http://www.atscale.com/press/industrys-largest-hadoop-maturity-survey-reveals-adoption-rapidly-accelerating-and-best-practices-for-achieving-value-emerging/) shows Hadoop dominates even as interest in Spark grows.
Spark is generally being used for a broad set of use cases, including pipelined data processing and parallelizable data science workloads, while Hadoop continues to shine for high-volume data collection, storage, and batch processing. BI workloads for both Hadoop and Spark are also top of mind.
There seems to be growing buzz over ways machine learning can be applied in the realm of data science. We are paying close attention to the ways that machine learning/data science workloads and use cases will overlap with the world of BI in the upcoming year.
IM: What are the most common challenges attendees are facing?
Klahr: It's clear that many in our space are still finding it difficult to access data from different areas of the business in a quick and efficient manner. Many are working with dirty data and limited support from IT due to lack of resources. They are searching for ways to make the data access process self-service and fast.
IM: What are the most surprising things you've learned from attendees?
Klahr: The most surprising thing is that adoption of BI on Hadoop is much higher than predicted by analyst firms -- such as Gartner -- in the past. Every enterprise has some level of Hadoop. Also, the people who are highly incentivized to make Hadoop perform -- like Cloudera and Databricks -- have made a ton of progress in making their engines faster, and you'll see that in our benchmark, which is coming out soon. The people who attended our booth this year were much more sophisticated users, much farther along in their use of Hadoop than in years past.
The momentum of BI tools in the enterprise has never been stronger. There are more participants in the BI layer. Business users will have more choices and there will be far more BI tools available. Microsoft, Google and Amazon offer subscription-based tools, so more people are likely to use them.
IM: How do these themes relate to your company's market strategy this year?
Klahr: These themes align with what our CEO saw six years ago when he was at Yahoo!. Hadoop is no doubt THE data platform for the enterprise. Also, the BI market is not consolidating. We don't want to be in that market. We want to enable Tableau, Google, Amazon, etc. to succeed. We embrace open source and the progress open source provides. That's why we make them faster instead of competing with them.
IM: What does your company view as the top data issues or challenges this year?
Klahr: I believe the focus now will be on real-time data ingestion, processing and analysis. I also believe there will be a lot of movement toward big data in the cloud.