Streaming Data, Data Lakes Dominate Data Discussions
A growing number of data experts are confirming that 2016 is the year of action when it comes to data analytics. As a result, data professionals have moved beyond general discussions of what data analytics can do for their organizations, and want real use cases that they can learn from and copy.
Information Management spoke with Ravi Dharnikota, head of enterprise architecture at SnapLogic, about his observations at the recent Strata & Hadoop World conference in San Jose, CA. Dharnikota said the attendees he spoke with were most interested in streaming data, data lakes, and Apache Spark as an analytics platform of choice.
Information Management: What are the most common themes that you heard among conference attendees and how do those themes align with what you expected?
Ravi Dharnikota: Compared to the 2015 event, this year shifted a bit away from academic discussions of the latest Apache project and towards real use cases. This year I heard quite a bit about:
Streaming -- Streaming data ingestion, processing and analytics.
Data lake -- How to do the lake right; ingestion; governance; data prep.
Spark -- A huge shift towards support for technologies to run on Spark as a platform.
IM: What are the most common data challenges that attendees are facing?
RD: One of the most common challenges with data management is simply its pervasiveness. Data is everywhere in the organization. Organizations need some way of bringing it all together in one place, making it searchable and consumable by everyone, with “guardrails” in place.
The other challenge is that the big data ecosystem is constantly changing and can be quite noisy, with overlapping messages from vendors and open source die-hards. Organizations that just want to get stuff done to drive business practices need help from end-to-end frameworks.
IM: What are the most surprising things that you heard from attendees?
RD: None of these are truly surprising, but they are worth noting:
Customers are realizing that no matter how open and flexible the vision of a data lake is, there has to be some governance, with proper access controls, auditing and data sensitivity considerations. Also, data needs to be easily searchable for anyone looking for it in the lake.
The data lake is not just Hadoop. It could be in the cloud from Amazon, Microsoft or Google.
A lot of organizations have both Hortonworks and Cloudera in their data hub cluster.
IM: What does your company view as the top data issues or challenges in 2016?
RD: Organizations outside the heavy tech industry need guidance and help in democratizing data.
There is a lack of an industry-defined “best practice” for doing data management well in the modern big data context.
Lack of big data skill sets will continue to require self-service platforms and tools that abstract the technology and make it easy to use.
While the continuous innovation and change in the big data industry provide fast, frequent improvements to the technology, that pace is tough to keep up with in an organization that has competing priorities and projects.
IM: How do these themes and challenges relate to your company’s market strategy this year?
RD: SnapLogic’s big data strategy is focused on making it easy to keep up with changes in the big data ecosystem for organizations that are not able to pour resources into creating and tinkering with their own systems for moving, managing and consuming data.
Our strategy revolves around looking at the data lake as a whole and at what an enterprise needs to achieve its data management initiatives. This could include looking at things like security, streaming, storage formats, governance and metadata.