Moving beyond data lakes to the data marketplace
Many organizations have adopted data lakes as a preferred way to store and access data from all their various sources. But a growing number of organizations have also found that strategy to be more difficult than first thought. The result is that, for many firms, data lakes more closely resemble swamps.
That was one of the messages to come out of the recent Strata & Hadoop World conference in San Jose, Calif.
“Data lakes are no longer the goal,” says John Felahi, vice president of product management for Podium Data. “Corporate IT and big data teams increasingly see lakes as ‘swampy’ first-generation Hadoop sandboxes that are both slow to deploy and inadequately clean, governed, secured or managed to deploy to the business.”
The new focus, Felahi says, “is on implementing a next-generation, big-data-ready enterprise data management platform – a data marketplace.”
CIOs and CDOs recognize that their existing combination of data warehouses and analytic appliances can’t meet the business’s demand for data, Felahi explains. In simple terms, these systems are too slow, rigid and costly.
“Incremental change, or throwing more money at existing approaches, is not going to solve the problem,” Felahi says. “They need to lead their organizations toward implementing the next-generation architecture for enterprise data management, data delivery and analytics. Hadoop might be part of that in the near term, but the real change will be a new approach to pre-staging data in a secure, governed, enterprise-scale environment, which allows IT to meet the business demand for data and gives business users self-service, on-demand access to data across the organization – i.e. business agility.”
So where do the problems lie?
“There are a lot of people at Strata & Hadoop World who have months or years sunk into first-generation data lake projects,” Felahi says. “Many of those people stopping by the Podium Data booth are coming to terms with the fact that their efforts to date to build a data lake using open source code and programming skills are both too slow and lack essential enterprise capabilities such as governance, security and metadata. As a result, those first-generation data lakes can’t be deployed to business users any time soon.”
“A second thread of conversations has highlighted the lack, in many instances, of a compelling business case to push the data lake project forward and create urgency around moving the project from a sandbox to an enterprise-scale platform,” Felahi says. “‘What are the common use cases for data lakes?’ and ‘How do I attach my data lake project to some specific business need or user group?’ are two of the most frequent questions cropping up in conversations on site at the show.”
None of this comes as a big surprise to Podium Data’s leaders on site at the show, Felahi notes.
“Paul Barth and his team have been building enterprise-scale data management solutions for decades and are deeply aware of the complexity of loading data – especially complex legacy or mainframe data sources – into Hadoop and making it available to business users through an enterprise-grade BI or analytics platform,” Felahi explains.
“In some ways, the conversations at Strata & Hadoop World this spring clearly demonstrate that the Hadoop market is finally evolving from a relatively specialized and overhyped emerging technology space into a potentially valuable new platform for next-generation mainstream data management. Ironically, it is possible that even as the show gets smaller – with less expo hall space and fewer vendors – the commercial opportunity of the product is just coming into its own.”
Felahi says he was surprised by a few trends that emerged from the Strata show.
“There are still a lot of people at Strata & Hadoop World whose interest in the technology is focused on relatively narrow data analytics or data science projects. Joining that now is the focus on Hadoop as a platform for AI and machine learning,” Felahi says.
“It is surprising that more people are not interested in Hadoop as the platform for next-generation enterprise data management – as the successor to the EDW or analytic appliance,” Felahi says. “Admittedly, the solutions to make this a viable approach for a large company are still emerging – Podium is a leader in this space. However, given the economic advantages of Hadoop and its ability to massively accelerate and expand the delivery of data to business users, one would think there would be more vendors pursuing this opportunity and more focus on this topic.”