3 reasons why the future of data lakes is in the cloud


The future growth of data lakes won’t be in Hadoop; it will be in the cloud. Hadoop data lakes won’t cease to exist, but from here on, most of the growth will come from cloud-based data lakes.

There are exceptions to this rule. Some companies, such as those in the defense industry, require the utmost privacy and can afford to run their own data centers, so they may keep their information on premises. For everyone else, there’s the cloud.

Conceptually speaking, data lakes date back to the 1970s and ’80s, when organizations began storing transactions in relational database management systems (RDBMSs). As organizations wove more systems into their daily operations, they needed a way to consolidate reporting from those systems. That requirement led to the creation of data warehouses, which, driven by the desire to analyze big data, eventually evolved into their modern, scalable forms.

Platforms like Hadoop allowed companies to build scale-out databases at a reasonable price. With the arrival of Hadoop, big data analytics suddenly became accessible to smaller companies. Today, Hadoop is the predominant platform underlying data lakes.

Data lake technology has come a long way over the past 40 years, but leaders would do well to look ahead and ask where data lakes are headed.

Time to Look to the Future

Last year, bucking a recent trend, more companies reported gaining a competitive advantage from data analytics. As more organizations build data analytics into their everyday strategy, they will find that drawing the conclusions they want, and developing the data strategy they want, requires more and more data sources.

These data sources are increasingly cloud-based. Social media is a popular data source, and mobile user data, location-based data, IoT data and industrial manufacturing data all live in the cloud already. Because this data originates in the cloud, there is no reason to bring it on premises, and managing data centers is better left to cloud vendors anyway.

Blending newer data sources with more traditional information repositories, such as CRM or supply chain management systems that generate data in the normal course of business, enables effective analytics. This legacy information is critical because it sets the context for the analysis.

Three Keys Support This Vision of the Future

Three more reasons suggest that the data lakes of the future will be cloud-based.

First, the cloud offers what is essentially a pay-as-you-go model combined with virtually unlimited scalability on demand. If you’d like to build your data lake using technology X but it doesn’t work, you can easily shift to technology Y. Switching from one data lake technology to another in the cloud doesn’t require the massive capital investment it once did on premises. There is no need to purchase new servers or equipment as your plans change or your organization grows. The cloud accommodates change, no matter how often or how drastically it happens.

Second, a multitude of technologies are available that enable the types of analytics organizations want to perform on their data. Whether the workload is streaming, a traditional data warehouse workload or a Hadoop-style workload, the technology to support analytics on it is readily available in the cloud.

The third reason is data security. Security used to be one of the main objections to moving to the cloud. However, with strong authentication and robust data encryption readily available as cloud services, security has become a reason to move to the cloud. Securing customers’ data is now central to cloud vendors’ business model, and leaders would be prudent to take advantage of that expertise.

When Planning for the Future, Think About the Cloud

Data holds the potential for competitive advantage. The ability to analyze data quickly and effectively will be the key to unlocking that potential. But analyzing the growing volume and variety of data requires flexibility and scalability.

The answer? The cloud, where an entire services ecosystem exists for the explicit purpose of allowing scalable data analysis in a flexible environment.

Think about how to improve your organization or data operations. Think about the data you will need to realize those improvements. Think about which cloud analytics services best fit your use case. Then think about having that technology available at the swipe of a credit card, because that is the vision of a data lake future in the cloud.
