Organizations take data warehousing to the cloud

Published April 19, 2017, 6:42am EDT

The pace at which organizations are moving their analytics efforts to the cloud is increasing, and many organizations are adopting a combined on-premises and multi-cloud environment in which to host their data.

That is the view of Nikita Shamgunov, chief technology officer at MemSQL. Information Management spoke with Shamgunov at the Strata event in March in San Jose, asking for his thoughts on current trends in analytics and data management.

“We see an accelerated move toward the cloud,” Shamgunov says. “When talking to our customers or attending conferences, we see most enterprises using hybrid environments with infrastructure deployed on premises and in several public clouds. We believe that multi-cloud environments are the future and enterprises will choose to depend on infrastructure that works across clouds.”

“Specifically in data warehousing, we see a shift from expensive on-premises offerings to much more efficient cloud data warehouses,” Shamgunov continues. “Traditionally, moving data has been a hard proposition due to its stickiness; however, the cloud proposition is irresistible. Having a solution that works natively in the cloud and with an option to deploy on-premises has become a unique, compelling position for MemSQL.”

“In addition, we see that real-time has finally become top of mind for enterprise CIOs and CDOs (chief data officers),” Shamgunov says.

The Strata event is always a large one, drawing thousands of attendees, and several common themes emerged from this year’s show, Shamgunov notes.

“Perhaps the biggest theme is the strategic shift of the event, as the name was changed from Strata+Hadoop World, to Strata Data Conference,” Shamgunov says. “The Hadoop ship has sailed and the world is now looking for a broader set of data solutions.”

Beyond that, Shamgunov identified several other themes:

● “Streaming and real-time have become reality. People have been talking about real-time for some time, and now they are actually putting real-time applications into production.”

● “Artificial intelligence/machine learning is an ongoing trend.”

● “The Hadoop ecosystem is notoriously complex, but modern infrastructure, particularly in the cloud, is simple and scalable. People are looking more and more for simple push button solutions.”

As to the most common challenges that attendees are facing, Shamgunov cites the following:

● “Batch to real-time. In this world people use Kafka as the backbone of real-time message delivery, but the rest of the traditional infrastructure, especially on the data warehouse side, does not support real-time operations.”

● “Moving to the cloud. There are a variety of excellent options in the cloud, but they are not the same as on-premises, so lift-and-shift of Hadoop infrastructure to the cloud is hard.”

● “Enterprise data warehousing on Hadoop is still challenging.”

● “Grappling with making Spark operational. There is plenty of excitement around Spark and people certainly use it for machine learning, ETL, and data science. However, putting Spark in production as part of a real-time pipeline has required highly skilled practitioners.”

Finally, Shamgunov says perhaps the most surprising trend was the “rapid shift past Hadoop.”

“The activity has moved to cloud and real-time: everyone wants to figure out how they can expose and operationalize real-time data,” Shamgunov says. “Technologies like Spark and Kafka get a tremendous amount of attention in this regard. They are both real-time distributed systems, which are the focus of modern deployments.”
