EMC: Can Data Lakes Create Big Data Splash?
With one announcement, EMC Corp. hopes to solidify its big data strategy while also convincing data scientists and investors that three closely aligned companies (EMC, Pivotal and VMware) are better off together than split apart in a big corporate breakup.
Central to the effort is EMC's new Federation Business Data Lake. The offering includes storage and analytics technologies from EMC, Pivotal and VMware. The big question: Can EMC make the Data Lake a hit with on-premises customers while also selling the solution to cloud services providers (CSPs) that must both compete with, and perhaps cooperate with, VMware's own hybrid cloud services?
A successful Data Lake requires three capabilities, according to EMC:
- Store: It must store structured and unstructured data for all types of analytics, from many different sources, blending capacity and performance as needed for the analytics use case.
- Analyze: It must provide modern data management and analytics tools for all types of analytics, including Hadoop-based, in-memory NoSQL and scale-out massively parallel processing (MPP).
- Surface & Act: It must provide data to users and applications to enable real-time changes in outcomes and to influence critical decisions.
Pieces in the Puzzle
Deploying traditional big data solutions has been a complex, time-consuming task, EMC asserts. Not surprisingly, EMC claims the Federation Business Data Lake Solution addresses those challenges. The solution includes:
- An analytics layer, virtualized with VMware and running on Vblock systems, with predefined analytics use cases and automated provisioning and configuration.
- EMC Isilon storage, which serves as the Data Lake's storage foundation.
- Pivotal analytics software: the Pivotal Big Data Suite, Pivotal HD and the HAWQ SQL-on-Hadoop engine. The suite integrates with SAS, Tableau and other analytics platforms.
- Support for Hadoop distributions from Cloudera and Hortonworks (rival MapR was not mentioned in the announcement).
To prepare customers, EMC is launching onboarding, workshop and education services for the Data Lake offerings, which are set to debut in April.
Integrated Bundles Emerge
EMC is hardly alone, however. A growing number of companies now offer turnkey big data solutions that include pre-integrated hardware and software. Avnet Enabled Hadoop, a turnkey system built on IBM's software, emerged earlier this month. And Cisco Systems has partnered with the leading Hadoop providers to deliver complete big data solutions for data centers.
At the same time, numerous cloud providers have introduced Hadoop as a Service offerings, which may allow customers to deploy and activate big data services more quickly (and cost-effectively). Some Wall Street watchers see Hadoop as a Service and cloud storage as key threats to EMC's empire.
Some investors have been calling on EMC to break itself up, potentially spinning off the company's ownership stakes in RSA, Pivotal and VMware to unlock shareholder value and create smaller, more nimble companies. But EMC CEO Joe Tucci has held firm to his belief that a united EMC (including RSA, Pivotal and VMware) can better serve cloud and on-premises customers.