Cloudera just announced Cloudera Distribution for Hadoop Version 3 at its developer summit. The release brings new enterprise-class capabilities to Hadoop and incorporates 11 additional open source projects. Those projects include Oozie for workflow, Pig for dataflow, Hive for SQL query and table support, Flume for streaming data, Sqoop for data integration, ZooKeeper for coordination services, and Hue, a user interface framework that provides the Cloudera Desktop. Hadoop uses MapReduce as its parallel data processing framework. You can try Hadoop easily by downloading a VMware image or by spinning it up on Amazon Elastic Compute Cloud (EC2), part of Amazon Web Services. Cloudera is led by a database industry veteran, Mike Olson, who has provided his perspective on the advances in version 3.
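For readers new to MapReduce, the idea can be sketched in a few lines of plain Python. This is only an in-memory illustration of the map, shuffle and reduce steps on a word count; it does not use Hadoop or any Cloudera API, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example. On a real cluster the same steps would run in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big deal"]))
```

Each map call looks at only one line and each reduce call at only one key, which is what lets a framework like Hadoop spread the work across machines.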
The Hadoop project has brought other open source software providers to market, such as Pentaho, which supports it through data integration and as part of its business intelligence (BI) platform and tools. A new BI provider called GOTO Metrics has announced a platform that uses Hadoop to manage collections of information that can provide metrics and other information critical for decision support. These two vendors are not yet official partners of Cloudera but are supporting Hadoop through its interfaces to the data store.
To investigate Hadoop, you can download the beta release of version 3 from Cloudera without any hassle. I liked the review of this new release by Cutting, who personalizes both it and the underlying technology. To help you consider whether Hadoop is right for you, Cloudera offers a learning site in its developer center. Keep in mind that while Hadoop is free to download, you will have to invest time and resources in any enterprise production deployment. In that case it makes sense to purchase a support license so you have access to experts who can help you in a pinch. All in all, Hadoop is an alternative approach that revolves around what is called “Big Data” and “NoSQL.” I think it will challenge not only the giant database vendors IBM, Oracle and Teradata but also newer ones like Netezza. Cloudera will try to ride the Hadoop wave, and it will be interesting to see how far the company can advance into the market with new customer deployments and expansion of its ecosystem.
Mark also blogs at VentanaResearch.com/blog.