Information Management Blogs
JUL 1, 2010 7:33am ET


Hadoop Gets Easier with Cloudera Version 3


Managing large volumes of enterprise data continues to challenge IT organizations, which now must administer and store not just terabytes but petabytes of data, with costs increasing accordingly. The sheer size of the data complicates the underlying issues of where and how to store it on low-cost hardware and how to manage it efficiently. One attempt at a solution is Hadoop, an open source community-based project. It began as part of Yahoo and was led by Doug Cutting, who used MapReduce concepts for large-scale distributed computing to create a distributed file system. Yahoo itself runs the largest deployment of Hadoop. Cutting is not new to the open source world; he was involved in creating Lucene, an open source search technology, among many other open source community projects.
 
The Hadoop Distributed File System (HDFS) spreads segments of data across servers and replicates them. Through this replication it achieves built-in failover capability that does not require a redundant array of independent disks (RAID). This technology is recognized across the industry, and many large software companies, including IBM, have announced intentions to support it. Now there is a need for a commercial, licensed version of Hadoop, because organizations want more than just a file system; they require an entire management system to ensure it can operate in production like other databases. This is where Cloudera comes in. In 2008 Doug Cutting saw an opportunity to build a software company around Hadoop that provides licensed and supported versions as well as services and training. The company acquired venture financing and now has reference customers that include Bank of America and Samsung.
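To make the replication idea concrete, here is a minimal sketch in Python. It is a toy model, not Hadoop code: the function names and the round-robin placement are assumptions made for illustration, and real HDFS uses its own placement policy. The only detail taken from HDFS is the default of three copies per block.

```python
def place_blocks(num_blocks, servers, replication=3):
    """Assign each block to `replication` distinct servers.

    Round-robin placement is an assumption for illustration only;
    it is not the placement policy HDFS actually uses.
    """
    if replication > len(servers):
        raise ValueError("need at least as many servers as replicas")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            servers[(block + i) % len(servers)] for i in range(replication)
        ]
    return placement


def readable_blocks(placement, failed):
    """Return the blocks that still have a replica on a live server."""
    return [
        block
        for block, holders in placement.items()
        if any(server not in failed for server in holders)
    ]


servers = ["s1", "s2", "s3", "s4"]
placement = place_blocks(6, servers)

# With three copies of every block, losing any two servers loses no data.
survivors = readable_blocks(placement, failed={"s1", "s2"})
```

Because every block lives on three different machines, the file system can keep serving data after a server fails without relying on RAID inside each machine.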

Cloudera just announced Cloudera Distribution for Hadoop Version 3 at its developer summit. The release brings new enterprise-class capabilities to Hadoop and incorporates 11 additional open source projects. Those projects include Oozie for workflow, Pig for dataflow, Hive for SQL query and table support, Flume for streaming data, Sqoop for data integration, ZooKeeper for coordination services, and Hue, a user interface framework that provides the Cloudera Desktop. Hadoop uses MapReduce as a parallel data processing framework. You can try Hadoop easily by downloading a VMware image or by spinning it up on Amazon Elastic Compute Cloud (EC2), part of Amazon Web Services. Cloudera is led by an industry database veteran, Mike Olson, who has provided his perspective on the advances in version 3.
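For readers new to MapReduce, the following is a minimal single-process sketch of the pattern in Python, using the classic word count example. It does not use the Hadoop API; the function names are hypothetical, and a real Hadoop job would run the map and reduce steps in parallel across many servers.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


counts = word_count(["big data", "Big deal"])
```

The map and reduce steps each work on one line or one key at a time, which is what lets a framework spread them over many machines.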

The Hadoop project has brought other open source software providers to market, such as Pentaho, which supports it through data integration and as part of its business intelligence (BI) platform and tools. A new BI provider called GOTO Metrics has announced a platform that uses Hadoop to manage collections of information that can provide metrics and other information critical for decision support. These two vendors are not yet official partners of Cloudera but are supporting Hadoop through its interfaces to the data store.

To investigate Hadoop, you can download the beta release of version 3 from Cloudera without any hassle. I liked the review of this new release by Cutting, who personalizes both it and the underlying technology. To help you consider whether Hadoop is right for you, Cloudera offers a learning site in its developer center. Keep in mind that while Hadoop is free to download, you will have to invest time and resources in any enterprise production deployment. In that case it makes sense to purchase a support license so you have access to experts who can help you in a pinch. All in all, Hadoop is an alternative approach that revolves around what is called “Big Data” and “No SQL.” I think it will challenge not only the giant database vendors IBM, Oracle and Teradata but also newer ones like Netezza. Cloudera will try to ride the Hadoop wave, and it will be interesting to see how far the company can advance into the market with new customer deployments and expansion of its ecosystem.

Mark A. Smith also blogs at VentanaResearch.com/blog.





 
