Hadoop as a Service: 18 Cloud Options
Hadoop is a cornerstone technology for many big data projects and applications. But many organizations lack the time, expertise and budget to build and optimize their own Hadoop clusters. Not by coincidence, the Hadoop-as-a-service market (i.e., Hadoop in the cloud) is projected to grow nearly 85 percent annually from 2014 to 2019, according to Research and Markets. Here’s a sampling of cloud-based Hadoop options.
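To put that growth figure in perspective, the short calculation below compounds an 85 percent annual rate over five years. It is only a rough sketch: it assumes five compounding periods between 2014 and 2019 and treats "nearly 85 percent" as exactly 85 percent, and neither assumption comes from the Research and Markets forecast itself.

```python
def compound_growth_factor(annual_rate: float, years: int) -> float:
    """Return the overall growth multiple after compounding annual_rate for `years` years."""
    return (1.0 + annual_rate) ** years


# Assumptions (not from the forecast): exactly 85% per year, five compounding periods.
factor = compound_growth_factor(0.85, 5)
print(f"Market size multiple after 5 years: {factor:.1f}x")
```

Under those assumptions, the market would end the period at a bit more than 21 times its starting size.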
1. Aleron
An emerging IT service provider in Australia, Aleron focuses on secure data management. Launched in 2010, Aleron also promotes a range of big data services – including Hadoop-focused offerings. More information: Aleron big data.
2. Altiscale
Altiscale has developed a purpose-built, petabyte-scale infrastructure that delivers Apache Hadoop as a cloud service. An operational support team monitors jobs and handles system tuning for customers. Instead of charging by the node, Altiscale charges by monthly usage. More information: Altiscale Hadoop as a service.
3. Amazon Elastic MapReduce (Amazon EMR)
Amazon EMR provides a managed Hadoop framework to distribute and process vast amounts of data across dynamically scalable Amazon EC2 (Elastic Compute Cloud) instances. You can also run other distributed frameworks such as Spark and Presto in Amazon EMR, and interact with data in other AWS data stores such as Amazon S3 (Simple Storage Service) and Amazon DynamoDB. Amazon EMR handles such big data use cases as log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics. More information: Amazon EMR.
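For readers new to the programming model behind these services, here is a minimal, single-machine sketch of the MapReduce pattern applied to log analysis, one of the use cases listed above. It does not call Amazon EMR or Hadoop. The function names and the sample log format ("METHOD PATH STATUS") are illustrative assumptions, and a real cluster would run the map and reduce phases in parallel across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (status_code, 1) per log line; the status is assumed to be the last field."""
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            yield fields[-1], 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the counts collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical sample log lines in the form "METHOD PATH STATUS".
sample_log = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
]
status_counts = reduce_phase(shuffle_phase(map_phase(sample_log)))
print(status_counts)
```

The appeal of a managed service is that the same map and reduce logic can be pointed at far larger inputs without the user provisioning or tuning the cluster.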
4. CenturyLink
CenturyLink, the cloud services provider, has six Hadoop blueprints. But for starters, there are three configuration options (in terms of size of cluster and supported Hadoop components). There are also “add-node” blueprints for each configuration. The lowest-level configuration, called Express, is an unsupported, unmanaged Cloudera distribution deployed to a single node that allows users to “kick the tires” while still getting value from small, unstructured data sets. An Enterprise Basic option is a four-node cluster. The third option, called Enterprise Basic with HBase, extends the basic offering to include HBase. More information: CenturyLink Hadoop cloud services.
5. CSC
CSC, the large integrator and MSP, offers Big Data Platform as a Service (BDPaaS). The platform leverages Apache Hadoop and features working relationships with Cloudera, among others. More information: CSC Big Data Platform as a Service.
6. Gold Coast IT Services (GCIT)
More of a managed services provider (MSP) than a cloud services provider (CSP), Gold Coast blends application development and consulting services – helping customers to optimize Cloudera on Amazon Web Services, for instance. As an AWS System Integrator and member of the Amazon Partner Network, the company designs and implements solutions across 21 different AWS services. More information: Gold Coast IT Services for Hadoop.
7. Google Cloud Platform
With the Google Cloud Storage connector for Hadoop, you can perform MapReduce jobs directly on data in Google Cloud Storage, without copying to local disk and running Hadoop Distributed File System (HDFS). The connector simplifies Hadoop deployment, reduces cost, and provides performance comparable to HDFS, all while increasing reliability by eliminating the single point of failure of the name node, Google asserts. Hadoop on Google Cloud Platform also provides connectors that enable you to access data stored in BigQuery and Datastore, as well as Google Cloud Storage. More information: Google Cloud Platform with Hadoop.
8. HP Cloud with Hadoop
HP Cloud provides an elastic cloud computing and cloud storage platform to analyze and index data volumes in the hundreds of petabytes, HP asserts. Distributed queries run across multiple data sets and return results in near real time, the company says. HP Helion Public Cloud provides the underlying infrastructure required to process big data. The company partners with third-party solution providers that enable enterprises to better configure, manage, manipulate, and analyze data affordably. More information: HP Cloud Solutions and Hadoop.
9. IBM BigInsights on Cloud
IBM BigInsights on Cloud provides Hadoop-as-a-service on IBM’s SoftLayer global cloud infrastructure – a bare metal design. The service requires no on-premises infrastructure, and it supports Big SQL, BigSheets, text analytics and more, IBM asserts. More information: IBM BigInsights on Cloud.
10. Microsoft Azure HDInsight
Microsoft’s Hadoop cloud service scales to petabytes on demand; processes unstructured and semi-structured data; deploys on Windows or Linux; integrates with on-premises Hadoop clusters (if needed); and supports multiple development languages including Java and .NET, Microsoft says. More information: Microsoft Azure HDInsight.
11. Qubole Data Service
Hadoop as a Service appears to be Qubole’s core focus. Co-founder Ashish Thusoo previously ran Facebook’s data infrastructure team. Qubole Data Service is an on-demand elastic cluster, with nodes automatically added or removed based on data set size. Key integrations include MapReduce, Hive, Pig, Oozie, Sqoop, Spark and Presto. More information: Qubole Data Service.
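The idea of sizing a cluster from the data set can be illustrated with a toy rule. The sketch below is purely hypothetical: the function name, the 500 GB-per-node ratio, and the minimum and maximum node counts are invented for illustration and do not describe how Qubole actually decides when to add or remove nodes.

```python
import math


def target_node_count(dataset_gb: float, gb_per_node: float = 500.0,
                      min_nodes: int = 2, max_nodes: int = 50) -> int:
    """Hypothetical sizing rule: one node per `gb_per_node` of data,
    clamped to the range [min_nodes, max_nodes]."""
    if gb_per_node <= 0:
        raise ValueError("gb_per_node must be positive")
    needed = math.ceil(dataset_gb / gb_per_node)
    return max(min_nodes, min(max_nodes, needed))


# Illustrative data set sizes only; all thresholds above are assumptions.
for size_gb in (100, 4_000, 100_000):
    print(size_gb, "GB ->", target_node_count(size_gb), "nodes")
```

An elastic service would re-evaluate a rule of this kind as the workload changes, so that customers pay for extra nodes only while they are needed.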
12. Rackspace Big Data
Rackspace offers several options for running Apache Hadoop. They include deploying Hadoop on Rackspace managed dedicated servers; spinning up Hadoop on Rackspace’s public cloud via virtual servers or on dedicated bare-metal cloud servers; or configuring your own private cloud. More information: Rackspace Big Data services.
13. Sahara
Promoted by Mirantis, Sahara began life as an Apache 2.0 project and is now an OpenStack integrated project, meaning it is part of the semi-annual OpenStack release. Active backers include Mirantis, Hortonworks, and Red Hat. Sahara provides push-button provisioning of mainstream Hadoop distributions and elastic data processing (EDP) capability similar to Amazon Elastic MapReduce (EMR). Check in with Mirantis to see which cloud service providers now offer Sahara-based Hadoop deployments. More information: Mirantis Sahara.
14. Skytap
Skytap’s infrastructure offers virtual environments in which you can create, deploy, and remove Hadoop instances as needed. Users have full root-level, log-in access to configure and customize virtual machines. From there, you can create identical copies of your Hadoop environment; share your environment with other developers and users; or scale resources (CPU, memory, network) up or down. The company has a close working relationship with Cloudera. More information: Skytap and Hadoop.
15. Tieto
The Northern Europe IT service provider introduced a big data PaaS platform in 2012. The effort, in partnership with Cloudera, trained 40 professionals on Hadoop. The company also offers application development, analytics, consultancy and integration services. More information: Tieto.
16. Verizon Cloud
Verizon’s Enterprise business inked a Cloudera partnership in 2013, and the IT services giant now offers Cloudera atop its cloud infrastructure. A Cloudera distribution supporting billions of records can be deployed on the Verizon Cloud in a matter of hours, significantly faster than deployments on generic public clouds, Verizon claims. More information: Verizon Enterprise and Hadoop.
17. Integrated Hadoop Bundles
A range of vendors also offer integrated Hadoop solutions (including all required hardware and software) that can be deployed on-premises or in a third-party data center. Options include Avnet, Dell, Cisco, and EMC (http://www.emc.com/big-data/scale-out-storage-hadoop.htm), among many others.
18. Future Hadoop Cloud Options
The Hadoop-as-a-service market continues to evolve rapidly. One of the easiest ways to find and track new Hadoop cloud providers is to check in regularly with Hadoop distribution providers like Cloudera, Hortonworks and MapR. If you’re aware of an existing Hadoop cloud option that we overlooked, please post a comment with product and service information.