5 Reasons Hadoop Is Kicking Can and Taking Names
Hadoop’s momentum is unstoppable as its open source roots grow wildly into enterprises. Its refreshingly unique approach to data management is transforming how companies store, process, analyze, and share big data. Forrester believes that Hadoop will become must-have infrastructure for large enterprises. If you have lots of data, there is a sweet spot for Hadoop in your organization. Here are five reasons firms should adopt Hadoop today:
1. Build a data lake with the Hadoop file system (HDFS). Firms leave potentially valuable data on the cutting-room floor. A core component of Hadoop is its distributed file system, which can store both very large files and very large numbers of files, scaling linearly across three, 10, or 1,000 commodity nodes. Firms can use Hadoop data lakes to break down data silos across the enterprise and commingle data from CRM, ERP, clickstreams, system logs, mobile GPS, and just about any other structured or unstructured data that might contain previously undiscovered insights. Why limit yourself to wading in multiple kiddie pools when you can dive for treasure chests at the bottom of the data lake?
2. Enjoy cheap, quick processing with MapReduce. You’ve poured all of your data into the lake; now you have to process it. Hadoop MapReduce is a distributed data processing framework that brings the processing to the data and runs it in a highly parallel fashion. Instead of serially reading data from files, MapReduce pushes the processing out to the individual Hadoop nodes where the data resides. The result: Large amounts of data can be processed in parallel in minutes or hours rather than in days. Now you know why Hadoop’s origins stem from monstrous data processing use cases at Google and Yahoo.
3. Data scientists can wrangle big data faster. Data scientists can find success when they run algorithms on massive amounts of data instead of on much smaller samples. HDFS combined with MapReduce makes Hadoop an ideal platform for running advanced analytics, such as machine learning algorithms that find predictive models. There is even an Apache project called Mahout that offers a growing library of algorithms that are optimized to run on Hadoop.
4. Even the POC can make you money. Forrester has talked with many early adopters of Hadoop who use terms like “wildly successful” to describe the results of their Hadoop proof of concept (POC). Hadoop’s applicability is not limited to specific industries. Financial institutions, government, manufacturing, oil exploration, eCommerce, and media all have lots of data: big data. POC use cases range from offloading traditional business intelligence workloads from a data warehouse to Hadoop, to using advanced analytics in the data lake to predict customer behavior, find patterns in unstructured text, and decode the human genome.
5. The future of Hadoop is real-time and transactional. Hadoop is immature compared to established data management technologies, but it is a lot more mature than you think. The Hadoop open source community and commercial vendors are innovating like gangbusters to make Hadoop an enterprise staple. On October 15, 2013, Apache released Hadoop 2.x with a hearty list of new features such as YARN, which improves Hadoop’s processing efficiency and workload flexibility. In the meantime, the key commercial vendors are focusing on the fast SQL access, real-time streaming, and manageability features that enterprises demand. The groundwork is being laid for an eruption in data management technologies as Hadoop sneaks its way into the transactional database market. Your adoption of Hadoop now, for analytical processing, will ensure that you are ready.
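For readers who want to see the idea behind reason 2 in miniature, here is a rough sketch of the map, shuffle, and reduce steps in plain Python. It is not Hadoop code and does not use any Hadoop API; the `BLOCKS` data, the function names, and the use of a thread pool to stand in for cluster nodes are all assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for a block of log lines
# stored on one node of the cluster.
BLOCKS = [
    ["error disk full", "info job started"],
    ["error network down", "info job finished"],
    ["warn disk slow"],
]


def map_block(lines):
    """Map step: emit a (word, 1) pair for every word in one block."""
    return [(word, 1) for line in lines for word in line.split()]


def shuffle(mapped_blocks):
    """Shuffle step: group the emitted values by key across all blocks."""
    grouped = defaultdict(list)
    for pairs in mapped_blocks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: collapse each key's values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(blocks):
    # Each block is mapped independently; the thread pool is only a
    # stand-in for separate nodes working on their local data.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_block, blocks))
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    counts = word_count(BLOCKS)
    print(counts["error"], counts["disk"])
```

The point of the sketch is that `map_block` only ever sees its own block, so adding more blocks (and more nodes to hold them) adds parallel work rather than a longer serial read. A real cluster also handles scheduling, data locality, and failures, none of which appear here.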
Stay tuned for the Forrester Wave of enterprise Hadoop solutions (expected publication: January 2014), in which we evaluate Hadoop solutions from 10 leading commercial vendors.
This blog post was coauthored with David Murphy, Research Associate.
This blog was originally posted at Forrester Research. Published with permission.