The global Hadoop market has seen skyrocketing adoption in recent years. According to an Allied Market Research report, it is expected to reach US$84.6 billion in revenue by 2021, a stellar compound annual growth rate (CAGR) of 63.4 percent between 2016 and 2021.
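To make the growth figure concrete, the sketch below applies the standard CAGR formula, CAGR = (end / start)^(1 / years) - 1, and inverts it to back out the 2016 market size that the forecast implies. It assumes five annual compounding periods (2016 to 2021). The resulting base-year figure of roughly US$7.3 billion is derived arithmetic, not a number quoted in the Allied Market Research report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    return (end_value / start_value) ** (1 / years) - 1


def implied_start_value(end_value: float, rate: float, years: int) -> float:
    """Starting value implied by an end value and a constant annual growth rate."""
    return end_value / (1 + rate) ** years


# Forecast figures cited above (US$ billions).
end_2021 = 84.6
rate = 0.634
years = 2021 - 2016  # assumption: five annual compounding periods

# Derived value, not quoted from the report.
start_2016 = implied_start_value(end_2021, rate, years)
print(f"Implied 2016 market size: ~US${start_2016:.1f} billion")

# Round trip: recomputing the CAGR from both endpoints recovers 63.4 percent.
print(f"Recomputed CAGR: {cagr(start_2016, end_2021, years):.3f}")
```

Each year multiplies the market by 1.634, so five years of growth amount to a factor of about 11.6. That factor is why a high-sounding CAGR turns a single-digit-billion market into an $84.6 billion one.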

In another recent study conducted by Syncsort, 271 IT executives across a variety of industries were asked how they are deploying Hadoop and what benefits they expect from it. Key findings include:

  • Hadoop is becoming a core part of the IT infrastructure, with legacy systems — such as mainframes and enterprise data warehouses (EDW) — cited as the prime sources for the data lake.
  • Organizations are leveraging Hadoop to yield tangible results in terms of agility, operational efficiency and cost reductions.
  • Overall, confidence in Hadoop is increasing, and uncertainty about deploying Hadoop is receding as a key challenge: the share of respondents citing it dropped from 60 percent in 2016 to 42 percent in 2017.

The Role of Hadoop

While Hadoop is used across a multitude of use cases, nearly two-thirds of the respondents (62 percent) in the Syncsort study cite offloading data warehouse capacity and reducing costs as their prime motives. Using Hadoop to achieve better and faster analytics is the rationale for 49.4 percent of all respondents. While 49.1 percent cite implementing a cost-effective storage/data archive strategy as their goal, another 33.8 percent cite replacing their existing data warehouse investments as a major priority.

Compared to traditional data analysis tools such as relational database management systems (RDBMS), Hadoop is seen as more cost-effective and efficient. According to Cloudera, its total cost of ownership (TCO) is estimated at around US$1,000 per terabyte, one-fifth to one-twentieth the cost of other data management technologies.
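The Cloudera comparison can be read as a simple ratio. The back-of-the-envelope sketch below, an illustration rather than a cost model, turns the "one-fifth to one-twentieth" range into the implied per-terabyte TCO of the alternatives and applies it to a data volume. The 500 TB volume is a hypothetical assumption chosen purely for illustration.

```python
# Figures from the Cloudera estimate cited above.
HADOOP_TCO_PER_TB = 1_000  # US$ per terabyte
COST_MULTIPLES = (5, 20)   # alternatives cost 5x to 20x as much per terabyte


def alternative_tco_range(hadoop_tco: int, multiples: tuple) -> tuple:
    """Per-terabyte TCO of other technologies implied by the cost multiples."""
    return tuple(hadoop_tco * m for m in multiples)


def savings_range(volume_tb: int, hadoop_tco: int, multiples: tuple) -> tuple:
    """Total savings from keeping `volume_tb` on Hadoop instead of an alternative."""
    return tuple(
        volume_tb * (alt - hadoop_tco)
        for alt in alternative_tco_range(hadoop_tco, multiples)
    )


VOLUME_TB = 500  # hypothetical data volume, for illustration only

low_alt, high_alt = alternative_tco_range(HADOOP_TCO_PER_TB, COST_MULTIPLES)
low_save, high_save = savings_range(VOLUME_TB, HADOOP_TCO_PER_TB, COST_MULTIPLES)
print(f"Alternatives: US${low_alt:,} to US${high_alt:,} per TB")
print(f"Savings on {VOLUME_TB} TB: US${low_save:,} to US${high_save:,}")
```

Under these assumptions, the alternatives would cost US$5,000 to US$20,000 per terabyte. Real TCO comparisons also depend on hardware, staffing and licensing, which a flat per-terabyte figure does not capture.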

Use Cases Gain Sophistication

Beyond offloading data warehouse capacity and yielding cost savings, advanced users are increasingly interested in utilizing Hadoop in a more strategic fashion, aiming to convert raw data into insights.

Six out of ten respondents (62 percent) are interested in using Hadoop for advanced/predictive analytics. While 57 percent of respondents consider using Hadoop for data discovery and visualization, another 53 percent want to leverage it for ETL. A further 46.7 percent deploy Hadoop to gain operational insights, and 43 percent use it for real-time analytics.

Deployment Options

Though Hadoop was predominantly operated on-premises, cloud-based deployments are gaining traction, and organizations are increasingly comfortable using both. While a slight majority (51 percent) sticks solely to an on-premises deployment, nearly a third (30.6 percent) operates in a hybrid scenario. In roughly one out of five deployments (18.3 percent), organizations already rely on a pure cloud play.

As Hadoop matures and organizations continue to explore how they can benefit from it, MapReduce remains widespread, but organizations are increasingly moving between frameworks and adopting several at once. Used by 62 percent of respondents, MapReduce is still the most widely used framework. However, 47 percent of those currently using it plan to move away from MapReduce, while Spark adoption is on the rise.

Some 51 percent of respondents use at least one other framework besides MapReduce. A mere 25 percent of respondents rely entirely on Spark today, although that number is expected to rise toward 39 percent in the future. The vast majority will be leveraging Yet Another Resource Negotiator (YARN), which provides the foundation for running several data processing engines, such as interactive SQL, real-time streaming, data science and batch processing, against data stored on one common platform.

Implementation Challenges

Despite Hadoop's broad adoption in the marketplace, implementing it at enterprise grade can be challenging. The most frequently encountered challenges include:

  1. Talent shortage: Experienced talent is scarce and expensive. For the third consecutive year, acquiring new skills and tools as well as recruiting and retaining qualified talent has been among the top challenges, cited by 58 percent of respondents.
  2. Rate of change: Both the compute frameworks and the tools are evolving quickly, and 53 percent of respondents find it difficult to keep up.
  3. Integration and interoperability: For 48 percent of respondents, ensuring seamless integration with other data sources and applications is a major challenge.

Outlook

As the Hadoop market matures, use cases and deployment models continue to evolve. While early adopters aimed primarily at cost savings, users will increasingly use Hadoop in a more sophisticated fashion to gain strategic insights and contribute to top-line growth. As organizations progress along the learning curve, projects will multiply, and so too will the sources and volume of data required to support them: from mainframes to NoSQL databases, sensors, and beyond.

Going forward, the cloudification trend will accelerate further, and organizations will increasingly take advantage of cloud or hybrid scenarios, particularly to enable IoT-based use cases and real-time analytics that draw on a multitude of sources. Flexible tools that abstract the complexity, streamline processes and adapt to changing requirements across the various deployments will grow in popularity.

In the digital era, organizations will increasingly depend on the success of their big data projects to produce KPIs and other key metrics that help them gain competitive advantages by feeding decision-making and running their business smarter. In turn, data governance will grow in importance, both for navigating the data lake and for meeting more stringent compliance regulations.


Marc Wilczek

Marc Wilczek is a digital strategist who blogs about issues related to technology and information management.