As the enterprise Hadoop market continues to mature and many companies deploy their clusters for the most demanding analytical challenges, data scientists will begin to migrate toward this new, open-source-centric platform. At the same time, enterprise adoption of the open-source R language will grow in 2012 and beyond, and we’ll see greater industry convergence between Hadoop and R, especially as analytics tool vendors integrate both technologies tightly into their offerings. We will also see increasing adoption of open-source data integration tools, such as those commercialized by Talend and others, and of open-source BI tools from Pentaho, Jaspersoft, and others.
This is happening for the following reasons:
- Open-source initiatives are transforming all platforms and tools. Open-source infrastructure, platforms, tools, and applications (such as Linux, Apache, Eclipse, Python, Mozilla, and Android) have gained widespread adoption in many sectors of the IT world, thanks to advantages such as no-cost licensing, extensibility, and vibrant communities.
- Open-source communities are where the fresh action is. These communities have fostered innovative new approaches and ecosystems, increasingly gaining a jump on the incumbent providers of proprietary, closed-source offerings (feature-rich and robust though those offerings are) in advanced analytics, data warehousing, and integration tools.
- Open-source solutions and providers are maturing rapidly. A new generation of IT professionals realizes they can now obtain open-source data/analytics products from a growing range of vendors, both startups and incumbents, who offer out-of-the-box integration with customers’ legacy IT and also provide strong support and consulting services. Open-source data/analytics products are no longer the risky bet they were just a year or two ago.
Recognizing this trend, and seeing the speed at which incumbent vendors are incorporating open-source technologies into their solutions, Forrester regards Hadoop, for example, as the nucleus of the next-generation enterprise data warehouse (EDW) in the cloud, and R as a key codebase in the coming wave of integrated Big Data development tools. We also expect various open-source NoSQL databases and tools to coalesce into rich alternatives to closed-source content analytics offerings.
As the footprint of closed-source software shrinks in many data/analytics environments, many incumbent vendors will evolve their business models toward open-source approaches and ramp up professional services and systems integration to assist customers in their move toward open-source, cloud-oriented analytics, much of it focused on Hadoop and R. Furthermore, we’ll see a fair number of open-source data/analytics tool, platform, and application vendors join forces through mergers and acquisitions.
Just as important, we expect a growing range of next-generation Big Data development tools to plug into extensible open-source platforms geared to boosting the collective productivity of teams of data scientists and subject-matter experts. It’s with this last trend in mind that we laud EMC Greenplum’s recent announcement that it is open-sourcing its new Chorus “social” framework for Big Data development.
Just as the platforms and tools open up, Big Data’s development ecosystem will as well. Big Data will leverage the most open arena of all, “crowdsourcing” cloud approaches such as Kaggle, to pool the world’s expertise (or at least that of all the smart people in your company and/or value chain) in wide-ranging development, investigation, and exploration of analytics- and data-infused business problems from all conceivable angles.
This blog originally appeared at Forrester Research.