Just when some big data tools seem to be hitting a wall, along comes Apache Spark -- an open source platform for fast data processing. Seemingly overnight, Spark has inspired startups like Databricks and giants like IBM to promote a new approach for real-time analytics.

In reality, this isn't an overnight revolution. Apache Spark was originally developed at UC Berkeley in 2009, and its creators went on to found Databricks in 2013. But the noise has grown especially loud at this week's Spark Summit in San Francisco -- where IBM is predicting the start of another software revolution.

Steve Miller, a regular blogger here on Information Management, wrote about Spark's great promise in mid-2014, calling it a platform of choice for big data and analytics. The potential winners are data scientists and engineers within businesses that want faster analytics performance. More than 500 contributors from 200-plus organizations have joined the Spark development effort, according to Databricks.

Corporate Tipping Point?

IBM isn't the only IT giant in this game. Infosys, Intel, Red Hat and Teradata Thinkbig -- among many other big names -- seem electrified by the Spark movement at this week's conference.

The media spotlight grew extra bright over the past few weeks, as IBM described how it plans to open a San Francisco big data research center while pumping millions of dollars into R&D around "free" software -- namely, Spark.

Some pundits view Spark as a successor to Hadoop. But Hadoop has considerable momentum as a big data platform in its own right, despite a few industry growth questions in recent weeks.

Others, such as Forrester Research, see Spark and Hadoop as complementary -- one of the key themes of last week's Hadoop Summit in San Jose. That's why startups such as BlueData are working to help customers speed up their on-premises rollouts of both Spark and Hadoop.

Top of Mind During Summit Week

For at least this week, Spark is dominating the big data headlines -- especially as IBM and Databricks collaborate to accelerate machine learning. That move comes as Databricks releases its hosted platform -- which will let customers launch Spark clusters for big data projects.

When IBM threw its weight behind Linux more than a decade ago, the move legitimized open source servers for corporate workloads. Fast forward to the present, and IBM is trying to spark a similar revolution -- with the aptly named Spark platform.
