40 Vendors We're Watching 2012: Big Data
What: Managed cloud database
Why: NoSQL that's distributed, replicated and managed by experts, with a REST API, full-text search and analytics on globally scalable Apache CouchDB. Besides being the people you'd migrate your marketing analytics database to, these MIT guys sound like the type you'd trust to feed your dog while you're away.
Where: Boston, MA
Of Note: Answers the question "So, where do you go from here?" for three MIT physicists who did a stint managing data at the Large Hadron Collider. Scale is what they live for. Provides a NoSQL data layer for Windows Azure and has a gig hosting online game player data.
What: Data storage and processing services on Apache Hadoop
Why: For big data users who are ready for enterprise-level security, integration and infrastructure, plus a variety of subscription-based services. Backers have made it rain in this cloud for the last three years.
Where: Palo Alto, CA
Of Note: Founder and CEO Mike Olson is among the most visible and outspoken visionaries and advocates of Hadoop. Cloudera's platform was part of the big data appliance Oracle launched this year, and it has been seen hanging out with Pentaho, HP and IBM. Has its own certification university for Hadoop training. Customers include eBay, Groupon, Morgan Stanley, Nokia and Qualcomm.
What: NoSQL Cassandra Hadoop
Why: Okay, we're sensing a theme. DataStax brings an array of open source products to Cassandra, a scalable NoSQL database for real-time big data workloads across multiple nodes. With a workhorse of a download platform (you pay for support and consulting), DataStax will become a household name in BI, our source tells us. (Is there such a thing as a household name in BI?)
Where: San Mateo, CA
Of Note: Customers include lots of service providers, among them eBay, GoDaddy, LivePerson and Netflix, which uses Apache Cassandra to minimize downtime and outages.
What: In-memory Java middleware for big data
Why: In-memory = real time. This high-performance Java middleware can start with fewer than 10 lines of code (they print the world's shortest MapReduce app on the back of their business cards) to build enterprise e-commerce platforms, hyperlocal advertising, global gaming platforms and more.
Where: Foster City, CA
Of Note: GridGain carries clout, counting some of the largest companies in the world as customers, such as Apple, Canon and Sony.
What: Big data analytics
Why: Combining relational database technology with Hadoop into a single system, Hadapt produces cloud-based big data analytics. Data stored in Hadapt can be accessed using existing SQL-based tools, and SQL queries can be performed significantly faster than with Hadoop+Hive.
Where: Cambridge, MA
Of Note: Hadapt made Gartner's 2012 Cool Vendors in Information Infrastructure and Big Data. MassTLC named Hadapt one of its Innovative Technologies of the Year for Big Data.
What: Enterprise big data platform on open source Apache Hadoop
Why: Big in the big data space right off the bat, Hortonworks arrived with engineers and financing from Yahoo!. You could say these folks wrote the book on enterprise use of Hadoop, because they did; a lot of it, anyway. With a year under their belt, Hortonworks gets high marks from analysts on cluster monitoring and metadata sharing across systems.
Where: Sunnyvale, CA
Of Note: In a tight big data market that often confuses the C-suite set, Hortonworks turned some heads at the Hadoop World and Strata conferences in the past year; its high-profile rollouts and Yahoo! connections were no doubt aided by big data relationships with Teradata, Microsoft and others. Like other providers in the competitive democracy that is Hadoop, Hortonworks has its own certification courses and a legion of developers in its virtual sandbox.
What: Enterprise-scale Apache Hadoop distribution
Why: Claims no single point of failure or downtime and full data protection. No shortage of community coding language contributions from MapR, and it's winning commercial converts of late on those SLAs. It's early to name any knock-out winners in enterprise Hadoop, but clearly stated use cases across multiple industries and a mantra of reliability can't hurt their case.
Where: San Jose, CA
Of Note: Two editions, turnkey solutions for private and multitenant, can run on AWS and Google Compute Engine; 451 Group called them the clear choice for Hadoop.
What: Machine learning for big data analytics
Why: Machine learning is one clear path to big data processing. Use cases include clustering/segmentation, outlier detection, predictive analytics and similarity search. Automation and a low entry point make it a low-risk bet.
Where: San Jose, CA
Of Note: More than four decades of scalable machine learning experience in environments including the Large Hadron Collider, NASA and the Sloan Digital Sky Survey. Advisory Board includes Pat Hanrahan (Stanford and co-founder of Pixar & Tableau) and Michael Jordan (UC Berkeley's top machine learning expert).
Note: This entry was corrected 10/9 to eliminate references to natural language processing, which Skytree presently does not offer.
What: IT and machine analytics
Why: Because machine data is churning and we're still figuring out what we need and what to do with it. Splunk's ROI comes from reducing IT downtime, cutting legacy cost, supporting revenue-generating IT, reducing fraud, and enforcing SLAs with business and compliance risk insights.
Where: San Francisco, CA
Of Note: Management team built from potential machine life forms recruited at Disney, Apple, Oracle, Microsoft, Autodesk, Infoseek, Informix and SAP. Seriously though, a shelf full of awards and a Best Place to Work in the Bay Area award from the San Francisco Business Times and San Jose/Silicon Valley Business Journal.
Information Management's 40 Vendors We're Watching 2012 is a list of up-and-coming vendors on our radar that are doing their part to shape the groundswell in information management technology in the 21st century.
As our editors and advisers reviewed the strengths of each of these companies, we determined that five main themes were apparent: analytics/visualization, big data, business intelligence, database, and integration/governance. Our big data category features startup vendors developing open source frameworks for working with large volumes of data for processing and analysis. [Note: Some vendors cross multiple categories.]