Prior to the spinoff that saw him appointed CEO of Hortonworks last month, Eric Baldeschwieler led the evolution of Apache Hadoop from a 20-node prototype to a 42,000-node service that is behind every click at Yahoo!. He also served as technology leader for Inktomi's Web service engine, which Yahoo! acquired in 2003.
How do you feel about the Hortonworks spinoff after being so heavily invested at Yahoo all these years?
The history is that we started working on Hadoop at Yahoo five or six years ago, when it was just a 20-node prototype. We built out the team and we've been focused on driving it forward for the last six years. Yahoo has built all the releases and has been the majority contributor to every release of Hadoop, so as a team we're used to supporting a wider community anyway. The difference, of course, is that we're now going to be supporting Yahoo explicitly as a customer. The key takeaways are that Hortonworks is an independent company and Yahoo is an investor, a customer and a development partner. Yahoo is maintaining a deep bench of folks who have contributed to Hadoop and who have built applications on top of it. We have more than 1,000 active users of Hadoop at Yahoo.
How arm’s length is Yahoo now as a customer?
We are providing them with Tier 3 support, so Yahoo will take the ball for developer training, simple questions and even bugs that developers relatively new to Hadoop can resolve. We'll be backstopping them for escalations, and if they discover interesting problems that they can't fix, we'll fix them.
So they are across the fence and on their own projects but will offset their costs of developing Hadoop through your work and revenues?
Yes, of course. One of the key reasons we chose to develop our big data platform in open source was the belief that over time an ecosystem would grow from that work, and that would allow Yahoo to benefit from a wider community's investment in the platform. So this is a home run from Yahoo's perspective. They have gotten it to the point where the press is interested in Hadoop and it has achieved wide-scale adoption in thousands of companies, or in departments of thousands of companies. As a result, there is an opportunity for an independent company to take on the key role of driving the technology forward and implementing new features and technologies around Hadoop.
You don’t have plans for an enterprise edition or “freemium” software, so what’s the business model?
First, we are committed to Apache and to open source, and we think there should be a complete version of Hadoop that is downloadable from Apache. Our short-term business model is based on training and support, and on strategic partners such as Yahoo that have enough interest in seeing the technology continue to evolve in certain directions that they are willing to pay a premium to have us design and develop with them.
Is that enough of a model from a venture capital view?
Well, our two investors are Yahoo and Benchmark Capital. Rob Bearden, who was a venture partner at Benchmark, joined us as COO and president, so he certainly believes this is the next big opportunity in enterprise software. We're serious when we say we believe half the world's data is going to be in Hadoop within five years. That's the scale of opportunity we think it represents. It's going to be a huge ecosystem, and we think our rent on that can be significant. Training and support will grow into a significant and healthy business, and that is what we will focus on in the short term, because it's absolutely critical that we coalesce the ecosystem around the open source product and don't experience a splintering like we saw with Unix.
And the model could change down the road?
Sure, but the thing that won't change down the road is our belief that Hadoop and its companion projects should be a complete horizontal layer that is deployable and solves business problems. Our focus in the short term is simply growing the market by making it much easier for enterprises to install and use Hadoop, and much easier for third parties to build businesses around Hadoop, whether software, OEM or integration businesses. We think that, because of our deep technical expertise, we can help bridge that gap, and there's a big opportunity to do that while keeping the core free. We are committed to doing that, which doesn't mean we won't potentially build products on top of Hadoop ourselves at a later date or do other things to drive monetization. But the opportunity is large, we're well funded and we're in a good spot to validate Hadoop, which is our mission.
There are funded Hadoop startups and businesses like Datameer, Cloudera and MapR out there. Some use Apache, some don't. What is the Hortonworks effect on all that?
It's hard to say, obviously, but we believe the great thing about open source is that it lets you partner widely. We're committed to partnering with any of those companies that commit to using Apache Hadoop and contributing their improvements back to Apache Hadoop. Our job is to make the pie grow bigger.
If Apache is the biggest Hadoop distribution, is it important that it be the most successful one, and should there be room for multiple distributions and variants?
In any healthy ecosystem there are variants, so we just want to make sure everyone knows they can go to Apache for a great version of Hadoop. Right now there is still some confusion, and it does take a real expert to install and use Hadoop today, so we want to make it easier.
How did you decide to staff your business?
We have about 25 people. The core is the committers and architects who have built Hadoop, Pig, ZooKeeper and a couple of other key Apache Hadoop projects. So we have on the order of 80 man-years of experience building Hadoop, which is a really strong, key differentiator for us in the ecosystem. We've got the guy who designed and built Hadoop in the first place. Because we have the experts, we can take a pure open source role: our expertise is what lets us partner with people, and everybody will benefit.
How do you like the prospect of running an independent business?
I am thrilled to know I'll be focusing full time on Hadoop, growing the market and building the technology to be everything it can be. We have the investment needed to build collaborative relationships. We have domain expertise and experience leading communities, and the deep partnership with Yahoo gives us access to hardware and a thousand users, as well as a diversity of Hadoop use cases. We're absolutely committed to Apache and open source, and our strategy is based on training, support and partnering with third parties. Because we're fully open and not doing professional services, we're in a key position to partner with lots of different parties, and that's what we are going to do.