Information-Management.com: You’ve launched new relationships with hardware vendors and with noted Hadoop providers Cloudera and Hortonworks. What’s the mission?
SAP's David Jonker: Customers and prospects are working with Hadoop and reaching points where they want to make something that can work within their IT environment at enterprise-class scale. Our goal was to figure out the right bundle of various solutions from our technologies, our Hadoop partners and IBM, HP and Hitachi for hardware. Cloudera and Hortonworks are arguably the leaders in Hadoop distributions and the hardware partners already resell Cloudera and Hortonworks or their platforms supporting Hadoop. They sell the hardware for HANA as well, so essentially you can go to IBM or HP and SAP and get the full package you need to deploy with enterprise-level support.
Are your customers trying to build hybrid environments and how are they trying to use Hadoop in their existing architecture?
At the end of the day most of the people dealing with quote/unquote big data problems are actually dealing with transactional information and Hadoop doesn’t make sense for that environment. So they might choose to use HANA or Sybase IQ or some combination to figure out how to scale to a particular problem in that realm. At one end of the scale, all the talk about Hadoop putting an end to the enterprise data warehouse just doesn’t make sense when you consider that companies continue to invest heavily in their data warehousing architecture and data strategy for a reason. No large enterprise is going to report their earnings using Hadoop. They are not going to say we closed $3 billion this last quarter with 80 percent confidence and our expenses were about $2 billion give or take 10 percent. Data warehousing is a practice that’s figured out how to report on the facts. It’s also clear that advanced analytics and data warehousing can be complementary for different uses.
How do you leverage the mix?
Without throwing out anything you’ve put in, data warehousing has its own big data kind of problem. The BI analyst wants to report on more information than the company uses for earnings. I talk to lots of analysts who have been waiting or struggling to do that. Your current IT infrastructure can’t handle storing more historical information and reporting against it. That is the problem that most of the people I see are running into and analytical databases can solve it. That’s where we recommend HANA and Sybase IQ.
That’s still not the exploratory scope of Hadoop though, so there’s some kind of cutoff point.
Right, lots of people at the same companies also want to explore and analyze Web logs, take social media data, make sense of it and then combine it with the data in their data warehouse. Hadoop does some of that stuff particularly well. One of the things it does really well is act like a kind of data refinery. Comscore is a great example. They are a service that collects data from millions of people doing things online, and they are generating terabytes of data every day. Out of these voluminous Web logs there are only small nuggets of data that clients are really interested in. Hadoop does a good job of preprocessing that, kind of like a refinery or an ETL sort of process. From there you load your analytic database. That’s what the companies we talk to often get around to eventually.
Do you see more direct analysis migrating to Hadoop over time as a lot of vendors are promoting?
Longer term, we think as companies are more comfortable and start to deal more with the Internet of different things, maybe it’s making sense of a whole lot of machine data, it’ll no longer make sense to jam all of that into a relational format. It will call for different analysis. But I don’t think the analytic database is going anywhere because companies have invested and gotten too much from that to turn back.