I made it through the first day of the Strata-Hadoop World 2012 conference in New York City – barely.
If you’re an exhibitor like OpenBI, the days are long with booth work, conference presentations, partner meetings, interviews, receptions, etc., and the pace hectic. I started the day on an empty stomach at 7:30 AM and didn’t eat until 12:30 PM, the conference serving only coffee and water all morning. Thursday I made the adjustment, one of many “Stratans” to boost the food truck breakfast business outside the hotel. Maybe Strata can contract with the foodies next year.
The 3,000 registrants were a record for Strata. In comparison, Strata 2011 in Santa Clara had about 1,000 participants, and the winter 2012 conference about 1,500. This week, the hotel meeting rooms, exhibit hall and reception area were dense with data scientists, brimming but thankfully not bursting. I guess when you’re in NYC …
I liked many of the 10-15 minute keynotes, especially the ones short on vendor promotion. Photojournalist Rick Smolan provided a timely holiday gift solution with his elegant book, “The Human Face of Big Data.”
Brett Goldstein, Commissioner and Chief Data Officer, Department of Innovation and Technology, was given a Strata Data Innovation “Transforming Organizations” award for his work in data transparency, standardization and access with the City of Chicago.
Berkeley professor and Trifacta CEO Joe Hellerstein has lowered his research sights from the hard problems of analytics “rocket ships” to the drudgery-saving productivity of data “washing machines.” For him, the analytics life cycle is anything but linear, with human interface and data quality problems most pronounced. Not surprisingly, Trifacta is focusing on data washing machines.
Several presentations revolved around developing new generations of short-supply data scientists. Berkeley mathematician, financial quant and data scientist Cathy O’Neil, in tandem with Julie Steele from O’Reilly Media, opined in a keynote that academia is not currently aligned with the needs of data science, where the biggest challenges are determining the right questions and working with messy data. Though they believe universities remain central to DS training, O’Neil and Steele feel academia must work through its interdisciplinary politics and aggressively recruit industry experts to its faculties to become truly relevant for data science.
Mother and daughter Nokia team Amy O’Connor and Danielle Dean note that data science is an agile discipline driven by curiosity. For them, an enduring DS team must find college grads with educational backgrounds in social science, bioengineering, math/stats, etc. to mix with veteran information technology professionals. I like the oft-repeated depiction of DS that circulated at the conference: “A data scientist is someone who knows statistics better than an engineer and knows engineering better than a statistician.”
Enterprise BI stalwarts SAP, SAS and EMC mixed it up at SHW 2012 with data science darlings such as Cloudera, Hortonworks and MapR. A colleague tweeted that the enterprise guys seemed to seek acceptance from the big data crowd. If reactions to the presentations I saw from SAP and SAS are representative, I’m not sure they succeeded.
A broad range of use cases presented in the sessions highlighted just how pervasive big data problems are and how the Swiss army knife of Hadoop is helping to solve them. Indeed, a recurring message from conference vendors was the imminent decline of traditional enterprise BI architectures, to be replaced in the future by Hadoop platforms that progress from current batch engines to support BI-like ETL and speed-of-thought analytics. Provocative if a bit bombastic.
Both Cloudera CEO Mike Olson and Hadoop originator and current executive Doug Cutting were excited to announce the Apache-licensed Impala software that “enables real-time, interactive analytical queries of the data stored in HBase or HDFS.” Platfora Founder and CEO, Ben Werther, in his sponsored keynote “The End of the Data Warehouse,” doubled down on that theme, opining that the data warehouse is broken and announcing his company’s software to transition “Apache Hadoop™ from batch engine into a subsecond-interactive, exploratory business intelligence and analytics platform designed for business analysts.”
There was also considerable excitement about YARN, the next generation of MapReduce, a more general, distributed resource manager of which today’s MR will be but one application. In his presentation "Large Scale ETL with Hadoop," Eric Sammer from Cloudera noted that "MapReduce makes simple things hard, but hard things possible." Others picked up on that euphoria, opining that YARN will ultimately promote the seamless translation of ETL from relational to Hadoop data warehouses.
If they deliver on the hype, a combination of Impala and YARN might well change the BI landscape, providing the option to move ever-more data from large-scale databases to an economical Hadoop ecosystem. I suspect this development is inevitable, though I’m not sure when. One OpenBI Hadoop consultant is convinced “you will see the data warehouse being replaced by Hadoop for some very limited applications in the next 1-3 years.”
All things considered, I thought SHW 2012 was solid. A week after the conference, participants will certainly acknowledge the good fortune of the Oct 23-25 dates in NYC. Looking forward, more the data scientist than Hadoop techie, I can’t wait for the O’Reilly Strata Conference 2013, February 26-28, 2013, in Santa Clara, California.
(Author’s note: I’d like to thank my OpenBI colleagues for their discerning contributions to this report.)