Over the past six weeks, OpenBI’s been a part of the just-completed Tableau 8.0 Roadshow that presented in cities across the country. If our sample of five shows is at all representative, Tableau 8.0 will assuredly be a hit. Nice timing with a pending IPO.
I’ve been a big Tableau fan for over five years. For the exploratory visualization that’s at the heart of both advanced analytics and data science, Tableau’s top shelf. Its intuitive, easy-to-navigate interface, powerful “small multiples” visualizations and far-reaching connectivity are nonpareil. I’ve experienced terrific performance against billion-row Vectorwise tables using native database connectivity, while Tableau’s “in-memory-like” storage engine can be blazingly fast, even against tens of millions of extract records on a Wintel notebook.
With 8.0, Tableau’s expanding from its exploratory, client-server success base to offer extensive browser-based authoring and server deployment capabilities. The new server’s a good start; the challenge now is to enhance the supporting metadata to make the tool competitive as a production-ready, shared visual reporting environment.
Even as they’re busy serving customers and marketing roadshows, OpenBI consultants are finding time to take Coursera “classes”. This quarter’s favorite is “Introduction to Data Science” by University of Washington professor Bill Howe. From the day-after Skype water cooler discussions of previous evenings’ assignments, I’d say the course is being well received. Technologies addressed include SQL and Hadoop for data access, Python for data programming, R for statistical analysis/machine learning and Tableau for visualization. I couldn’t be happier with the curriculum!
Turns out it’s good that I brushed up on Python for the Tableau Roadshow. Several course participants are learning Python as they go, so I volunteered to present at the upcoming OpenBI technology day. In preparation, I dusted off some materials I’d put together a few years back and thought I was good to go – until I started to see the advances over the last couple of releases. Now I’m busy embellishing the material, at times juxtaposing old and new ways of doing things.
Perhaps the biggest addition is the new IPython computing environment, which comprises interactive shells, a browser-based notebook, support for interactive visualization and array math, and tools for parallel computing -- pretty amazing. I’ve now started using IPy for programming, data analysis and graphics in Python much as I use RStudio for R. And, just as with R, one of the major benefits of Python is the bounty of free, work-saving libraries developed by Python’s worldwide community. Advice for new Python programmers: before writing your own code, check to see if there’s already a solution to your problem.
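To make that advice concrete, here's a minimal sketch that juxtaposes a hand-rolled tally with the standard library's collections.Counter. The word list is made up purely for illustration and isn't taken from the course or my presentation materials.

```python
from collections import Counter

# Hypothetical sample data, invented for this illustration.
words = "tableau python r python hadoop r python".split()

# The do-it-yourself way: tally occurrences with a plain dict.
counts_manual = {}
for w in words:
    if w in counts_manual:
        counts_manual[w] += 1
    else:
        counts_manual[w] = 1

# The "someone already solved this" way: Counter does the tallying
# and also provides most_common() for ranking.
counts = Counter(words)
top_two = counts.most_common(2)

print(top_two)  # [('python', 3), ('r', 2)]
```

Both approaches produce the same tallies; the library version is simply shorter and comes with ranking for free.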
A frustration I have with Tableau is the absence of integration with R. Two visualization competitors, Spotfire from Tibco and Omniscope from Visokio, offer R integration, in each case significantly enhancing product functionality. At this point I can only dream of R serving as a data munging language behind Tableau, just as it now does so nicely for Omniscope. Maybe Tableau 9.0?
I also lament the passing of the halcyon days of RPy -- R integration with Python. Though apparently no longer actively maintained, RPy lets users execute R code in Python programs and promotes Python awareness of R variables and vice-versa. Continuum Analytics, the company behind the Anaconda Python distribution for large-scale data processing, predictive analytics, and scientific computing, promises to include RPy in a future distribution release. I eagerly await.
I like the core of Python, Tableau and R as a data science foundation. What’s needed is a Hadoop ecosystem that’s less noisy with the multitude of product choices – and more in sync with the needs of data scientists. I’m bullish on BDAS, the Berkeley Data Analytics Stack, consisting of Spark and Shark, as a second-generation Hadoop platform focused on analytics for data science. Icing on the cake? Spark has a Python API!