Two months ago, a half dozen OpenBI staff had the opportunity to escape the harsh Chicago winter for a week, one group heading to Strata in Santa Clara, the other to the Tableau Partner Conference in Las Vegas. Both teams returned excited about data science software they'd seen on their trips: for the Strata crew, it was BDAS, the Berkeley Data Analytics Stack, while for the Tableau group, it was Alteryx, which provides “an intuitive workflow for data blending and advanced analytics.”
A few weeks later, the OpenBI team got the full-Monty Alteryx demo, adroitly orchestrated by solutions engineer Damian Austin. I was impressed with what I saw over the two hours, envisioning Alteryx, with its data munging capabilities and strong Tableau and R integration, as a potential OpenBI “analytics workbench” that can handle many of the data programming, analytics and visualization tasks presented by our customers. So I set out to take Alteryx for a test ride to determine what it could do.
My OpenBI colleagues laugh when they hear I'm doing a technical “proof of concept”, the joke being I have only two such examples in my arsenal. And they're about right. For the first test, I dusted off my trusty Russell stock portfolio data to put Alteryx-Tableau-R through the paces of a “split-apply-combine” challenge that's pervasive in the analytics world.
The data set consists of portfolio name, date, and two end-of-day index levels for each of 21 portfolios over almost 19 years, a total of over 95,000 individual portfolio-date records. The intent is to compare the relative performance of the individual portfolios over time.
The raw index levels, unfortunately, are on different scales, so my first task is to create two new index variables for each portfolio normalized to begin at 1, thus making growth from that point comparable.
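For comparison, the normalization step can be sketched in Python/Pandas as follows. This is only an illustration on toy data: the column names (`portfolio`, `date`, `index_level`) and the values are assumptions, not the actual layout of the Russell file, and a single index column stands in for the two in the real data.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "portfolio": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2014-01-02", "2014-01-03", "2014-01-06"] * 2),
    "index_level": [200.0, 210.0, 220.0, 50.0, 55.0, 45.0],
})

# Order records so that "first" means the earliest date within each portfolio.
df = df.sort_values(["portfolio", "date"])

# Split by portfolio, fetch each group's first level, and broadcast it back
# to every row of the group; dividing rescales each series to begin at 1.
first_level = df.groupby("portfolio")["index_level"].transform("first")
df["index_norm"] = df["index_level"] / first_level

print(df)
```

Because every portfolio now starts at 1, a value of 1.10 means 10% growth since the first date regardless of the raw scale.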
I also compute daily percent changes in the index levels for each portfolio, finally outputting two data sets: a “vertical” one that looks like the input with the addition of the new variables, and a “pivoted” one that has the portfolio daily percent change calculations as columns. These files are grist for subsequent analytics and visualization exercises. My hope was that this exercise of intermediate difficulty would be as straightforward in Alteryx-Tableau-R as it is in both R and Python/Pandas.
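A rough Pandas sketch of the percent-change and pivot steps might look like this. Again, the column names and toy values are assumptions made for illustration, and writing the two outputs to files is left out.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions for illustration.
vertical = pd.DataFrame({
    "portfolio": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2014-01-02", "2014-01-03", "2014-01-06"] * 2),
    "index_level": [200.0, 210.0, 220.0, 50.0, 55.0, 45.0],
}).sort_values(["portfolio", "date"])

# Daily percent change computed within each portfolio, so the first record
# of a portfolio has no prior value and comes out as NaN.
vertical["pct_change"] = (
    vertical.groupby("portfolio")["index_level"].pct_change() * 100
)

# "Pivoted" output: one row per date, one percent-change column per portfolio.
pivoted = vertical.pivot(index="date", columns="portfolio", values="pct_change")

print(vertical)
print(pivoted)
```

The `vertical` frame keeps the shape of the input plus the new variable, while `pivoted` is the wide form suited to side-by-side comparison of portfolios.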
I had no trouble getting Alteryx working, though I was disappointed the install procedure insisted on a new R instance rather than simply recognizing the version already running on my notebook. I had success testing many of the different flow steps and linking them together coherently. However, I was stumped on how to best implement the split-apply-combine metaphor in Alteryx. The optimal approach became apparent after a few hours of frustration led me to an “ah-ha” moment: contact Damian, the expert, and let him do the work. So I wrote Damian an email explaining my desires and included the data just in case he wanted to try a few things. Lo and behold, several hours later he responded with a 90% solution to my puzzle.
Damian and I agreed to a working session to further elaborate a solution. There, we (he) cleanly laid out a flow solution to my challenge. Two critical steps involved “splitting” the vertical data set by portfolio and computing the daily percent change variables. The trick for splitting the input is to “sample” the first record of each portfolio into a data step, then join that data back to the original. The over-time percent change calculation uses the multi-row formula, which can access prior or subsequent records.
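The logic of that flow, as described above, can be mimicked record by record in plain Python. This is an analogy to the sample, join and multi-row steps, not Alteryx's actual implementation; the field names and values are assumptions.

```python
from operator import itemgetter

# Hypothetical toy records; field names and values are assumptions.
records = [
    {"portfolio": "A", "date": "2014-01-02", "level": 200.0},
    {"portfolio": "A", "date": "2014-01-03", "level": 210.0},
    {"portfolio": "A", "date": "2014-01-06", "level": 220.0},
    {"portfolio": "B", "date": "2014-01-02", "level": 50.0},
    {"portfolio": "B", "date": "2014-01-03", "level": 55.0},
    {"portfolio": "B", "date": "2014-01-06", "level": 45.0},
]
records.sort(key=itemgetter("portfolio", "date"))  # ISO dates sort as text

# "Sample" step: keep only the first record's level for each portfolio.
first_level = {}
for rec in records:
    first_level.setdefault(rec["portfolio"], rec["level"])

# "Join" step: attach the sampled base level to every original record,
# then a plain formula produces the index normalized to begin at 1.
joined = [dict(rec, base_level=first_level[rec["portfolio"]]) for rec in records]
for rec in joined:
    rec["norm"] = rec["level"] / rec["base_level"]

# "Multi-row" step: each record looks back one row; the change is left
# empty when the prior row belongs to a different portfolio.
prev = None
for rec in joined:
    if prev is not None and prev["portfolio"] == rec["portfolio"]:
        rec["pct_change"] = (rec["level"] / prev["level"] - 1) * 100
    else:
        rec["pct_change"] = None
    prev = rec

for rec in joined:
    print(rec)
```

The check on the prior row's portfolio is what keeps the first day of one portfolio from being compared with the last day of another.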
Not only did we quickly accomplish what we planned, but Damian added an “extra credit” Step 0 to first download and munge the relevant data from the Russell web site, rather than assume a clean CSV as input. And as a data scientist smitten with R and Tableau, I was readily able to tack on several slick R analytics and Tableau visualization steps to the flows. Indeed, writing Tableau tde files and updating Tableau workbooks with new data is straightforward, as is using Alteryx's pre-fabbed R steps and developing one's own R data management/modeling code. Nice.
I'm continuing to experiment with Alteryx and have started making progress on my census data exercise. All told, a good start for Alteryx-Tableau-R as an analytics workbench. Though there's still a lot of testing for me (and Damian!) to do, which I'll report on in subsequent blogs, I like what I'm seeing so far.