Super Bowl weekend was especially dreary in Chicagoland, the victim of a polar vortex winter for the ages. Between the Friday night/Saturday morning snowfall and the bitter cold that followed, my wife and I were pretty much confined to our suburban home for two days, canceling plans with friends in the city. But I resolved while snow-blowing that I wouldn't let the elements win, that I'd at least have a productive, if unexciting, weekend. And so it was that I decided to watch sports on TV throughout, with college basketball all day Saturday and Sunday afternoon and the Super Bowl as the climax Sunday evening, while at the same time starting to learn some new analytics software on my computer.
That software turned out to be the hot programming language, Julia, from MIT. I've been seeing more and more Julia on the analytics forums and noticed the Julia citation in a comment to a recent blog. But it was the slide deck Julia for R Programmers by University of Wisconsin Statistics professor and respected R elder Douglas Bates that really caught my attention.
Even after dedicating much of his career to the care and feeding of R, Bates, unlike many R aficionados, is even-handed in his evaluation of the platform's strengths and weaknesses, and there are plenty of each. He's also bullish on Julia, noting its mathematical/statistical focus combined with compiled-language performance. Julia provides a dynamic, interactive development environment, while supporting generic functions, multiple dispatch and parallelization. And it's open source with syntax not unlike R or Matlab. For stats geeks, this is having your programming cake and eating it too. I was ready to go.
Saturday and Sunday were quite productive days. I installed Julia and Julia Studio. I “hello worlded” and successfully converted several Python and R code snippets to Julia. I wrote my first functions, learned about the package system for outside libraries, and installed and got running Datetime, ODBC and DataFrame, all essential for data analysis work. I was able to read a file and parse its string contents. I was also able to partially reproduce the results of a “by group” Python script I've ported over the years. And, finally, I discovered the IPython notebook package for Julia, so I could work in my beloved interactive environment. I was feeling pretty puffy by the time I turned off the computer to watch the Super Bowl. Life was good.
Alas, the string of luck didn't continue when I picked up the work stream Monday night. My self-assigned task was to reproduce in Julia the data management scripts I'd written in R/Python and reported in the Dueling R and Python blog. But while I continue to progress each time I start with Julia anew, it's slower sledding now. Why? Because Julia's new and its web “support” lags. Python and R, in contrast, have been around a long time, have enjoyed major success and have built substantial user bases. That ecosystem ubiquity drives both the quantity and quality of available platform information. With wider adoption comes better support, more documentation and hence easier startup learning.
I snicker every time I hear R-bashers decry the platform's “lack of” documentation. Seven years ago, I bought the two-volume, 1,400-page R Reference Manual that sits on my bookshelf as unused as the day FedEx delivered it from Amazon. When I have an R question, I simply formulate a query, Google it and then sift through the welter of pertinent links. Often as not, the big work's in deciding which among the competing sources to use. When I have a specific, language-related question, Stack Overflow is my trusty guide. Same goes for Python/pandas, though I have to shout out the terrific pandas documentation.
Right now, that approach works not nearly as well with Julia. The language guide's complete, but there aren't many tutorials or other nice learning tools yet available. Unless you count books by authors Julia Case Bradley and Julia Lerman, my “Julia programming language” query on Amazon yielded nothing. And attempts to find pertinent code snippets haven't been nearly as fruitful as needed.
Tuesday morning I had a call with a prospect who wanted to discuss OpenBI's capabilities in statistical forecasting and time-to-failure analysis. I've done work with both in the past and have several books on the subjects lost somewhere in my office. Before I was even off the call, though, I'd identified on the web several terrific sources for each technique, along with detailed R code. It'd be tough to replicate that facility with Julia right now.
I'll continue to learn Julia in my spare time. With all its CS positives, I certainly see it as an analytics player in the not-too-distant future. But for statistical and data analysis production work now, I'll stick with established platforms Python and R.