Open Source Business Intelligence: Then and Now
Multiple commercial open source (COS) analytics software companies have changed ownership in recent months. Last fall, reporting and OLAP vendor Jaspersoft was acquired by Tibco. More recently, comprehensive BI platform vendor Pentaho was purchased by Hitachi, and Revolution Analytics, purveyor of the commercial variant of the R Project for Statistical Computing, was bought by Microsoft.
We've come a long way since my company, Inquidia, then OpenBI, was established nine years ago as a professional services firm with a charter to implement open source BI and analytics solutions for our customers. Out of the gate, OpenBI/Inquidia enthusiastically promoted what we felt were the benefits of OS for BI/analytics: low cost, source code access and no vendor lock-in.
In the Beginning...
In the early days, open source vs proprietary seemed almost a religious struggle, with established incumbents protecting their turf by spreading fear, uncertainty and doubt (FUD) on the viability of competing COS products.
Statistical juggernaut SAS said of OS upstart R, commercialized by Revolution Analytics: “(it)...addresses a niche market for high-end data analysts that want free, readily available code. We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.” Other incendiaries that made their way to web battlefields include, "Using open-source software such as R was out of the question: we couldn't guarantee a perfectly repeatable outcome" and "We would be unable to provide any support for this as it is open source software."
Initially, the value-add proffered by COS analytics revolved primarily around the support the vendors would provide for their platforms. "One neck to choke" was particularly important to CIOs risking big dollars on software. Later, responding to tepid market reaction to the support-only models, COS vendors began to differentiate pay-for or enterprise versions of their products with additional proprietary goodies not available in the free or community editions.
Several years ago, fortunately, the hysteria surrounding open source as a software model started to wane, and COS analytics vendors evolved their marketing from open source-obsessed to more traditional feature-function-price. OpenBI did the same, changing its name to Inquidia and branding itself as the data-driven consultancy it always was. The recent acquisitions of the three vendors validate the COS model, even if product adoption has been more modest than strategically planned.
That Reminds Me...
I was thinking about the late analytics OS wars the other day as I wrapped up an R predictive modeling engagement. Among the challenges to R I routinely experienced from proprietary-biased colleagues were low software quality, lack of adequate documentation and absence of product support. I had to chuckle as I re-thought my responses to those challenges in 2015.
It's no secret to IM readers that I'm a big fan of R, having worked with it and older cousin S+ for almost 15 years following an initial 20 with SAS. My experience with the core R engine is unambiguous: it's the highest quality, most bug-free software platform I've worked with in my career.
When first starting with R, I purchased the voluminous R Reference Manual, volumes 1 and 2. I'm sure I used them early on; now they gather dust with other 10+ year old software reference books. I also dared to use the R-help mailing list, finding concise answers to my questions worth the often snarky responses. At this point, it's simple Google queries describing the problem, with resulting tutorials, blogs, online college class materials and programmer sites like Stack Overflow providing the answers.
There's also a wealth of excellent teaching books on R programming and statistics. The winning format is an applied focus with extensive practical examples of data analysis/statistics using R code. Dalgaard's Introductory Statistics with R, as well as Hyndman and Athanasopoulos's Forecasting: Principles and Practice, are outstanding examples of this genre.
Over the years, I've had little need for support with core R, but have occasionally discovered issues with community-developed packages. Rather than the harrowing fix experience assumed by the R naysayers, however, I've found package supporters to be more than accommodating. And the support in many cases comes directly from statistical luminaries like Frank Harrell, Trevor Hastie, and Roger Koenker.
A while back I found a problem with the R data.table package on my Windows notebook. I emailed co-author Matt Dowle, who instructed me on the specifics of the offending data and code to send him. Within a week he apprised me of the fix (real data scientists don't use Windows), which I installed and tested. About a month later, I started a conversation with Dowle's partner Arun Srinivasan on an enhancement request. Arun emailed me a few weeks back that the enhancement was now live.
Stephen Milborrow's earth package is a staple of my predictive analytics tool chest. A few weeks ago, I discovered what appeared to be ill-behaved predictions from an earth model I'd fitted. I sent the data and code to Milborrow and within 24 hours he'd responded that I'd uncovered a rare edge case where the underlying algorithm degenerates. Contrite that I'd seen the algorithm's Achilles' heel, Milborrow suggested I set a parameter to maneuver around the problem. Worked like a champ.