The URL to a pretty good article on R in the New York Times was in my email inbox early in the morning of January 7. I say only pretty good because the author didn’t adequately explain R’s lineage from the S language developed by John Chambers et al. at Bell Labs in the ’80s and ’90s, and I’ve never heard anyone from the R community refer to the platform as a “supercharged version of Microsoft Excel.” I also received an email from an R support list announcing an updated release of Rattle, the R Analytical Tool to Learn Easily, an open source data mining front end for R that I’ve been investigating for the last few weeks.

Data mining or machine learning, like just about everything in R, suffers from an embarrassment of riches. Unlike other open source projects, where the lion’s share of development work is done by a few key programmers, there are hundreds of participants in R. The core team handles the base platform, but new packages offering diverse capabilities, such as statistical models, database access, graphical user interface front ends, graphics, and boosting machine learning algorithms, are routinely developed by the R community at large. Indeed, there are over 1,650 such packages available for download today, and the pace of new development is escalating. More often than not, the new procedures are brought to life by the very statisticians (or their students) who developed the methods and algorithms, years before they become available in the commercial, proprietary competition.
