The R Community
When I returned from our annual August family vacation in the Outer Banks, I braced for the major email backlog waiting for me. Though I dreaded the prospect of responding to hundreds of messages, I did look forward to logging on to my open source help list account and catching up with the R community. The 660 messages that had accumulated from the various R groups didn't disappoint. I spent four hours over the next two days scouring them and enjoyed every minute, learning a lot about statistics, programming, and the workings of an open source community in the process.
Having been an active participant on the R help lists for several years now, I can make a quick first pass through accumulated messages, prioritizing by personal interest, education, and amusement. Topics such as graphics, database connectivity, programming puzzles, predictive models, new package availability, Python integration, and financial engineering are tagged for early review. Within the worldwide R community, there are probably a couple of dozen list authors so knowledgeable that I read their messages first regardless of content, grateful for the opportunity to learn directly from respected experts. Then there is the always-entertaining community policing, in which "newbies" are taught the norms of list participation. Indeed, the community publishes a posting guide that prescribes how to pose questions. Identifying oneself as a newbie generally buys some slack, but there are limits. Be sure to choose the proper list for the question; don't, for example, send a database question meant for R-sig-DB to R-help. Do your homework first; it's bad form to ask a trivial question readily answered by the online help. Nor is it proper to frame a question vaguely, without sample code, data, or mention of the R version and platform. And woe to the unwitting dupe who disparages a language or package feature out of ignorance. I've gotten pretty good at predicting which notes will be torched just by reading the first sentence of the message.
I respond to list questions as I'm able, though a request just an hour old from Europe may already have been answered by the time I craft my reply. A relatively simple inquiry might get half a dozen responses within minutes, each written without knowledge of the others. There's a pecking order to both questions and respondents, reflecting the many levels of R and subject-matter expertise. It is a testament to the commitment of the R list community that participation by leading experts is so pervasive. These experts generally pick their spots, offering support where their skills are needed. For the most part, questions are answered by the least experienced qualified respondents, which encourages participation from apprentice as well as seasoned users. A simple question will be answered by someone like me, not a distinguished professor or researcher. That distinguished professor, however, might take on language and statistical esoterica beyond the reach of most, or adjudicate a lively discussion on a point of list disagreement. Once the distinguished opine, the thread generally ends. Occasionally there are illuminating messages from academics on packages they've developed for their latest statistical procedures. Programming puzzle questions sometimes become competitions where, among correct responses, terseness wins. I'm generally wary of submitting a programming solution, lest someone reply to my note with "Yes, but simpler is ..." or "Why not use such and such function?" It reminds me somewhat of the old APL mentality: one program, one line of code.
The R Platform
So just what is this phenomenon called R with such a talented and entertaining community? As described on its Web site, R is:
a "language and environment for statistical computing and graphics. It consists of a language plus a run-time environment with graphics, a debugger, access to certain system functions and the ability to run programs stored in script files."
In the statistical world, R is distinguished as an extensible, object-oriented language modeled on S, the language developed at Bell Labs, with influences from Scheme. R is also open source, distributed under the GNU General Public License (GPL). As such, its source code is readily available and modifiable, and R is free to use within the terms of the GPL. R was originally developed by two researchers at the University of Auckland in New Zealand (both of whose first names start with R) and is now maintained by a worldwide group of core developers. There are ports of R for Windows, Linux, UNIX, and Mac OS, with source code available for all and binaries downloadable for Windows and Mac OS. R's syntax is quite similar to that of S, which is commercialized as S-Plus by Insightful Corporation. Because S is the older product, more has been written about it to date than about R, though R-specific books are starting to proliferate. Fortunately, much of the code written for S works with R, despite the different internals of the two implementations. Several seminal statistical texts include both R and S code, noting differences as appropriate. Alas, commercial S-Plus, an excellent product, continues to struggle in the marketplace even as R's popularity soars. I wonder what that suggests about open source versus proprietary software.
Extensive documentation of R's workings is available from the R Web site as downloadable PDF files or HTML. An R newsletter offering both practical and arcane insights is also published, though it is not for the faint of heart. UseR! is a biennial international R user conference, the last two of which were held in Vienna, Austria. The R community provides numerous help and special interest group mailing lists to support its users. I subscribe to R-help, R-packages, R-sig-DB, R-sig-finance, R-sig-GUI, and R-sig-Wiki. New users can enroll through the main R page. A promising R Wiki has been established; it provides collaborative community documentation outside the formal R manuals, as well as links about R, tips, reference cards, graphics galleries, and extensive code samples. R-based projects substantial enough to stand on their own have been established and are accessible from the R Web site; Bioconductor (bioinformatics with R) and Rmetrics (financial engineering) are examples. Finally, the Web site's search page offers a series of engines to help the R community locate pertinent R Web pages and mailing list archives.