I've been spending quite a bit of time lately working with the data.table package in R. data.table's functionality builds on R's ubiquitous data.frame to provide, according to lead developer Matt Dowle, “Fast aggregation of large data (e.g. 100 GB in RAM) ... fast subset, fast grouping, fast update, fast ordered joins and list columns ... and a fast file reader (fread) ... in a short and flexible syntax, for faster development”.

I'm a big fan and have been since the early days, watching data.table's functionality increase dramatically over time. Four years ago, I adopted data.table as a more comprehensible “split-apply-combine” programming metaphor than R's arcane “apply” family of functions. Over time, as both the capabilities and my understanding of data.table have progressed, the package has become central to more and more of my data management work in R. And if my R network is an unbiased sample of the community, data.table is enjoying rapidly growing success.
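To make the “split-apply-combine” contrast concrete, here is a minimal sketch that computes a per-group mean twice: once with base R's tapply and once with data.table's bracket syntax. The table and its column names (region, sales) are made up for illustration and do not come from any real data set.

```r
library(data.table)

# Hypothetical toy data; the column names are illustrative assumptions.
sales_dt <- data.table(
  region = c("east", "east", "west", "west", "west"),
  sales  = c(10, 20, 5, 15, 40)
)

# Base R "apply"-family version: split sales by region, then apply mean().
base_means <- tapply(sales_dt$sales, sales_dt$region, mean)

# data.table version: the same split-apply-combine in one bracket expression.
# `by = region` does the split, `mean(sales)` is applied per group,
# and the per-group results are combined into a new data.table.
dt_means <- sales_dt[, .(mean_sales = mean(sales)), by = region]
```

Both calls produce the same group means. The data.table form returns a table that can be filtered, joined, or grouped again with the same syntax, whereas tapply returns a named array.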
