Cornell University’s most advanced research led its IT directors to do some research of their own into scalable, cost-efficient ways to handle growing data sets that had become “painstaking” to manage.

Dozens of students and researchers at the upstate New York Ivy League school’s Institute for Biotechnology and Life Science Technologies ran unique and simultaneous DNA sequencing initiatives, some taking up one or two terabytes apiece. While not mission critical in the sense of many business data sets, the research requires high availability and analysis capabilities. On existing file management systems that capped at 8 and 16 terabytes per node, such a task required weekly manual IT backups and retrievals, and it limited researcher access.

“Babysitting that process and the fact it wasn’t automated was … constraining our decision-making and just extremely painstaking,” says James VanEe, IT director for the research institute.

As storage disks piled up, VanEe, himself a Cornell grad who has been on the IT staff since 1995, searched for a cutting-edge answer to the advancing research challenge, one that didn’t require ripping out old hardware or taking on huge capital expenses.

VanEe eventually turned to the institute’s existing cluster file system provider, Red Hat. He came across the Red Hat Storage Software Appliance solution at a spring 2010 conference, and, soon after, he and Cornell’s Center for Advanced Computing were able to “kick the tires” on it by downloading it to older hardware at the university. In the testing, VanEe found it easy to set up and run, and by sharing in the testing process, computing center students were given another real-life solution to learn on.

The Red Hat solution connects to and supports a virtualized, on-demand data pool that can scale out to petabytes, with retrieval through an index-free algorithm. It was implemented on top of the Cornell institute’s SAN infrastructure, network servers and underlying hardware, without any additional storage hardware purchases.
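
To illustrate what “index-free” retrieval can mean in general, here is a minimal Python sketch in which a file’s location is computed from a hash of its path rather than looked up in a central index. It is an assumption-laden illustration, not Red Hat’s actual algorithm: the `locate` function, the node names and the choice of SHA-1 with a simple modulo are all hypothetical.

```python
import hashlib


def locate(path, nodes):
    """Return the storage node for a file by hashing its path.

    No lookup table or metadata server is consulted: any client that
    knows the node list can compute the same answer independently.
    (Hypothetical scheme for illustration only.)
    """
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[slot]


# Hypothetical node names; a real deployment would use its own servers.
NODES = ["node-a", "node-b", "node-c", "node-d"]

if __name__ == "__main__":
    for p in ["/genomes/run01.fastq", "/genomes/run02.fastq"]:
        print(p, "->", locate(p, NODES))
```

A plain modulo like this would reshuffle most files whenever a node is added, so production systems generally rely on more elaborate hashing schemes; the sketch only shows why no central index is needed to find a file.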

VanEe says that in the last 18 months, the increased storage has given researchers “huge amounts of scratch space” for tweaking and analyzing data sets, and it has led to research connections with other public and private data sets. Availability has been constant, and Red Hat system support has freed VanEe and his team to provide IT infrastructure and help desk support to researchers on other matters.

Tom Trainer, storage product marketing manager at Red Hat, says that the unique challenges from massive research data and limited budgets are leading a growing number of universities to opt for enterprise software and services on top of commodity hardware. With this storage underpinning, organizations that may not have the dedicated budget for the cloud can still tap into support and data size flexibility found with traditional deployments, Trainer says.

“What they have essentially made is a private cloud,” he says.
