FEB 23, 2012 8:55am ET

CASE STUDY

At Cornell, Researching a Better Data System


Cornell University’s most advanced research led its IT directors to conduct some studies of their own into scalable, cost-efficient ways to handle growing and “painstaking” data sets.

Dozens of students and researchers at the upstate New York Ivy League school’s Institute for Biotechnology and Life Science Technologies ran unique, simultaneous DNA sequencing initiatives, some taking up one or two terabytes apiece. While not mission critical in the sense of many business data sets, the research requires high availability and strong analysis capabilities. On the existing file management systems, which capped out at 8 or 16 terabytes per node, such a task required weekly manual IT backups and retrievals and left researchers with limited access to their data.

“Babysitting that process and the fact it wasn’t automated was … constraining our decision-making and just extremely painstaking,” says James VanEe, IT director for the research institute.

As storage disks piled up, VanEe, himself a Cornell graduate who has been on the IT staff since 1995, looked for a cutting-edge answer to the growing research challenge: a solution that didn’t require ripping out old hardware or taking on huge capital expenses.

VanEe eventually turned to the institute’s existing cluster file system provider, Red Hat. He came across the Red Hat Storage Software Appliance at a spring 2010 conference, and soon after, he and Cornell’s Center for Advanced Computing were able to “kick the tires” by downloading it onto older legacy hardware at the university. In testing, VanEe found it easy to set up and run, and because the computing center shared in the testing process, its students gained another real-world system to learn on.

The Red Hat solution connects to and supports a virtualized, on-demand data pool that can scale out to petabytes, with data retrieved through an index-free algorithm. It was implemented on top of the Cornell institute’s existing SAN infrastructure, network servers and underlying hardware, with no additional storage hardware purchases.
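The article does not describe how the appliance’s index-free retrieval works. As a rough illustration of the general idea only, the Python sketch below places each file by hashing its path, so any client can compute which storage node holds a file without consulting a central metadata index. The hashing scheme, the node names and the file paths are all assumptions made for illustration; this is not Red Hat’s actual algorithm.

```python
import hashlib
from collections import Counter
from typing import Sequence


def placement_hash(path: str) -> int:
    """Return a 32-bit hash of a file path (MD5 is used only to spread paths evenly)."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def node_for(path: str, nodes: Sequence[str]) -> str:
    """Locate a file's node by computation alone; no central index is consulted.

    The 32-bit hash space is split into len(nodes) equal, contiguous ranges,
    and the file lives on the node that owns the range its hash falls into.
    """
    if not nodes:
        raise ValueError("need at least one storage node")
    h = placement_hash(path)
    # h < 2**32, so (h * n) >> 32 always lands in [0, n).
    return nodes[(h * len(nodes)) >> 32]


if __name__ == "__main__":
    # Hypothetical node names and sequencing file paths, for illustration only.
    nodes = ["node-a", "node-b", "node-c", "node-d"]
    files = [f"/sequencing/run{i:04d}.fastq" for i in range(1000)]
    counts = Counter(node_for(f, nodes) for f in files)
    print(counts)  # roughly 250 files per node
```

Because a file’s location is computed rather than looked up, there is no index server to become a bottleneck or single point of failure. The trade-off in this simple scheme is that adding a node changes every node’s hash range, so existing files would have to be rebalanced.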

VanEe says that over the last 18 months, the increased storage has given researchers “huge amounts of scratch space” for tweaking and analyzing data sets, and it has led to research connections with other public and private data sets. Availability has been constant, and Red Hat’s system support has freed VanEe and his team to provide researchers with IT infrastructure and help desk support on other matters.

Tom Trainer, storage product marketing manager at Red Hat, says the unique challenges posed by massive research data sets and limited budgets are leading a growing number of universities to opt for enterprise software and services on top of commodity hardware. With this storage underpinning, organizations that may not have a dedicated budget for the cloud can still tap into the support and data-size flexibility found in traditional deployments, Trainer says.

“What they have essentially made is a private cloud,” he says.

Justin Kern is senior editor at Information Management and can be reached at justin.kern@sourcemedia.com. Follow him on Twitter at @IMJustinKern.
