The lack of standardized data stands as one of the largest obstacles for researchers studying cancer genes, but a recently announced grant from the National Cancer Institute of the National Institutes of Health aims to begin work on that challenge.

George Washington University researcher Raja Mazumder received a $1.2 million grant from the National Cancer Institute (NCI) to normalize cancer genomics data by compiling the data into a platform called Cancer GEM.

“Cancer detection in early-stage patients could significantly improve rates of cure,” says Natalie Abrams, program director in the NCI’s Cancer Biomarkers Research Group. “This platform integrates cancer mutation and gene expression data with patient-level data to make the resulting information accessible to the research community and thus facilitate the discovery and validation of cancer biomarkers.”

Over the years, genomics projects have collected large amounts of data and stored it in many separate databases. That spread makes it difficult to combine data and draw comparisons, and multiple attempts to standardize the data have failed to carry those standards across projects.

The data Mazumder will use for the project, which is expected to take three years to complete, will be sourced from all known publicly available repositories and traced to information available from the National Cancer Institute’s Early Detection Research Network (EDRN) online portal, a provider of biomarker research information.

For the platform, Mazumder seeks to compile all of this data into two databases, BioMuta and BioXpress, which will enable researchers to have easier access to genomics data for their research projects.

“These two databases will be further integrated into the EDRN knowledge environment, which will enable sharing all of this information across EDRN’s cancer biomarker research network,” Mazumder says.

The BioMuta database will contain cancer mutation-related data, and the BioXpress database will contain gene expression data. Together, these databases will be “mapped to protein and amino acid position specific annotations and also relevant ontology terms, literature mined data and comparable gene expression data from other model organisms,” Mazumder says.

“This comprehensive information will provide users an information matrix which can be used to identify high-priority experimental targets,” Mazumder adds.

The databases are expected to benefit research projects in several ways. For instance, researchers will be able to examine cancer genes in an evolutionary context, helping scientists better understand how cancer genes are expressed across various populations. Researchers will also be able to narrow the data to what falls within the boundaries of their studies. If researchers wanted to find out whether a cancer gene was overexpressed in Caucasian women with breast cancer within a certain age range, the BioXpress database would help them do that.
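To make that kind of query concrete, here is a minimal Python sketch of filtering expression records down to a study cohort. It does not use BioXpress's actual interface or schema, which the article does not describe; the record fields, the `overexpressed_fraction` helper, the log2 fold-change threshold, and the sample values are all assumptions invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ExpressionRecord:
    """One patient-level expression measurement (hypothetical schema)."""
    gene: str
    cancer_type: str
    sex: str
    ethnicity: str
    age: int
    log2_fold_change: float  # tumor vs. normal tissue; assumed field


def overexpressed_fraction(
    records: Iterable[ExpressionRecord],
    gene: str,
    cancer_type: str,
    sex: str,
    ethnicity: str,
    min_age: int,
    max_age: int,
    threshold: float = 1.0,  # assumed cutoff for "overexpressed"
) -> Optional[float]:
    """Return the share of the matching cohort whose fold change exceeds
    the threshold, or None if no record matches the cohort criteria."""
    cohort = [
        r for r in records
        if r.gene == gene
        and r.cancer_type == cancer_type
        and r.sex == sex
        and r.ethnicity == ethnicity
        and min_age <= r.age <= max_age
    ]
    if not cohort:
        return None
    over = sum(1 for r in cohort if r.log2_fold_change > threshold)
    return over / len(cohort)


# Made-up records for illustration only; not real patient data.
SAMPLE = [
    ExpressionRecord("GENE_A", "breast", "F", "Caucasian", 45, 2.3),
    ExpressionRecord("GENE_A", "breast", "F", "Caucasian", 52, 0.4),
    ExpressionRecord("GENE_A", "breast", "F", "Caucasian", 71, 1.8),
    ExpressionRecord("GENE_A", "breast", "F", "Asian", 48, 1.5),
    ExpressionRecord("GENE_B", "breast", "F", "Caucasian", 50, 3.0),
]

if __name__ == "__main__":
    print(overexpressed_fraction(SAMPLE, "GENE_A", "breast", "F", "Caucasian", 40, 60))
```

In this toy sample, two records fall inside the cohort (ages 45 and 52) and one of them exceeds the threshold. A real query would depend on how the platform exposes patient-level attributes, which is not specified here.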

While this project will help many scientists in their research, it will also aid Mazumder in his own studies on cancer genes. “In our own research, we will use this collected dataset to perform pan-cancer analysis,” Mazumder says. “Such analysis will help our users to see how comprehensive datasets can enable mining of data for discoveries.”

Royce Swayze

Swayze is a Dow Jones News Fund business reporting intern at Sourcemedia.