This is an article from the June 2006 issue of DM Review's Extended Edition.

William wishes to thank Vinay Balasubramanian, president and owner of Mercury Consulting, for his contributions to this month's column.

Data warehousing has found its way into numerous niche industries. It is, simply put, the most elegant, effective and efficient manner for the collection, management and distribution of data within organizations today.

One industry in which data warehousing is making strides is nutritional genomics, or nutrigenomics: the study of how foods affect our genes and how individual genetic differences affect the way we respond to nutrients and other naturally occurring compounds in the foods we eat. Nutrigenomics has received much attention recently because of its potential for preventing, mitigating or treating chronic disease and certain cancers through small but highly informative dietary changes. Guy Miller of Galileo Pharmaceuticals, Inc., a biotech company working on cell-based therapeutic nutritionals, has been quoted as saying, "Nutrigenomics will revolutionize wellness and disease management. One driving force for nutrigenomics will be cost savings realized by consumers, employers, government and third-party providers through retarding and preventing disease. We are embarking on a new era to deliver to consumers exciting technologies to enable wellness."

It is quite possible that, in the near future, genetic testing will let you accurately understand your disease susceptibilities. The result will be a list of foods to consume and avoid, as well as dietary supplements optimized for your genetics.

Subindustries focused on the importance of data in nutrigenomics have formed. Bioinformatics is the term used for the capture, management and retrieval of high-dimensional data sets in nutrigenomics. Identifying the positive and negative connections between the common constituents of our diet and the genetic determinants of health and disease (as influenced by environmental factors) makes nutritional genomics a problem with a high number of dimensions. Biocomputation is the term used for the analysis of nutrigenomics data.

Microarrays and mass spectrometry are used extensively in this work. A microarray is a tool for analyzing gene expression that consists of a small membrane or glass slide containing samples of many genes arranged in a regular pattern. Mass spectrometry is a powerful analytical technique used to identify unknown compounds, to quantify known compounds, and to elucidate the structure and chemical properties of molecules. Their extensive use has stimulated bioinformatic work in data acquisition, signal and image processing, and data mining. Because nutrigenomics data sets are large, complex and nonlinear, warehousing this data and applying specialized analytic tools is the next logical step toward making meaningful analytic decisions.
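As a rough illustration of what warehousing expression data might look like, the sketch below builds a small star schema (one fact table of measurements joined to gene and sample dimensions) in SQLite using Python's standard library. The schema, the table and column names, and the demo values are all assumptions made for illustration; nothing here is prescribed by the techniques described above.

```python
import sqlite3


def build_warehouse():
    """Create an in-memory star schema: one fact table, two dimensions.

    All table and column names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_gene (gene_id INTEGER PRIMARY KEY, symbol TEXT);
        CREATE TABLE dim_sample (sample_id INTEGER PRIMARY KEY, diet TEXT);
        CREATE TABLE fact_expression (
            gene_id INTEGER REFERENCES dim_gene(gene_id),
            sample_id INTEGER REFERENCES dim_sample(sample_id),
            expression_level REAL
        );
        """
    )
    return conn


def load_demo_rows(conn):
    """Insert made-up rows; these are not real measurements."""
    conn.executemany(
        "INSERT INTO dim_gene VALUES (?, ?)",
        [(1, "GENE_A"), (2, "GENE_B")],
    )
    conn.executemany(
        "INSERT INTO dim_sample VALUES (?, ?)",
        [(10, "control"), (11, "control"), (12, "high_fiber")],
    )
    conn.executemany(
        "INSERT INTO fact_expression VALUES (?, ?, ?)",
        [
            (1, 10, 2.0), (1, 11, 4.0), (1, 12, 9.0),
            (2, 10, 5.0), (2, 11, 5.0), (2, 12, 1.0),
        ],
    )


def mean_expression_by_diet(conn):
    """Average expression per (gene symbol, diet) via joins to both dimensions."""
    rows = conn.execute(
        """
        SELECT g.symbol, s.diet, AVG(f.expression_level)
        FROM fact_expression AS f
        JOIN dim_gene AS g ON g.gene_id = f.gene_id
        JOIN dim_sample AS s ON s.sample_id = f.sample_id
        GROUP BY g.symbol, s.diet
        ORDER BY g.symbol, s.diet
        """
    ).fetchall()
    return {(symbol, diet): avg for symbol, diet, avg in rows}


if __name__ == "__main__":
    conn = build_warehouse()
    load_demo_rows(conn)
    for key, value in mean_expression_by_diet(conn).items():
        print(key, value)
```

Keeping measurements in a narrow fact table and descriptive attributes in dimensions is one common warehouse layout; a real nutrigenomics warehouse would likely need many more dimensions (platform, time point, subject, and so on).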

Nutrigenomics data warehouse systems coupled with interactive applications can present critical information to organizations and their audiences. Warehousing diverse sets of data may seem challenging to biologists, just as understanding the terminology and science behind nutrigenomics may seem challenging to the data warehouse technologist. Though all industries face this semantic gap, nutrigenomics will need to address it with a well-rounded team that includes research scientists.

The architecture must be designed from the outset to accommodate large volumes of data, and technology decisions made accordingly. Symmetric multiprocessing (SMP) architectures are quite adequate for these data volumes when used with an appropriate database partitioning strategy.
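The column does not say which partitioning strategy it has in mind. As one possibility, the sketch below shows hash partitioning, in which each row is assigned to a partition by hashing a key so that all rows for the same key land together. The choice of gene name as the partitioning key, the function names, and the sample rows are assumptions for illustration only.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index in [0, num_partitions).

    CRC32 is used because it is deterministic across runs, unlike
    Python's built-in hash() for strings.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_rows(rows, num_partitions):
    """Group rows by partition index, using the hypothetical 'gene' field as key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_for(row["gene"], num_partitions)].append(row)
    return dict(partitions)


if __name__ == "__main__":
    # Made-up rows, not real measurements.
    demo_rows = [
        {"gene": "GENE_A", "sample": 10, "level": 2.0},
        {"gene": "GENE_B", "sample": 10, "level": 5.0},
        {"gene": "GENE_A", "sample": 11, "level": 4.0},
    ]
    for index, members in sorted(partition_rows(demo_rows, 4).items()):
        print(index, members)
```

Range partitioning (for example, by experiment date) is an equally plausible alternative; which one suits a given warehouse depends on how the data is loaded and queried.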

Another example of industry recognition of data's importance is the formation of the bioinformatics shared resource core (BSRC) center. The BSRC has developed a number of computational tools in Matlab for analyzing gene expression (microarray), protein structure and function (proteomics) and single nucleotide polymorphism data, and for classifying and visualizing that data. Visualization tools have been developed for generating three-dimensional images and animating three-dimensional models. The BSRC also runs the R statistical environment and Bioconductor libraries on its servers, along with clustering and visualization tools.

Information obtained from a nutrigenomics data warehouse can be employed in a variety of ways. For example, organizations can devise genome-based nutritional interventions to prevent, delay and treat diseases. Food formulations can be improved to create best-selling or nutritious lines or brands. Organizations can also build interactive applications on top of the nutrigenomics warehouse and market them to consumers: a consumer takes a personalized quiz and is then presented with information that can potentially turn that consumer into a client.

A nutrigenomics data warehouse can do more than just organize research information. It can help organizations adapt in the future to deliver products based on new applications of genomic tools, thus preparing them to welcome the new frontier in the postgenomic era.
