BACKGROUND: Cereals are the most important food crops in the world and determining the entire genomic sequence of a model cereal, such as rice, is critical to meeting our future nutritional demands and food security needs. Rice is the single most important food crop in the world, feeding over half of the world's population. The Genomics Institute employs sophisticated data mining techniques using S-PLUS software to analyze data generated from the rice genome.

PLATFORMS: S-PLUS from MathSoft Inc. runs on Solaris 2.6.

PROBLEM SOLVED: Researchers at the Genomics Institute were interested in studying the genomic distribution of "markers," signposts in the rice genome. Discoveries are made by mining genomic data using mapping and sequencing techniques. I wanted powerful data analysis software that would let us view the results of an analysis rapidly and accurately, and I selected S-PLUS for UNIX 5.1 because it provided powerful analytical tools and unique Trellis graphics. S-PLUS also makes efficient use of memory, allowing us to process larger data sets faster: we can preprocess our data and analyze gigabytes of data with modest computer resources.

PRODUCT FUNCTIONALITY: The product is an invaluable tool for accessing, analyzing and visualizing data. S-PLUS supports sequential processing through block reads and writes, allowing us to analyze arbitrarily large data sets. We have the tools to handle big problems, from megabytes to gigabytes.
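
The block-oriented processing described above can be illustrated with a short sketch. This is not S-PLUS code and does not reproduce the Institute's analysis; it is a minimal Python example of the general idea, assuming a headerless CSV input and a hypothetical numeric column. The function names (iter_blocks, running_mean) and the sample values are invented for illustration.

```python
import csv
import io
from itertools import islice


def iter_blocks(handle, block_size):
    """Yield lists of up to block_size parsed rows from an open CSV handle."""
    reader = csv.reader(handle)
    while True:
        block = list(islice(reader, block_size))
        if not block:
            return
        yield block


def running_mean(handle, column, block_size=1000):
    """Mean of one numeric column, holding only one block in memory at a time."""
    total = 0.0
    count = 0
    for block in iter_blocks(handle, block_size):
        total += sum(float(row[column]) for row in block)
        count += len(block)
    return total / count if count else float("nan")


# Hypothetical sample values standing in for a large file on disk.
sample = io.StringIO("1.0\n2.0\n3.0\n4.0\n5.0\n")
print(running_mean(sample, column=0, block_size=2))  # prints 3.0
```

Because only the running total and count are carried between blocks, memory use depends on the block size rather than on the size of the data set, which is what makes arbitrarily large inputs tractable.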

STRENGTHS: The product makes it easy to read data from virtually any source. The comprehensive import/export capabilities reduce time spent moving data from source to source, allowing us to focus on our analysis. S-PLUS also offers a comprehensive set of traditional and modern statistical methods.

WEAKNESSES: The product does not have a GUI, making it necessary to learn the command-line language. However, it is my understanding that the company will introduce an easy-to-use GUI in its next release, S-PLUS for UNIX 6.0.

SELECTION CRITERIA: I selected this product because I am very familiar with S-PLUS and with the company's history of developing and delivering powerful analytical tools. I believe that it has a significant advantage over competing products because of its powerful analytics and visualization tools. I selected S-PLUS for UNIX 5.1 because the software is based on the powerful next-generation, object-oriented language from Lucent Technologies. The S language has always been regarded as the premier language for data analysis and statistical modeling.

DELIVERABLES: The product allows us to query genomic data and discover significant patterns. The unique Trellis graphics provide easy-to-read reports for communicating results to colleagues and industry representatives. In only one afternoon, I developed a function in S-PLUS that allows us to generate graphical representations of our physical maps of the rice genome. These images were then shared with our collaborators in the International Rice Genome Sequencing Project.

VENDOR SUPPORT: The vendor support from MathSoft has been excellent.

DOCUMENTATION: The documentation is complete and easy to read. Despite the lack of a GUI, we can perform the analyses we need rapidly because the documentation is clear and makes the S-PLUS language easy to understand.
