IBM announced that it has made available new open source technology that will enhance knowledge discovery capabilities across multiple industries and applications and give developers tools to support a new breed of software for analyzing information. The company has completed the first step in making the Unstructured Information Management Architecture (UIMA) available to the open source community by publishing the UIMA source code to the world's largest open source development site.

UIMA is an open software framework that industry and academia already use to collaborate on creating, developing and deploying technologies that discover the vital knowledge held in today's fastest-growing sources of information: unstructured content in the enterprise and across the Web, including documents, images, comment and note fields, email and even rich media such as video and audio. New technologies built on UIMA will help organizations unlock the value in their content assets. Later this year, IBM intends to move the project to a full open source community development model.

Since IBM unveiled UIMA in December 2004, an active ecosystem of partners, customers and open source developers has accelerated innovation and solution delivery around the framework.

Mayo Clinic adopted the UIMA framework early in its development cycle as part of its broader collaboration with IBM on unstructured text processing. Mayo Clinic used UIMA as the basis for a system that extracts knowledge from its approximately 20 million clinical notes. UIMA gave Mayo Clinic the flexibility to combine annotators from Mayo Clinic, IBM and the open source community in a plug-and-play fashion, rapidly creating a powerful analytic solution with advanced capabilities.

Memorial Sloan-Kettering Cancer Center is working with IBM to develop a Web-accessible data warehouse that will conform to HIPAA requirements. The data warehouse will enable clinicians and researchers at Memorial Sloan-Kettering to use data efficiently, facilitating research on a new cancer taxonomy. An important aspect of the warehouse is that it includes searchable concepts drawn from the center's text-based pathology reports. These concepts are extracted automatically by an IBM text analytics solution built on the UIMA framework.

Adding to the growing UIMA ecosystem, the General Architecture for Text Engineering (GATE) team at the University of Sheffield recently announced the delivery of an interoperability layer with UIMA. The new layer gives GATE users access to UIMA's flexible deployment options, and gives UIMA users access to the many plug-ins already available in GATE for text mining, information extraction and natural language processing, for both research and commercial use.

UIMA has also received significant support from the Defense Advanced Research Projects Agency (DARPA) and is currently in use as part of DARPA's new human language technology research and development program called GALE (Global Autonomous Language Exploitation). The GALE Program is a five-year project involving industry leaders and 24 universities with the goal of developing and applying software technologies to absorb, analyze and interpret huge volumes of speech and text in multiple languages. UIMA has been adopted as the underlying integrating architecture for building large-scale multimodal unstructured information management applications.

In addition, several software vendors that previously announced plans to support UIMA have already released their first UIMA-compliant solutions, including ClearForest, Cognos, Factiva and Nstein.