IBM announced two major steps intended to assist in the open development and standardization of search and content analytics software.

The Organization for the Advancement of Structured Information Standards (OASIS) has established a technical committee to standardize the Unstructured Information Management Architecture (UIMA) specification. Additionally, the Apache Software Foundation has established an incubator project for developing UIMA-based software. These efforts are based on IBM's development of UIMA software and its experience with clients and partners in deploying content analytic solutions.

Enterprises increasingly depend on the ability to find and analyze information in many forms and from many sources. These can include call center notes, customer surveys, patent records, emails, audio recordings, images, video, blog entries and news feeds. Content analytics helps organizations find the most relevant information more precisely and analyze its underlying context more deeply.

"We're making UIMA available to the community at large with the belief that it can help accelerate innovation, collaboration, and adoption of semantic search and content analytics software," said Nelson Mattos, vice president, Information and Interaction, IBM Research. "So, the ultimate goal is to help organizations get more value out of their unstructured information by discovering relationships, identifying patterns, and predicting outcomes."

Members of the OASIS international standards consortium have formed the OASIS UIMA Technical Committee to refine and finalize a set of specifications based on an initial contribution from IBM, with input from DARPA, Carnegie Mellon University, Columbia University, Stanford University, the University of Massachusetts Amherst, MITRE Corporation and Science Applications International Corporation. Founding members of the Technical Committee include representatives from IBM, EMC, SRI International, Science Applications International Corporation, Temis, Thompson, Army Information and Intelligence Warfare Directorate, University of Sheffield and Carnegie Mellon University.

The new Apache incubator project will start with an initial contribution from IBM of the UIMA Version 2.0 source code. The Apache Software Foundation provides support for open-source software projects characterized by a collaborative, consensus-based development process, an open, pragmatic software license, and a desire to create high-quality software.

In addition, Carnegie Mellon University's Language Technology Institute is hosting a UIMA Component Repository web site, where developers can post information about their analytics components and anyone can find out more about free and commercially available UIMA-compliant analytics.

Free analytic tools that can work with UIMA include those from the General Architecture for Text Engineering (GATE, http://gate.ac.uk/) and OpenNLP (http://opennlp.sourceforge.net/) communities. Commercial analytics are available from IBM, as well as from other software vendors such as Attensity, ClearForest, Temis and Nstein.
