The stereotypical enterprise content management (ECM) application combines elements of document management, Web content management, search and taxonomies, but that is about to change. These techniques are sufficient if your goal is information retrieval: identifying a set of articles, documents and Web pages about a particular topic and presenting them to a user. In many cases, however, solving the information retrieval problem still leaves the user with an unmanageable amount of data. Many of us in ECM expend a great deal of effort developing techniques and domain-specific heuristics to improve the effectiveness of our information retrieval applications. Yet these efforts will never address one fundamental and growing problem: even if we retrieve only relevant content, there is still too much information for users to analyze. The next step in the evolution of ECM is the adoption of information extraction techniques, which provide users with distilled information rather than just documents.

Consider the problems in medical research and bioinformatics. Technical advances in experimental instruments in these fields have produced vast amounts of new information that is published in scientific journals. Much of it is available online from sources such as Medline, a database of scientific abstracts. With sophisticated search techniques, users can find the abstracts relevant to their work, but they are still left with the task of sifting through those documents to find particular pieces of information, such as "protein X activates Y" or "molecule A binds to B at location C." Information extraction techniques that identify patterns such as these allow us to create structured representations of the relationships between objects such as proteins and genes. Once we have these structured representations, we can apply many of the analytic techniques already used in decision support and business intelligence, such as visualization and link analysis.
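To make the idea concrete, here is a minimal Python sketch of pattern-based extraction followed by simple link analysis. The two regular-expression patterns, the `Relation` record and the helpers `extract_relations`, `build_graph`, `degree_ranking` and `find_path` are hypothetical illustrations, not any particular product's method; real biomedical extractors rely on entity dictionaries, parsers and statistical models rather than two hand-written regexes. The sample sentences reuse the placeholder entities (X, Y, A, B, C) from the text above.

```python
import re
from collections import defaultdict, deque
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    """One structured fact pulled out of free text."""
    subject: str
    predicate: str
    obj: str
    location: Optional[str] = None


# Hypothetical surface patterns: each maps a regex over a sentence to a relation label.
PATTERNS = [
    ("activates", re.compile(r"\b(?P<subj>\w+)\s+activates\s+(?P<obj>\w+)")),
    ("binds_to", re.compile(
        r"\b(?P<subj>\w+)\s+binds\s+to\s+(?P<obj>\w+)"
        r"(?:\s+at\s+(?:location\s+)?(?P<loc>\w+))?")),
]


def extract_relations(text: str) -> list[Relation]:
    """Turn free text into (subject, predicate, object, location) records."""
    relations = []
    # Naive sentence split on terminal punctuation.
    for sentence in re.split(r"[.!?]\s*", text):
        for predicate, pattern in PATTERNS:
            for m in pattern.finditer(sentence):
                # "loc" exists only in some patterns; missing or unmatched -> None.
                loc = m.groupdict().get("loc")
                relations.append(Relation(m["subj"], predicate, m["obj"], loc))
    return relations


def build_graph(relations: list[Relation]) -> dict[str, set[str]]:
    """Directed adjacency list: subject -> set of objects it relates to."""
    graph: dict[str, set[str]] = {}
    for r in relations:
        graph.setdefault(r.subject, set()).add(r.obj)
        graph.setdefault(r.obj, set())  # make sink entities visible as nodes
    return graph


def degree_ranking(graph: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank entities by total (in + out) degree, a basic link-analysis measure."""
    degree: dict[str, int] = defaultdict(int)
    for src, targets in graph.items():
        degree[src] += len(targets)
        for dst in targets:
            degree[dst] += 1
    # Highest degree first; ties broken alphabetically for stable output.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))


def find_path(graph: dict[str, set[str]], start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search for the shortest chain of relations start -> goal."""
    previous: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    abstract = (
        "Protein X activates Y. Y activates Z. "
        "Molecule A binds to B at location C."
    )
    relations = extract_relations(abstract)
    graph = build_graph(relations)
    print(relations)
    print(degree_ranking(graph)[0])        # ('Y', 2)
    print(find_path(graph, "X", "Z"))      # ['X', 'Y', 'Z']
```

Keeping the extracted facts as plain records separates extraction from analysis: the same `Relation` tuples could just as well be loaded into a relational table or a graph visualization tool instead of the toy degree and path functions shown here.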


All Information Management content is archived after seven days.