Enterprise content management (ECM) is a broad domain that covers document management, information retrieval and portals. While these are the most widely recognized elements of ECM, a fourth thread, information extraction, is beginning to emerge.

Information extraction is the process of identifying essential pieces of information within a text, mapping them to standard forms and extracting them for use in later processing. At this point, information extraction tools work best at finding the names of persons, places and things; dates and times; and monetary amounts within single documents. These elements, collectively known as named entities, are mapped to a standard form so that their relative frequency in the document can be determined. For example, a news article that mentions "George Bush," "George W. Bush" and "Bush" once each would yield the named entity "George W. Bush" with three occurrences. The relative frequency of each entity is then used to determine the most important named entities in the document. Because the basic operation of information extraction is looking for patterns, the same techniques can be applied in a number of applications.
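
The mapping-and-counting step described above can be illustrated with a minimal Python sketch. It is not how any particular extraction product works: the alias table, the function name count_entities and the sample sentence are all hypothetical, and a real tool would recognize entities with far richer patterns than a fixed lookup table.

```python
import re
from collections import Counter

# Hypothetical alias table (an assumption for illustration):
# each surface form found in text maps to one standard form.
ALIASES = {
    "George W. Bush": "George W. Bush",
    "George Bush": "George W. Bush",
    "Bush": "George W. Bush",
    "Congress": "Congress",
}

def count_entities(text, aliases=ALIASES):
    """Count named entities in text after mapping each mention to its standard form."""
    # Try longer surface forms first so "George W. Bush" is matched as a
    # whole and its trailing "Bush" is not counted a second time.
    surface_forms = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(
        "|".join(r"\b" + re.escape(form) + r"\b" for form in surface_forms)
    )
    return Counter(aliases[match.group(0)] for match in pattern.finditer(text))

# Made-up sample text, not taken from a real article.
sample = (
    "George Bush addressed Congress on Monday. Later, George W. Bush "
    "met reporters, and Bush took questions."
)
counts = count_entities(sample)
print(counts.most_common())
```

Ranking the entities with most_common() puts "George W. Bush" (three mentions) ahead of "Congress" (one), which is the relative-frequency signal the paragraph describes. Note that the sketch assumes every "Bush" refers to the same person; deciding whether that is true is a separate and harder problem.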
