The Enterprise Content Management (ECM) space is all about capturing, storing, managing and preserving unstructured data, such as documents, scans, video and audio. Many companies are currently struggling with the data flood that is heading right for them. As the volume of unstructured data increases, the amount of storage available to preserve it all decreases. Among that data you will find content that is redundant, obsolete or trivial (ROT) and provides no real business value to the company. Storing this ROT data is like throwing money into a wishing well. However, if you store the non-ROT data, you need to know its value and contents so you can effectively apply lifecycle governance to it.

In the capture area of ECM, where documents are scanned or imported from the network, it has always been a challenge to identify documents correctly. This is mainly because documents were classified by their visual appearance rather than by their content. Sure, we could use machine reading, like Optical Character Recognition (OCR), to scrape metadata values from the documents. But machine reading relies on predefined areas of the page where specific metadata fields (like policy number, client number or address) are expected to be found.
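
To make the idea of predefined areas concrete, here is a minimal sketch of zone-based field extraction. It assumes the OCR step has already produced words with page coordinates. The `Word` and `Zone` types, the `POLICY_FORM_ZONES` template and its coordinates are all hypothetical and chosen for illustration only; they do not come from any particular capture product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One recognized word and its position on the page (hypothetical OCR output)."""
    text: str
    x: float
    y: float


@dataclass(frozen=True)
class Zone:
    """A predefined rectangular area where a metadata field is expected."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word: Word) -> bool:
        return self.left <= word.x <= self.right and self.top <= word.y <= self.bottom


# Hypothetical template for one form layout; the coordinates are made up.
POLICY_FORM_ZONES = {
    "policy_number": Zone(400, 40, 580, 70),
    "client_number": Zone(400, 80, 580, 110),
}


def extract_fields(words, zones):
    """Collect, per field, the text of every word that falls inside its zone."""
    fields = {}
    for name, zone in zones.items():
        hits = sorted((w for w in words if zone.contains(w)), key=lambda w: (w.y, w.x))
        fields[name] = " ".join(w.text for w in hits)
    return fields


# A page that matches the template.
page = [Word("PN-1234", 420, 50), Word("C-987", 420, 90), Word("Dear", 50, 200)]
matching = extract_fields(page, POLICY_FORM_ZONES)

# The same content printed 100 units lower: every zone now comes back empty.
shifted_page = [Word(w.text, w.x, w.y + 100) for w in page]
shifted = extract_fields(shifted_page, POLICY_FORM_ZONES)
```

The second call illustrates the weakness of this approach: the content is identical, but because the layout moved, the fixed zones no longer find anything.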
