This is the fourth in a series of articles discussing various aspects of unstructured data.

While the Web is widely recognized as a vast source of unstructured data, much of that data is difficult to search and navigate. The Web was organized around a model meant for human consumption, not for machine search. HTML (HyperText Markup Language), the means by which hypertext is organized for the Web, is a presentation language: it concerns itself with the appearance of data rather than its underlying structure. This makes extracting content from the Web a daunting task for automatic processors. For this reason, there is a widespread sense that unstructured data on the Web represents great untapped value.
