A clear view of customers, suppliers, regulatory requirements and patients requires access to all data - structured and unstructured. It is estimated that more than 85 percent of all business information exists as unstructured data, commonly appearing in emails, memos, reports, letters, presentations and Web pages.1
Organizations are buried in unstructured content. But unstructured data does not mean irrelevant or lacking business intelligence (BI) value. Quite the contrary, this data describes much of your business activity, providing important insights about customers' habits, tastes, product use, employee work habits and business process efficiencies and/or failures.

Figure 1
Unfortunately, the structured and unstructured areas of data analysis have historically been separated by technology, technique and staff expertise. Typical analysis of unstructured data has been limited to search tools that locate documents stored in file-based servers (such as Web servers, document management servers, etc.). In comparison, structured data analysis utilizes BI tools for query and reporting or slicing and dicing business activity stored in relational database management systems (RDBMSs). The staff trained in BI techniques and technologies, moreover, is typically not skilled in the linguistic and other specialized techniques required for analyzing unstructured content. Consequently, unstructured data analysis is rarely attempted by the BI team.
What is needed is a means to converge the two areas of analysis. When unstructured and structured data are blended for analysis, decision-makers gain the comprehensive insight they need to guide the actions they take to improve business operations. Examples include:
- Automatically identifying top issues in call center logs (unstructured) and proactively routing calls to the right person based on the issue can save millions through reduced call time, not to mention improved customer service.
- Rapidly detecting emerging product trends in problem reports (unstructured) coming in from all over the globe can avoid recalls and lawsuits, potentially saving companies millions of dollars.
- Analyzing patient comments (unstructured), doctor notes (unstructured) and symptom data can lead to better disease management and identification of new uses for drugs.
- Capitalizing on customer feedback (unstructured) following a product launch can help adjust marketing campaigns months ahead of competitors.
- Reducing hundreds of boxes of documents (unstructured) down to the two that are relevant as part of the legal discovery process reveals previously hidden information in less time than it would take people to read every document, which frees critical resources for higher-value tasks.
- Automatically mining thousands of SEC reports (unstructured) to predict poor corporate governance can help identify issues before they turn into major crises.
The Evolution of Unstructured Analytics
Text processing and unstructured data analysis have evolved over time. The underlying technologies continue to improve, and have only recently achieved a level of maturity that supports the types of analysis previously discussed.
First Generation: Keyword Search
The first generation text analysis technologies afforded "search" capability. Keyword search is conducted to help a user find documents containing words and concepts described by the keywords. While great for retrieving and grouping keywords within documents, these tools have many well-known problems that make them impractical to use for unstructured analytics.
- These tools are unable to track or quantify the evolution of ideas or the changes in activity levels of tracked people, processes or organizations that may be searched.
- Search tools were designed to be easy to use, which restricts their analysis capability to simple Boolean (not/and/or) expressions.
- Although great at rapidly returning documents, a user still must take the time to read through the returned documents to extract meaning from them.
- A search tool relies on the user to identify the right combination of keywords to extract the desired information.
As a result of these limitations, search applications typically require a great deal of manual effort to sift through documents and connect bits and pieces of information to make decisions from unstructured data. For example, many law firms hire paralegals or junior lawyers to manually sift through documents using search interfaces during a discovery process to tag those that are relevant.
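To make the limitation concrete, here is a minimal sketch of a first-generation Boolean keyword search. The function names, the query parameters (`all_of`, `any_of`, `none_of`) and the sample memos are hypothetical and chosen only for illustration; they do not describe any particular search product.

```python
import re

def tokenize(text):
    """Lowercase a document and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def boolean_search(docs, all_of=(), any_of=(), none_of=()):
    """Return ids of documents matching a simple AND / OR / NOT keyword query."""
    hits = []
    for doc_id, text in docs.items():
        words = tokenize(text)
        if not all(k in words for k in all_of):
            continue  # an AND keyword is missing
        if any_of and not any(k in words for k in any_of):
            continue  # none of the OR keywords is present
        if any(k in words for k in none_of):
            continue  # a NOT keyword is present
        hits.append(doc_id)
    return hits

# Hypothetical sample documents.
docs = {
    "memo-1": "The battery overheats when the charger is left connected.",
    "memo-2": "Customer praised the battery life of the new model.",
    "memo-3": "Charger cable frayed after two weeks; no battery issue reported.",
}

# Query: documents that mention "battery" but not "praised".
result = boolean_search(docs, all_of=["battery"], none_of=["praised"])
print(result)
```

The query returns both memo-1 and memo-3, even though memo-3 explicitly reports no battery issue. The tool matches words, not meaning, so a reader still has to open each returned document to decide whether it is relevant.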
Second Generation: Point Text Analysis
The limitations of the keyword search applications led to a second generation technology, point text analysis. These tools solve a variety of problems related to understanding the meaning of a document. They can scan a text document, for example, pulling out names, or identifying events, locations, products, opinions about products, problems, methods, etc. Vendors refer to their products as "entity extraction," "concept extraction" or "name matching" products. And while they are valuable at helping users to resolve documents, the technologies all tend to drive stovepipe solutions in that they solve a specific problem or work in a specific functional area of the business.
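As a rough illustration of what an entity extraction tool does, the sketch below scans a document for terms from a hand-built dictionary. This is a deliberately simplified stand-in: commercial products rely on linguistic techniques well beyond dictionary lookup, and the `GAZETTEER` contents, the `extract_entities` function and the sample report are all assumptions made for the example.

```python
import re

# Hypothetical dictionary: entity type -> known surface forms.
GAZETTEER = {
    "product": ["Model X200", "PowerPack"],
    "location": ["Chicago", "Berlin"],
    "problem": ["overheating", "battery drain"],
}

def extract_entities(text):
    """Return (entity_type, matched_text, start_offset) tuples, ordered by position."""
    found = []
    for entity_type, terms in GAZETTEER.items():
        for term in terms:
            # Match the term as a whole word or phrase, ignoring case.
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                found.append((entity_type, match.group(0), match.start()))
    return sorted(found, key=lambda item: item[2])

# Hypothetical problem report.
report = "Customer in Berlin reports overheating on the Model X200 after charging."
entities = extract_entities(report)
for entity_type, value, _ in entities:
    print(entity_type, "->", value)
```

The output turns free text into typed fields (a location, a problem, a product), which is what makes the document usable for analysis. The stovepipe problem is also visible here: the dictionary only covers one narrow domain, and a different business function would need its own.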
The Next Generation: Content Mining Platform
As organizations adopt analytical approaches to unstructured data, they will need to address a number of challenges.
- Data comes from multiple unstructured repositories (file servers, document management systems, intranet sites, Internet sites, database notes fields, etc.).
- Data in unstructured documents is of widely varying quality, much more so than structured data.
- The use of different types of unstructured data tools varies greatly from environment to environment and from problem to problem.
- In many cases, maximum value in analyzing unstructured data comes from analyzing it in conjunction with structured data stored in data marts or data warehouses.
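The last point, analyzing unstructured data alongside structured data, can be sketched as follows. The example assumes an in-memory SQLite database standing in for a data mart, a hypothetical `tickets` table, and a crude keyword rule set (`ISSUE_RULES`) standing in for a real text analysis step; none of these names come from a specific product.

```python
import sqlite3

# Hypothetical keyword rules mapping an issue label to trigger phrases.
ISSUE_RULES = {
    "overheating": ["overheat", "too hot"],
    "billing": ["invoice", "overcharged"],
}

def tag_issue(note):
    """Return the first issue label whose trigger phrase appears in the note, else 'other'."""
    lowered = note.lower()
    for issue, phrases in ISSUE_RULES.items():
        if any(phrase in lowered for phrase in phrases):
            return issue
    return "other"

# Structured side: ticket records as they might sit in a data mart.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (ticket_id INTEGER PRIMARY KEY, product TEXT, refund REAL)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?)",
    [(1, "X200", 120.0), (2, "X200", 80.0), (3, "Z10", 15.0)],
)

# Unstructured side: free-text call notes keyed by ticket id.
notes = {
    1: "Unit gets too hot while charging.",
    2: "Device began to overheat after an hour.",
    3: "Customer says she was overcharged on the invoice.",
}

# Turn each note into a structured issue label and load it next to the tickets.
conn.execute("CREATE TABLE ticket_issues (ticket_id INTEGER, issue TEXT)")
conn.executemany(
    "INSERT INTO ticket_issues VALUES (?, ?)",
    [(ticket_id, tag_issue(text)) for ticket_id, text in notes.items()],
)

# Blend both sides: which issue on which product costs the most in refunds?
rows = conn.execute(
    """
    SELECT t.product, i.issue, COUNT(*) AS tickets, SUM(t.refund) AS total_refund
    FROM tickets AS t
    JOIN ticket_issues AS i ON i.ticket_id = t.ticket_id
    GROUP BY t.product, i.issue
    ORDER BY total_refund DESC
    """
).fetchall()
for row in rows:
    print(row)
```

Neither source answers the question alone: the ticket table knows the refund amounts but not the cause, and the notes know the cause but not the cost. Once the text is reduced to a label, ordinary BI-style queries can work across both.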









