Unstructured data is a given in the business world, but it poses a significant problem in the business intelligence world.

Merrill Lynch estimates that more than 85 percent of all business information exists as unstructured data in the form of emails, memos, notes from call centers and support operations, news, user groups, chats, reports, letters, surveys, white papers, marketing material, research, presentations, photos, video clips, Web pages and many emerging forms of electronic data stored in databases (see Figure 1).

However, a more accurate term for many of these data types would be semi-structured data. Most of these files can be described with metadata, which is simply data about data. Metadata records information such as the name of the person who created a file and the time it was created, and this information can be stored in a relational database.
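To make the metadata idea concrete, here is a minimal Python sketch, assuming SQLite as the relational store. The `Document` class, the `document_metadata` table and its columns are hypothetical, chosen for illustration rather than taken from any product: the free-form body stays opaque text, while the metadata fields become ordinary, queryable columns.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Document:
    """A semi-structured document: free-form body plus descriptive metadata."""
    doc_id: str
    author: str       # who created the document
    created_at: str   # ISO-8601 creation timestamp
    doc_type: str     # e.g. "email", "memo", "call-center note"
    body: str         # the unstructured content itself


def build_catalog(docs):
    """Load documents into an in-memory table (hypothetical schema).

    Only the metadata columns are structured; the body is stored as opaque text.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE document_metadata ("
        " doc_id TEXT PRIMARY KEY, author TEXT, created_at TEXT,"
        " doc_type TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO document_metadata VALUES (?, ?, ?, ?, ?)",
        [(d.doc_id, d.author, d.created_at, d.doc_type, d.body) for d in docs],
    )
    conn.commit()
    return conn


def ids_by_type_since(conn, doc_type, since):
    """Return IDs of documents of one type created on or after `since`.

    ISO-8601 strings sort chronologically, so string comparison works here.
    """
    rows = conn.execute(
        "SELECT doc_id FROM document_metadata"
        " WHERE doc_type = ? AND created_at >= ? ORDER BY created_at",
        (doc_type, since),
    )
    return [row[0] for row in rows]


sample = [
    Document("d1", "alice", "2024-01-05T09:00", "email", "Customer asks about plan..."),
    Document("d2", "bob", "2024-02-10T14:30", "call-center note", "Caller reports dropped calls"),
    Document("d3", "carol", "2024-03-01T11:15", "call-center note", "Caller wants a cheaper plan"),
]
catalog = build_catalog(sample)
print(ids_by_type_since(catalog, "call-center note", "2024-02-15"))  # ['d3']
```

Note that the query filters only on metadata; nothing in it reads the document bodies, which is the part of the data that conventional BI leaves untouched.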

One needn't look far to find the source of all this new unstructured data. According to market research firm International Data Corp., in the case of the Web alone, more than 3 billion new Web pages have been created since 1995, with an additional 250 million new pages being added every month.

Only about 15 percent of all business data is structured data captured in spreadsheets and databases. BI software, which enables companies to analyze the data in their databases, has helped many enterprises base their decisions on structured data. The same is not true for unstructured data. But if unstructured data accounts for the vast majority of all data in the enterprise, why has comparable BI software for it not yet achieved mainstream acceptance?

Raising Awareness

Across the board, managers in call centers, technical support and customer service departments note that while they are generating large volumes of data, they lack ways to analyze it. As a result, they miss important opportunities to identify trends and emerging issues and gain valuable insight into their customer base. As an executive at a Fortune 500 telecommunications provider said, "We have between 50,000 and 100,000 conversations with our customers daily, and I don't know what was discussed. I can see only the end point: for example, they changed their calling plan. I'm blind to the content of the conversations."

A major first step in addressing this issue is to raise awareness of unstructured data among data technology users, as well as the companies that design, manufacture and sell the technology. In a recent private study, we asked chief information officers and chief technical officers of 40 major corporations whether they saw opportunities to improve the handling of unstructured data within their organization. More than 60 percent recognized that unstructured data was a critical issue that, if addressed, could ultimately be used to improve operations or create new business opportunities.

The Need for Better Searches

With semi-structured documents, a comprehensive search function is a basic requirement. Prior to the emergence of the Web, text search techniques (which enable users to find particular words or phrases in the documents) were widely implemented by library, document management and other database management systems. However, with the growth of the Internet, the Web browser quickly became the standard tool to search for information. According to market research firm Outsell, Inc., office workers now spend an average of 9.5 hours each week searching, gathering and analyzing information. Nearly 60 percent of that time, or 5.7 hours a week, is spent on the Internet, at an average cost of $15,000 per worker per year.

Adding Context to Search

Web search engines generally treat each search request independently. This is problematic because a given search term returns identical results for every user, even though each user's context may differ. For example, if a baseball fan and an amateur birdwatcher both type the words "blue jay" into a search engine, both searches will generate the same results, even though one is looking for a specific team’s batting averages while the other is seeking a recording of a mating song.
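The article does not say how context should be captured. One common approach is to re-rank a search engine's context-free results against a profile of the user's interests. The Python sketch below illustrates that idea only; the `Result` class, the topic tags on each result, the per-user interest sets and the `weight` parameter are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    title: str
    base_score: float                          # context-free relevance from the engine
    topics: set = field(default_factory=set)   # hypothetical topic tags


def rerank(results, user_interests, weight=1.0):
    """Order results by base score plus a bonus for each topic shared with the user.

    `user_interests` is a hypothetical per-user profile, e.g. built from past queries.
    """
    def score(r):
        return r.base_score + weight * len(r.topics & user_interests)
    return sorted(results, key=score, reverse=True)


results = [
    Result("Blue Jays batting averages", 0.80, {"baseball", "sports"}),
    Result("Blue jay mating calls (audio)", 0.78, {"birds", "audio"}),
    Result("Blue jay range map", 0.75, {"birds"}),
]

fan = {"baseball", "sports"}
birder = {"birds", "audio"}
print(rerank(results, fan)[0].title)     # Blue Jays batting averages
print(rerank(results, birder)[0].title)  # Blue jay mating calls (audio)
```

The same query now yields a different top result for each user, while a user with no profile simply gets the engine's original ordering.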

Killer Applications for Content Intelligence

Content intelligence is moving beyond search and document classification and into full-fledged applications. Early applications were funded and used primarily by the intelligence community. However, commercial “killer” applications are now emerging to support nearly any type of industry that produces high volumes of semi-structured data. Here are two examples of search applications for content intelligence:

  1. Defect trend tracking: This application analyzes product defect information, for example for heavy equipment. Manufacturers of expensive equipment such as aircraft and automobiles may be able to minimize warranty repairs by identifying trends in the equipment's field-service records. An automaker could use search to detect defect trends in tires or windshield wipers, and a pharmaceutical company could use the same application to track defects in package seals.
  2. Web-based self-service: This solution lets users visit a website to quickly find answers to their product or service questions. The goal is to reduce the rising cost of after-sale service and support by replacing some of it with Web-based searches and solutions. According to market studies by Forrester Research, the savings can be substantial: The total cost of a Web visit is $1 to $2, a fraction of the $50 to $70 cost of the average customer service phone call.
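A defect trend tracking service of the kind described in the first example might, at its simplest, count how often component names appear in field-service notes each month and flag sharp increases. The Python sketch below rests on loud assumptions: the `COMPONENTS` vocabulary, the (month, note) record format and the doubling threshold are hypothetical, and naive substring matching stands in for the text analytics a real product would use.

```python
from collections import Counter, defaultdict

# Hypothetical vocabulary of components to watch in free-text service notes.
COMPONENTS = ("tire", "wiper", "seal")


def monthly_mentions(records):
    """Count, per month, how many service notes mention each component.

    `records` is an iterable of (month, note) pairs. Matching is a naive
    lowercase substring test, so "retire" would also count as "tire".
    """
    counts = defaultdict(Counter)
    for month, note in records:
        text = note.lower()
        for comp in COMPONENTS:
            if comp in text:
                counts[month][comp] += 1
    return counts


def rising(counts, component, months, factor=2.0):
    """Flag a component whose last-month count is at least `factor` times its first-month count."""
    first = counts[months[0]][component]
    last = counts[months[-1]][component]
    return last >= factor * max(first, 1)  # max() avoids a zero baseline


records = [
    ("2024-01", "Replaced wiper blade"),
    ("2024-01", "Tire pressure low"),
    ("2024-02", "Wiper motor noise"),
    ("2024-02", "Wiper streaking reported"),
    ("2024-03", "Wiper arm loose"),
    ("2024-03", "Wiper blade torn"),
    ("2024-03", "Windshield wiper stopped"),
    ("2024-03", "Tire puncture"),
]
months = ["2024-01", "2024-02", "2024-03"]
counts = monthly_mentions(records)
for comp in COMPONENTS:
    print(comp, rising(counts, comp, months))
```

With this sample data, only wipers are flagged: their mentions grow from one in January to three in March, while tire mentions stay flat.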

To reduce these costs, one software company is working with a number of customers, including a large investment house, to integrate natural language recognition into its search engine. This allows users of Web self-service solutions to enter their searches as natural language phrases. According to the company’s senior vice president of marketing, “Studies show that 50 percent of website visitors rely on search to find what they need; thus, a complete solution requires that the search engine be capable of interpreting a user's true need and providing an action-oriented response."

The Future Is Now

Content intelligence is maturing into an enterprise technology that is as essential as relational databases. Its core functions are search, discovery and classification. In most cases, however, enterprises will want to integrate this technology with their established enterprise systems to derive the most value from the unstructured data those systems hold. Many organizations have identified high-value, content intelligence-centric applications that can now be constructed using platforms from leading vendors.

Using these technologies to reveal new trends and issues and to develop solutions is key to the continued popularity and success of content intelligence in the business community. In this way, unstructured data will become a critical source of actionable, time-critical BI. Now is the time to bring unstructured data into the BI world.

This is the latest installment of a series of articles by Shaku Atre. The other articles in the series are: "Who in the World Uses Only Words and Numbers in Reports?"; "Who in the World Wants to Stay Locked Up?"; "Who in the World Doesn't Want to Reach for the Clouds?"; "Who in the World Wouldn’t Want a Collaborative BI Architecture?"; "Who in the World Wants More Data?"; "Who in the World Needs a Data Warehouse?"; "Who in the World Wouldn't Want to Evaluate BI Products?"; and "Who in the World Needs a Hard Drive?"

All Information Management content is archived after seven days.