One factor contributing to information overload has been the precipitous decrease in the cost of digital storage. The price per gigabyte of hard disk storage has dropped at an average rate of more than 50 percent per year for the last eight years. Correspondingly, as illustrated in Figure 1, digital hard disk storage capacity has grown at a rate of more than 85 percent per year over the same period.

Figure 1: Cost of Storage Decreases while Capacity Increases
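To get a feel for what these annual rates imply when compounded, the short sketch below applies them over eight years. It assumes, purely for illustration, that the rates are constant and exactly 50 and 85 percent; because the text quotes them as "more than" those values, the results are rough bounds rather than measured figures, and the variable names are hypothetical.

```python
# Assumed constant annual rates taken from the figures quoted above;
# real year-to-year rates vary, so treat the results as rough bounds.
YEARS = 8
ANNUAL_PRICE_DROP = 0.50       # price per gigabyte falls by 50% each year
ANNUAL_CAPACITY_GROWTH = 0.85  # stored capacity grows by 85% each year

# Compound the yearly change over the whole period.
remaining_price_fraction = (1 - ANNUAL_PRICE_DROP) ** YEARS
capacity_multiple = (1 + ANNUAL_CAPACITY_GROWTH) ** YEARS

print(f"Price per GB after {YEARS} years: "
      f"{remaining_price_fraction:.2%} of the starting price")
print(f"Capacity after {YEARS} years: "
      f"about {capacity_multiple:.0f} times the starting capacity")
```

Under these assumptions, the price per gigabyte falls to less than half a percent of its starting value, while stored capacity grows more than a hundredfold.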
As it becomes increasingly economical to collect and store vast amounts of information, organizations are capturing growing volumes of unstructured information in the form of e-mail, customer feedback and the free-text records of CRM, PLM, SCM and other enterprise systems. Many analysts and reports contend that as much as 80 percent of today's corporate data is now unstructured.
At the same time, dramatic changes are afoot in the legislative and regulatory landscapes and the corporate governance policies of many organizations. Legislators, regulators and even directors of many corporations are imposing new demands requiring organizations to start identifying and reporting earlier and in more detail on all manner of new developments, threats and opportunities. Recent U.S. legislation illustrates the trend:
- TREAD Act: The Transportation Recall Enhancement, Accountability and Documentation Act mandates that automakers report potential recall or customer safety issues to the National Highway Traffic Safety Administration (NHTSA).
- Sarbanes-Oxley: The Sarbanes-Oxley Act provides for public company accounting reform and investor protection.
- The USA PATRIOT Act: Designed to promote increased security by providing tools to intercept and obstruct terrorism, the act broadens the rights of the federal government to collect and act upon intelligence information.
- Homeland Security Act: Establishes the Department of Homeland Security and outlines intelligence gathering and terrorism prevention responsibilities and powers.
In many cases, the facts that offer the richest and earliest indicators of threats and opportunities, whether with respect to consumer products or national security, are concentrated in unstructured information sources.
Consequently, many organizations now face an information conundrum, thus far unanswered: how to store unstructured information while obtaining maximum business benefit and avoiding legal or regulatory liability. In this environment, is it still a safe or viable strategy to confine enterprise information analysis to the numbers and codes of structured databases? With legal, regulatory and governance systems largely silent on that question, the answer may very well rest with the state of the art and technological feasibility.
What if organizations could extend their BI initiatives beyond structured data and into unstructured and mixed or "hybrid" data to discover the next threat to customer safety or national security or the next high-growth market opportunity? Who wants to be in the position of having to defend a failure to take that step on the day after the threat materializes or the opportunity is preempted by a competitor?
Can Unstructured Data Integrate into BI's Relational Reality?
The BI world has grown up on a relational foundation. This statement is not merely a fatalistic bow to the ubiquity of the relational database in the enterprise IT landscape. It is an acknowledgement of the fundamental nature of BI itself.
When a leading analyst group coined the term "business intelligence," it was defined as "a set of concepts and methodologies to improve decision making in business through the use of facts and fact-based systems." Obviously, business problem solving and decision making are best when they are based on the facts. And, in modern organizations, BI is the technology framework for deriving and delivering those facts.
Perhaps less obvious at first blush, however, is that this "fact-based" orientation necessarily involves a relational orientation. As it turns out, "facts" are inherently relational - that is, they are inherently amenable to being expressed entirely in terms of relation types and values. Every fact, whether it reflects the possession of certain characteristics by some person or thing or the occurrence of some event, can be reduced to and expressed in a tabular, relational structure.
For example, the fact that John Doe has customer ID number 987-65-4321, phone number 201-555-1234 and an address of 100 Maple Street, Anytown, MA 01234, can be expressed as shown in Figure 2:

Figure 2
Similarly, the fact illustrated in Figure 3, that a mechanic discovered that the bolt on the underside of a transmission case had cracked because of heat, can also be expressed in a tabular representation:

Figure 3: Tabular Expression
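The two facts above can be sketched as rows in relational tables. The example below uses Python's built-in sqlite3 module with an in-memory database. The table and column names (customer, component_failure, failure_mode and so on) are illustrative assumptions, not the layout of Figures 2 and 3.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema for the customer fact (one row per customer).
conn.execute(
    "CREATE TABLE customer (name TEXT, customer_id TEXT, phone TEXT, address TEXT)"
)
conn.execute(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    ("John Doe", "987-65-4321", "201-555-1234",
     "100 Maple Street, Anytown, MA 01234"),
)

# Hypothetical schema for the event fact (one row per observed failure).
conn.execute(
    "CREATE TABLE component_failure "
    "(component TEXT, location TEXT, failure_mode TEXT, cause TEXT)"
)
conn.execute(
    "INSERT INTO component_failure VALUES (?, ?, ?, ?)",
    ("bolt", "underside of transmission case", "cracked", "heat"),
)

# Once a fact is a row, ordinary relational queries apply to it.
customer_row = conn.execute(
    "SELECT name, customer_id, phone, address FROM customer "
    "WHERE customer_id = ?",
    ("987-65-4321",),
).fetchone()

heat_failures = conn.execute(
    "SELECT component, location FROM component_failure WHERE cause = ?",
    ("heat",),
).fetchall()

print(customer_row)
print(heat_failures)
```

The point of the sketch is that both a characteristic-style fact (the customer record) and an event-style fact (the cracked bolt) reduce to relation types and values, so the same query machinery works on either.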
In short, the relational orientation of BI is not just a pragmatic reality (i.e., this is the way BI is because it has been this way for 20 years) - it is a fundamental reality (i.e., this is the way BI is because it must be this way). Because the "facts" on which BI systems depend in order to facilitate fact-based decision making are inherently relational in nature, BI itself is inherently relational in its orientation.
Two key questions remain. Are there "facts" of consequence to business decision-makers that are captured in unstructured data? And if so, is it possible to get those facts out of text and into relational structures so that BI tools can use them?
Traditionally, efforts to bring unstructured data into BI environments have come down to either using statistical natural language processing techniques to consolidate complex "relational facts" into one-dimensional codes, or associating pieces of text with conceptual or topical metadata tags. Unfortunately, these approaches tend to be one-dimensional, expensive and inaccurate. In short, they yield unsatisfactory results.









