Mergers in the business intelligence (BI) market have been big news lately. Hyperion acquired Brio. Informatica picked up Striva. Business Objects bought Crystal Decisions. Ascential bought Mercator. The market is maturing and consolidating. Vendors are improving features such as reporting tools and real-time extract, transform and load (ETL). Do any of these mergers really change the BI landscape? No. Query tools are still too complex for many users. Real-time data warehousing is still difficult. These mergers are not the fundamental shifts that change a discipline such as BI. The real news in the analytics market is about a merger that has nothing to do with what we see in the headlines.
The most interesting work in business intelligence today surrounds the merger of two technologies: structured data analysis and unstructured data analysis. Megaputer was one of the earliest vendors to integrate text analysis with data mining, but others are joining the trend. BI vendors, especially the statistical software companies such as SPSS and SAS, are integrating data and text mining functions to expand the scope of BI. Expect to see more.
The basic idea is straightforward: analyze unstructured text, identify important terms and concepts, and map that information into a more structured format that is suitable for data mining and statistical analysis. We are seeing this merging of technologies now for three reasons.
First, we are running into the law of diminishing returns. Conventional BI tools can squeeze only so much information out of a data set. Consider the staples of structured analysis: ad hoc query tools, online analytical processing (OLAP) technologies and data mining applications. They work well with the attributes that fit neatly into the rows and columns of databases. There are only so many ways to look at the same set of data before you stop finding new insights. The problem is not that we lack good analytic techniques, but that we need more data to work with.
The second driver is the millions of pieces of unstructured data that organizations have gathered and left untapped. These comments and notes reflect information that does not fit into the coded attributes of transaction processing systems. Customer service representatives document detailed reasons why a customer is changing mobile phone services. Mechanics describe recurring problems with machinery covered under warranty. Claims adjusters note patterns indicative of fraud. All of this unstructured text contains valuable information, if only it could be put to use. Until now, the information that could be culled from those comments and notes has been effectively off-limits to conventional BI.
Third, text analysis tools are mature enough to effectively analyze unstructured data in a BI context. In some cases, text analysis can be as simple as searching for the occurrence of particular words or phrases. For those problems we have fast, bit-parallel algorithms that match simple patterns and regular expressions. Some tolerate errors in matches, a feature that is essential when dealing with comments written in a hurry such as those in call center databases.
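To make the idea of error-tolerant, bit-parallel matching concrete, here is a minimal sketch in Python of a Shift-And matcher extended to allow a fixed number of errors, in the style of the Wu-Manber (bitap) algorithm. The choice of algorithm, the function name contains_approx and the sample call center note are illustrative assumptions, not something taken from a particular product.

```python
def contains_approx(text, pattern, max_errors):
    """Return True if some substring of text is within max_errors
    edits (insertions, deletions, substitutions) of pattern.

    Bit i of R[d] is set when pattern[:i+1] matches a suffix of the
    text read so far with at most d errors.
    """
    m = len(pattern)
    if m == 0:
        return True

    # One bit mask per character: bit i is set where pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    full = (1 << m) - 1       # keep only the m pattern bits
    accept = 1 << (m - 1)     # bit that signals a complete match

    # With d errors allowed, the first d pattern characters can be
    # skipped before any text has been read.
    R = [((1 << d) - 1) & full for d in range(max_errors + 1)]

    for ch in text:
        mask = masks.get(ch, 0)
        prev_old = R[0]                       # previous column, level d-1
        R[0] = ((R[0] << 1) | 1) & mask       # exact Shift-And step
        for d in range(1, max_errors + 1):
            old = R[d]
            R[d] = ((((old << 1) | 1) & mask)  # characters match
                    | prev_old                 # extra character in text
                    | (prev_old << 1)          # substituted character
                    | (R[d - 1] << 1)          # character missing from text
                    | 1) & full
            prev_old = old
        if R[max_errors] & accept:
            return True
    return False


note = "customer wants to cancle service"   # invented example with a typo
print(contains_approx(note, "cancel", 0))   # exact search misses the typo
print(contains_approx(note, "cancel", 1))   # one error allowed finds it
```

Each text character costs a handful of shifts and bitwise operations per error level, which is what lets this family of algorithms keep up with large volumes of short, hastily typed notes. Python integers are arbitrary precision, so this sketch is not limited to patterns that fit in one machine word, although production implementations usually are.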
When simple pattern matching is not enough, we have tools such as part-of-speech taggers, noun-phrase extractors and lexical databases that help identify entities and, in simple cases, their relationships. Company names, locations, dates and currency amounts are commonly extracted entities. ClearForest's ClearTags and Inxight's ThingFinder fall into this tool category.
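The step from extracted entities to something a data mining tool can use can be sketched in a few lines of Python. This is a deliberately simple, pattern-based stand-in for the linguistic tools named above, not a description of how those products work. The function note_to_row, the term list, the column names and the sample note are all hypothetical.

```python
import re

# Simple patterns for two commonly extracted entity types.
CURRENCY = re.compile(r"\$\d[\d,]*(?:\.\d{2})?")   # e.g. $1,250.00
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")    # e.g. 3/14/2004

# Hypothetical terms an analyst might want as indicator columns.
TERMS = ("cancel", "billing", "coverage")


def note_to_row(customer_id, note):
    """Map one free-text note to a flat record of structured attributes."""
    lowered = note.lower()
    row = {"customer_id": customer_id}
    # Strip the dollar sign and thousands separators before converting.
    row["amounts"] = [float(a[1:].replace(",", ""))
                      for a in CURRENCY.findall(note)]
    row["dates"] = DATE.findall(note)
    # One 0/1 column per term of interest.
    for term in TERMS:
        row["mentions_" + term] = int(term in lowered)
    return row


note = "Called 3/14/2004 to cancel; disputes $1,250.00 billing charge."
print(note_to_row(42, note))
```

The resulting record can be joined to the customer's structured attributes on customer_id, at which point the indicator columns behave like any other input to a query, an OLAP cube or a data mining model.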
We are in the early stages of the merging of this technology, and there are definite limits. Text analysis tools, especially those with strong linguistic analysis capabilities, do not scale well. Pattern-matching techniques can handle the millions of rows of data found in customer relationship management (CRM) systems, but more complex text analysis is best limited to smaller data sets.
Some tools require specialized knowledge for customization. Unless you have a linguist on staff, beware of what you attempt. Vendors are conscious of this problem and have already made strides to improve the situation.
Combining structured and unstructured analysis is paying off. Some companies are realizing 10 percent lifts over models based on structured analysis alone. Others are adapting call center scripts to address potential problem areas identified by analyzing CRM notes in real time. It is still too early to predict the overall impact of combined structured/unstructured analysis, but early indicators are favorable.
BI is fundamentally changing. Unstructured data is now targeted for analysis. We have the tools to extract patterns and entities from text and make them accessible to conventional BI techniques. Mergers will continue to make news in the BI market, but few will be as important as this one.