As business intelligence (BI) evolves from recounting the past to forecasting the future, unstructured information and enterprise search capabilities move to center stage. The quip that business intelligence today "predicts" the past is an old one, because classic BI forecasting consists of taking a few historic data points, drawing a straight line through them and extending it into the future. To deliver true business value, however, business intelligence processes and technology must do more. Using business intelligence to surface new trends, emerging customer preferences, product availability, financial dynamics and staff development is critical to both operational and enterprise decision-making.
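The "straight line" forecast described above can be sketched in a few lines of Python. This is a minimal illustration, assuming ordinary least squares as the line-fitting method and using made-up quarterly sales figures; `fit_line` and `extrapolate` are hypothetical helpers, not part of any BI product.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean; zero means all x values are identical.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def extrapolate(xs: Sequence[float], ys: Sequence[float], future_x: float) -> float:
    """Extend the fitted straight line out to future_x."""
    slope, intercept = fit_line(xs, ys)
    return slope * future_x + intercept


if __name__ == "__main__":
    quarters = [1, 2, 3, 4]                  # hypothetical quarters
    sales = [100.0, 110.0, 120.0, 130.0]     # hypothetical sales figures
    print(extrapolate(quarters, sales, 6))   # 150.0
```

Whatever the data, this kind of forecast can only continue the existing slope; it has no way to anticipate a turning point, which is exactly the limitation the joke targets.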
Much of business intelligence today remains reactive, even when tools provide real-time access to information. For example, if an enterprise learns in real time that it is out of stock on a hot item, all it can do is order more and wait. Depending on the length of the supply chain, and on whether the delivery truck gets stuck in traffic (an event that also happens in real time), real-time data does not guard against lost sales. Sometimes even real time is too late.
The emerging role of business intelligence systems is to alert decision-makers proactively about critical situations. This requires a number of search capabilities not usually associated with inferences and predictions drawn from data in standard relational databases, much less with the standard query and reporting tools that form the bulk of business intelligence applications. The key to detecting future business opportunities lies in the ability to organize and search vast quantities of both structured and unstructured information buried within siloed business systems, such as call center notes, repair orders, contracts, images, and audio and video files.
To enable this, metadata will play a strategic role. Progress in using search for business insight is finally occurring as metadata is harnessed to exploit unstructured information for customer service, supply chain logistics, product quality insight and related business imperatives. Through metadata, unstructured content and undifferentiated data can be given structure and made usable.
Internet-based search has become familiar to almost everyone. Few, however, recognize how relevant enterprise search is to business intelligence. Specialty search applications originally designed to solve industry-specific challenges are now moving into the business intelligence arena.
This new breed of search technology, combined with the power of business intelligence, opens a whole new realm of possibilities. For example, new software can analyze call center transcripts to determine whether debtors are able to pay what they owe a lender or collection agency. Similar technology can detect otherwise unrecognized drug interactions by analyzing the linkages in medical abstracts, helping both to prevent disasters and to discover new drugs or cures. Another example is rapidly detecting hidden trends in problem reports coming in from around the globe, which can head off recalls and save consumer product companies and their customers millions, if not billions, of dollars.
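As a rough illustration of trend detection in problem reports, the sketch below flags terms whose count in the latest week jumps well above their average over earlier weeks. It assumes reports arrive as hypothetical `(week, text)` pairs and uses plain word counting with arbitrary `ratio` and `min_count` thresholds; a real system would add linguistic preprocessing such as stemming, synonym handling and stop-word removal.

```python
import re
from collections import Counter, defaultdict


def term_counts_by_week(reports):
    """reports: iterable of (week, text) pairs. Returns {week: Counter of terms}."""
    by_week = defaultdict(Counter)
    for week, text in reports:
        # Count each term at most once per report, so one verbose report can't dominate.
        terms = set(re.findall(r"[a-z]+", text.lower()))
        by_week[week].update(terms)
    return by_week


def emerging_terms(reports, current_week, ratio=3.0, min_count=3):
    """Flag terms whose current-week count exceeds `ratio` times their average
    over earlier weeks that contain reports."""
    by_week = term_counts_by_week(reports)
    past_weeks = [w for w in by_week if w < current_week]
    current = by_week.get(current_week, Counter())
    flagged = []
    for term, count in current.items():
        if count < min_count:
            continue  # too rare to call a trend
        if past_weeks:
            baseline = sum(by_week[w][term] for w in past_weeks) / len(past_weeks)
        else:
            baseline = 0.0
        if count > ratio * baseline:
            flagged.append(term)
    return sorted(flagged)


if __name__ == "__main__":
    reports = [  # hypothetical problem reports
        (1, "Battery drains fast"),
        (1, "Screen cracked"),
        (2, "Screen cracked on delivery"),
        (2, "Battery fine, hinge loose"),
        (3, "Hinge loose after a week"),
        (3, "Hinge squeaks and feels loose"),
        (3, "Loose hinge on left side"),
        (3, "Screen flickers"),
    ]
    print(emerging_terms(reports, current_week=3))  # ['hinge', 'loose']
```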
Although the volume of information continues to expand exponentially, one of the secrets to successful search will be preprocessing the material to generate enhanced, usable metadata from which business insights can be produced. Finding a needle in a haystack is easy compared with locating a particular data point in the ocean of unstructured, semi-structured or unconventionally organized content on the Web or in public collections in libraries, private enterprises or national laboratories. "Structure" takes on new meaning when extended beyond the scope of the standard relational database, which, incidentally, is expected to remain the dominant design for managing transactional data in commercial business applications. "Structure" now means "metadata."
While standard relational databases are filled with well-defined data elements such as codes and numbers, most of the content in the world is expressed in natural languages such as English. Of course, such a language is itself a code, but one that is immensely more complex and ambiguous, and therefore more flexible and more readily adaptable to a vast variety of business contexts and forms of social interaction.
This leads to the debate about what computers can't do; in short, computers are not very good at understanding context. However, machines are getting better at handling language in its various forms. In many ways, XML is the cavalry riding to the rescue of metadata for unstructured collections of stuff. But XML is not the only arrow in the quiver and cannot solve the problem by itself. Progress continues to be driven by advances in statistical and rule-based natural language processing (NLP), ontology (data modeling plus reasoning), information retrieval, machine learning, automated reasoning and knowledge sources (including lexicons and frameworks for handling meaning).
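To show what XML contributes, the sketch below wraps a free-text call center note in an XML envelope carrying metadata fields, then finds notes by field value using Python's standard `xml.etree.ElementTree` module. The element names (`document`, `metadata`, `field`, `body`) and the field names are hypothetical; a real deployment would follow whatever schema the organization adopts.

```python
import xml.etree.ElementTree as ET


def annotate_note(note_id, text, metadata):
    """Wrap free text in a <document> element with a <metadata> section."""
    doc = ET.Element("document", attrib={"id": note_id})
    meta = ET.SubElement(doc, "metadata")
    for name, value in metadata.items():
        field = ET.SubElement(meta, "field", attrib={"name": name})
        field.text = value
    body = ET.SubElement(doc, "body")
    body.text = text  # the unstructured content itself is left untouched
    return doc


def find_by_field(docs, name, value):
    """Return the ids of documents whose metadata field `name` equals `value`."""
    matches = []
    for doc in docs:
        for field in doc.iterfind("metadata/field"):
            if field.get("name") == name and field.text == value:
                matches.append(doc.get("id"))
                break
    return matches


if __name__ == "__main__":
    notes = [
        annotate_note("n1", "Caller says the unit overheats after an hour.",
                      {"product": "X200", "issue": "overheating"}),
        annotate_note("n2", "Customer asked about a billing discrepancy.",
                      {"product": "none", "issue": "billing"}),
    ]
    print(ET.tostring(notes[0], encoding="unicode"))
    print(find_by_field(notes, "issue", "overheating"))  # ['n1']
```

The note text stays exactly as written; the metadata sits beside it, so the same collection can be searched by keyword or filtered by field.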
In its simplest form, the preprocessing required to enable search consists of a series of steps such as tokenization, parsing, clustering around key concepts, annotation, indexing and representing metadata in a usable form. Instead of the data normalization one might expect with a relational database to eliminate ambiguity and vagueness, we get a process that is equally rigorous but is still converging on a set of standards, such as the common analysis structure (CAS).1
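A few of these steps (tokenization, annotation and indexing) can be sketched as a toy pipeline. In the spirit of CAS, annotations are stored as standoff records (begin and end character offsets plus a type) alongside the original text rather than by editing the text itself. This is a simplified illustration, not the UIMA API: the class and function names and the dictionary-based product annotator are assumptions, and parsing and clustering are omitted.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Annotation:
    begin: int   # offset of the first character of the span
    end: int     # offset just past the last character
    kind: str    # annotation type, e.g. "Token" or "Product"
    value: str   # normalized value attached to the span


@dataclass
class AnalyzedDoc:
    doc_id: str
    text: str    # original text, never modified
    annotations: List[Annotation] = field(default_factory=list)


def tokenize(doc: AnalyzedDoc) -> None:
    """Step 1: add a lower-cased Token annotation for each run of word characters."""
    for m in re.finditer(r"\w+", doc.text):
        doc.annotations.append(Annotation(m.start(), m.end(), "Token", m.group().lower()))


def annotate_products(doc: AnalyzedDoc, lexicon: Dict[str, str]) -> None:
    """Step 2: dictionary lookup that tags tokens found in a (hypothetical) product lexicon."""
    tokens = [a for a in doc.annotations if a.kind == "Token"]  # copy before appending
    for tok in tokens:
        if tok.value in lexicon:
            doc.annotations.append(Annotation(tok.begin, tok.end, "Product", lexicon[tok.value]))


def build_index(docs: List[AnalyzedDoc]) -> Dict[Tuple[str, str], Set[str]]:
    """Step 3: inverted index from (annotation type, value) to document ids."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for doc in docs:
        for ann in doc.annotations:
            index[(ann.kind, ann.value)].add(doc.doc_id)
    return index


if __name__ == "__main__":
    lexicon = {"x200": "X200 Router"}  # hypothetical lexicon entry
    docs = [
        AnalyzedDoc("c1", "Customer says the X200 keeps rebooting."),
        AnalyzedDoc("c2", "Billing question, no hardware issue."),
    ]
    for d in docs:
        tokenize(d)
        annotate_products(d, lexicon)
    index = build_index(docs)
    print(sorted(index[("Product", "X200 Router")]))  # ['c1']
```

Keeping offsets instead of rewriting the text lets later steps layer further annotations over the same spans without losing the original wording.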
Thus, semantic search goes beyond anything envisioned in the structured queries (a kind of search) performed by relational database engines. Semantic search differs from mere structured search in that the meaning of the query is built using entity identification or relations as qualifying attributes. For example, a search for the person "Bush" should return not bushes that grow in a garden but people named Bush. Simple as it is, the example shows how context must be captured and represented in the system for it to disambiguate the meaning of what is being searched. This pushes back the limits of what is possible, redefining what computers can do. But capturing context is still such a wicked problem that in the current state of the art (Q1 2006), each specific domain (CRM, pharmaceuticals, debt collection, supply chain) requires its own lexicon, ontology (data model plus inference engine) and software cartridges.
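The "Bush" example can be sketched by indexing each term together with the entity type an upstream annotator has assigned to it. The annotations below are hand-written assumptions standing in for real entity identification; the point is only to contrast a plain keyword match with a query qualified by entity type.

```python
from collections import defaultdict

# Each document carries (term, entity_type) annotations that an upstream
# annotator is assumed to have produced; here they are written by hand.
DOCS = {
    "d1": [("bush", "PERSON"), ("speech", "EVENT")],
    "d2": [("bush", "PLANT"), ("garden", "LOCATION")],
    "d3": [("bush", "PERSON"), ("texas", "LOCATION")],
}


def build_typed_index(docs):
    """Map (term, entity_type) to the set of documents containing that pairing."""
    index = defaultdict(set)
    for doc_id, annotations in docs.items():
        for term, entity_type in annotations:
            index[(term, entity_type)].add(doc_id)
    return index


def keyword_search(docs, term):
    """Plain keyword match: ignores entity type entirely."""
    term = term.lower()
    return sorted(doc_id for doc_id, anns in docs.items() if any(t == term for t, _ in anns))


def semantic_search(index, term, entity_type):
    """Keyword match qualified by the entity type of the mention."""
    return sorted(index.get((term.lower(), entity_type), set()))


if __name__ == "__main__":
    index = build_typed_index(DOCS)
    print(keyword_search(DOCS, "Bush"))               # ['d1', 'd2', 'd3']
    print(semantic_search(index, "Bush", "PERSON"))   # ['d1', 'd3']
```

All of the disambiguation work happens in the annotations; the query side is trivial once the context has been captured as metadata.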
The success of search depends on the completeness, accuracy and usability of the metadata with which the document collection is annotated. Metadata is required to represent, create and preserve the context of the information exchange. Content without context is meaningless: rather than reducing uncertainty, it increases it. It is not information; it is negative information (entropy). In data mining and data warehousing, data preparation is often, and accurately, estimated at 60 to 90 percent of the effort, and that is with structured data. Why should unstructured data require any less effort to be made accessible, intelligible and manageable from a business point of view? Extracting business intelligence through search from any unstructured or loosely structured environment, such as the Web or other electronically available documents such as email, is a knowledge creation and management process in the proper sense of the term.
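Shannon entropy offers one way to put a number on the uncertainty this paragraph describes. In the sketch below, an ambiguous term with two equally likely senses (person or plant, as in the "Bush" example) carries 1 bit of uncertainty; once metadata pins the sense down, the uncertainty drops to 0 bits. The probabilities are illustrative assumptions, not measurements.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy H = -sum(p * log2(p)) in bits; outcomes with p == 0 contribute nothing."""
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


if __name__ == "__main__":
    # Hypothetical sense distribution for the query term "Bush" with no context.
    without_metadata = [0.5, 0.5]   # person vs. plant, equally likely
    # After entity-type metadata marks the mention as a person.
    with_metadata = [1.0]
    before = entropy_bits(without_metadata)
    after = entropy_bits(with_metadata)
    print(before, after, before - after)  # 1.0 0.0 1.0
```

The difference between the two values is the information the metadata supplies; the more senses or document types a bare term could belong to, the more uncertainty the metadata has to remove.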
Don't throw away your standard query and reporting interface just yet. Over the next two to five years, however (perhaps sooner, as breakthroughs in semantics and NLP occur), watch as expanding search capabilities and their implementation add requirements to business intelligence tools and processes, deliver results in tracking emerging trends and raise the bar on technology and business design.
1. Such as Unstructured Information Management Architecture (UIMA), discussion of which would require a separate article. For further details see http://www.alphaworks.ibm.com/tech/uima.