DEC 1, 2007 1:01pm ET

Just Enough, Just in Time

Business intelligence has existed since the dark ages, when the local pub was the best place to collect information that could secure a business advantage. From the guilds and mercantile buyers and sellers to the present day, gathering business information and organizing it into databases has gone hand in hand with better news collection and distribution (and, now, an information glut). We are currently in a golden age of good-to-excellent business information databases and news services that cater to the most common business information needs.

This golden age of business-oriented databases and news services has created the need for two new capabilities to enhance business intelligence. First, the overwhelming flood of information, even summarized and collated, has created the need for data mining of business information to extract the nuggets needed for a particular business user. Second, the databases we use are not customized to particular needs, and multiple databases of information are often required for a comprehensive view of a market need.

The focus of this article is efficient aggregation and analysis of information from multiple databases into customized deliverables. To avoid staff growth in line with the number of customized deliverables, we need to rely upon the latest in information aggregation, workflow technologies, and text/data mining to make customized information delivery an efficient, scalable, and manageable process.

As an analogy, we could look at the raw text of news and the scientific literature as raw ore to process and the various structured databases of pipeline and deal information as recyclable metals. We need a process that takes the extracted ore and combines it with the recycled metals into a single product serving the needs of our customers.

Our example of a business intelligence need is competitive intelligence for bioPharma. Several good pipeline databases (PharmaProjects, TrialTrove, IDDB) and news services (Factiva, NewsDesk, Google News) supply information to the bioPharma space, but none of the pipeline databases is comprehensive, and the news services are hard to tie directly to the pipeline database information. We will look at approaches to aggregate, filter and deliver a customized view of information for specific bioPharma business needs using the latest Web-based workflow technologies. The approach is easy to customize for a variety of departments and new purposes. Of course, we will obscure the specifics of what we are doing due to proprietary information concerns, but the examples will be easy enough to review and apply to the reader’s needs.

Tools

To collect, integrate, analyze and deliver customized streams of information to our business development customers, we employ a variety of tools that need to be meshed together. These tools need to be maintainable and re-deployable for new custom requests. The goal of our approach was to develop no more internal software than necessary. A bioPharma company needs to focus on producing drugs, not large software projects. We prefer to focus internally on integration, with a minimum of development work to link best-of-breed tools available from the software services sector.

Collect:

The techniques for collecting information from a multitude of heterogeneous sources range from simple SQL queries to the use of involved Web extraction protocols. Search engines and databases are queried to collect data and unstructured text. For continuous information streams, we use RSS/Atom as our information stream protocol of choice. In cases where RSS does not exist for an information stream, we either use Web extraction technologies (screen scraping) or an application to parse email into individual items of news or alert information. These extracted items are then incorporated into a very simple news database with RSS capabilities (built from Drupal, http://www.drupal.org/). Information and database providers license their content in a variety of ways, and the content is also protected by copyright law, so be sure to review your rights and licenses for the sources you will be using.
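
To make the RSS collection step concrete, here is a minimal sketch in Python that turns an RSS 2.0 document into a list of news items using only the standard library. It is an illustration rather than the tooling described in this article: the sample feed, its URLs and the function name parse_rss_items are hypothetical, and a production collector would also need to handle Atom feeds, network errors and malformed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example pipeline news</title>
    <link>http://example.com/</link>
    <description>Illustrative feed</description>
    <item>
      <title>Company A starts Phase II trial</title>
      <link>http://example.com/news/1</link>
      <description>Hypothetical item used only for illustration.</description>
    </item>
    <item>
      <title>Company B signs licensing deal</title>
      <link>http://example.com/news/2</link>
      <description>Another hypothetical item.</description>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict per <item> element found in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns None when the child element is missing.
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_FEED):
        print(entry["title"], "->", entry["link"])
```

Each item comes back as a plain dictionary, so the same structure can hold items recovered from screen scraping or parsed email.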

Analyze:

The precision of the information delivered can be improved through sophisticated filtering or text categorization approaches. We can also utilize text mining tools such as Linguamatics to extract specific relationships between entities such as companies, diseases, adverse events or drugs.
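
The simplest form of filtering is a watchlist of terms applied to each item's title and description. The sketch below assumes items are dictionaries with "title" and "description" keys; the function names and the watchlist are hypothetical. It is plain keyword matching, not text categorization and not the relationship extraction that a text mining tool such as Linguamatics performs.

```python
import re


def build_matcher(terms):
    """Compile one case-insensitive, whole-word pattern for all terms."""
    pattern = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + pattern + r")\b", re.IGNORECASE)


def filter_items(items, terms):
    """Keep items that mention at least one watchlist term.

    Each kept item is copied and annotated with the terms it matched.
    """
    if not terms:
        return []
    matcher = build_matcher(terms)
    kept = []
    for item in items:
        text = item.get("title", "") + " " + item.get("description", "")
        hits = sorted({m.group(0).lower() for m in matcher.finditer(text)})
        if hits:
            kept.append({**item, "matched_terms": hits})
    return kept


if __name__ == "__main__":
    # Hypothetical items and watchlist, for illustration only.
    news = [
        {"title": "Company A starts Phase II trial", "description": ""},
        {"title": "Quarterly earnings call scheduled", "description": ""},
    ]
    for entry in filter_items(news, ["phase ii", "adverse event"]):
        print(entry["title"], entry["matched_terms"])
```

Recording the matched terms alongside each item makes it easier to explain to a customer why an item appeared in their stream.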

Deliver:

Newsgator’s Enterprise Server (http://www.newsgator.com/) serves as our alert management interface for our customers. Aggregated and filtered RSS feeds are delivered by the Newsgator application for distribution to our customers. The product also lets us collaboratively monitor a number of news and information streams for competitive intelligence or research purposes by sharing the RSS streams amongst team members and publishing important items to a group RSS feed.
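
An aggregated feed of the kind described above can be approximated by merging items from several sources, dropping duplicates, and serializing the result as RSS 2.0. The following sketch uses only the Python standard library. The function name build_rss, the channel values and the choice to deduplicate on the item link are assumptions made for illustration; this is not how Newsgator's product works internally.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, items):
    """Serialize items (dicts with 'title' and 'link') as an RSS 2.0 string.

    Items that repeat an already seen link are skipped.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Aggregated and filtered items"

    seen_links = set()
    for item in items:
        link = item["link"]
        if link in seen_links:
            continue  # same story arriving from a second source
        seen_links.add(link)
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = link
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical items from two sources, for illustration only.
    merged = [
        {"title": "Company A starts Phase II trial", "link": "http://example.com/news/1"},
        {"title": "Company A begins Phase II study", "link": "http://example.com/news/1"},
        {"title": "Company B signs licensing deal", "link": "http://example.com/news/2"},
    ]
    print(build_rss("Team feed", "http://example.com/team", merged))
```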

Results extracted from the information streams are delivered in tabular form through Web-enabled databases, an area that is overdue for improvement. While Excel is still the most-used database technology in any company, it is not our database of choice. Unfortunately, developing Web-enabled databases remains difficult and time-consuming. Technologies like DabbleDb (http://www.dabbledb.com/) show promise as lightweight online databases that are easy to use and reconfigure.

Integrate:

The InforSense Platform (http://www.inforsense.com/) application serves as the glue that ties these various capabilities together. Yahoo Pipes (http://pipes.yahoo.com/pipes) brought global attention to drag-and-drop programming for data workflows; InforSense provides that functionality and a great deal more, including text analytics and data mining. We can extract and collect data directly from databases, Web pages, documents, or RSS feeds through InforSense. Once collected, the data is pre-processed into a clean, properly formatted dataset for analysis. The workflows (Figure 1) are easier to understand than programs with similar functionality written in Visual Basic or Perl. The workflow itself can serve as documentation of the business rules for the application.
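
The idea that a workflow can document its own business rules can be shown in a few lines of Python. In the sketch below, a workflow is an ordered list of named steps, each taking and returning a list of records, and the runner logs how many records leave each step. All names are hypothetical, and the code is a conceptual analogy only; it does not reflect InforSense's or Yahoo Pipes' actual interfaces.

```python
def strip_whitespace(rows):
    """Pre-processing: trim leading and trailing whitespace."""
    return [row.strip() for row in rows]


def drop_empty(rows):
    """Pre-processing: remove blank records."""
    return [row for row in rows if row]


def deduplicate(rows):
    """Remove repeated records while preserving first-seen order."""
    return list(dict.fromkeys(rows))


def run_workflow(steps, data):
    """Apply each (name, function) step in order.

    Returns the final data and a log of (step name, records remaining).
    """
    log = []
    for name, func in steps:
        data = func(data)
        log.append((name, len(data)))
    return data, log


# The ordered list of named steps reads as a description of the process.
CLEANING_WORKFLOW = [
    ("strip whitespace", strip_whitespace),
    ("drop empty records", drop_empty),
    ("deduplicate", deduplicate),
]

if __name__ == "__main__":
    raw = ["  Drug X ", "", "Drug X", "Drug Y"]
    cleaned, log = run_workflow(CLEANING_WORKFLOW, raw)
    print(cleaned)
    for name, count in log:
        print(name, count)
```

Because each step is named and independent, steps can be reordered or reused when a new custom request arrives, which is the property that makes graphical workflow tools attractive.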
