REVIEWER: Dean Kimball, CTO for mFusion Technologies.

BACKGROUND: mFusion Technologies supplies targeted business intelligence to mobile professionals on their wireless devices such as cell phones and PDAs.

PLATFORMS: Windows 2000.

PROBLEM SOLVED: mFusion Technologies needed to aggregate content from multiple data sources (internal and external, structured and unstructured) to provide real-time business intelligence to mobile professionals using wireless devices. Although some data was available from structured sources, much of this content was maintained on a wide variety of Web sites where it had to be regularly identified, extracted and reformatted for delivery to wireless devices. WebQL from Caesius Software was used to automatically find and extract Web-based data using query formats similar to those used for structured data sources.

PRODUCT FUNCTIONALITY: WebQL's SQL-like syntax allowed mFusion to crawl the Web and locate relevant data using powerful regular expression capabilities. It was interfaced with mFusion's query descriptor object, which provides a common method of describing information requirements to all the underlying fetch and extraction technologies. The information retrieval capabilities of search engines did not provide the pinpointing and extraction mFusion required for this application. Furthermore, WebQL's powerful post-processing capabilities made it easy to clean and format intermediate data for consistency with internal data stores. WebQL's plug-in API is an interesting feature for future development. It allows custom code to be "plugged in" to the WebQL framework for tasks such as working with non-HTML data types.

STRENGTHS: The use of SQL-like syntax and regular expressions is a unique and powerful way to locate and acquire unstructured Web data. It allows for crawling and discovering without having to know, in advance, exactly where the information is located. Prior to discovering WebQL, mFusion was planning to write custom code to accomplish this task. WebQL provided a cost-effective and highly functional alternative.

WEAKNESSES: Web sites that require cookies to maintain persistent connections or sites that contain large amounts of JavaScript can be difficult to work with for data extraction, regardless of the products and technologies employed. Features to assist in these cases would help to reduce the time needed to handle these more complex sites. Also, options to extract data directly into industry-standard data streams and formats would be a nice enhancement, rather than requiring conversion from text files. I understand both of these issues are being addressed in future versions of the product.

SELECTION CRITERIA: WebQL was the only product we found that could perform the task at hand. Other Web harvesters required physical identification of the information to be harvested before extraction could be automated for subsequent runs. They had no ability to discover and extract information based on matching selected criteria. This was not acceptable as the information mFusion required could appear at different times on a variety of sites. Using SQL syntax and regular expressions, WebQL is able to discover and logically pinpoint the information needed. No previous physical identification of information is required.
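
To make the distinction between physical and logical identification concrete, here is a minimal sketch in Python. It is not WebQL syntax and does not reflect how WebQL is implemented; the page contents, URLs, company name and the extract_prices helper are all hypothetical, invented for illustration only. The idea it shows is the one described above: the same pattern is applied to every page, and data is extracted wherever the pattern matches, without knowing in advance which page holds it or where on the page it sits.

```python
import re

# Hypothetical pages standing in for fetched Web content (not real sites).
PAGES = {
    "http://example.com/news": "<html><body><p>Acme Corp shares rose to $42.50 today.</p></body></html>",
    "http://example.com/about": "<html><body><p>About our company.</p></body></html>",
    "http://example.org/markets": "<div><span>Acme Corp</span> closed at <b>$43.10</b></div>",
}

TAG = re.compile(r"<[^>]+>")                          # crude tag stripper, fine for a sketch
PRICE = re.compile(r"Acme Corp\b.*?\$(\d+\.\d{2})")   # the "matching criteria"

def extract_prices(pages):
    """Return (url, price) for every page whose text matches the pattern."""
    rows = []
    for url, html in pages.items():
        text = TAG.sub(" ", html)      # drop markup so layout differences do not matter
        match = PRICE.search(text)
        if match:                      # pages without a match are simply skipped
            rows.append((url, float(match.group(1))))
    return rows

for url, price in extract_prices(PAGES):
    print(url, price)
```

Two of the three hypothetical pages contain a price, each in a different layout, and both are picked up by the same pattern. A harvester based on physical identification would instead need each location marked by hand before it could automate later runs.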

DELIVERABLES: WebQL provides simple access and formatting of business information from Web sites suitable for data warehousing or real-time data delivery. Comprehensive business intelligence requires data from a wide variety of data stores. mFusion used WebQL to pinpoint and extract unstructured HTML data from business-oriented Web sites. This information was aggregated into a data warehouse and then delivered to wireless devices used by mobile professionals based on classification and characterization profiles.

VENDOR SUPPORT: All requests to Caesius support were resolved in a timely manner. Considering that mFusion was an early customer of WebQL, the product worked very well from the start and required only minimal support.

DOCUMENTATION: Because mFusion was using WebQL as a development tool, the product was in the hands of software developers who were able to figure it out quickly. For these individuals, the documentation was more than adequate. Inexperienced end users should not have a problem with the documentation as long as they already have a basic understanding of SQL and regular expressions. Having an excellent online copy of the documentation available on the WebQL site is a big help.
