The IT industry suddenly seems to be inundated with 2.0 technologies. It all started in 2004 with a Web 2.0 conference organized by O'Reilly Media and CMP. Since then, vendors, analysts and conference organizers have jumped on the 2.0 bandwagon in almost every area of IT from enterprise systems to business intelligence (BI).

Although the definitions and components of these 2.0 technologies are somewhat vague, the concepts behind them are nevertheless useful because they identify a set of disruptive technologies and approaches that offer new ways of building, deploying and using IT applications - the focus being on simplicity and user self-sufficiency.

The objective of this article is to discuss the impact of Web 2.0 on the IT organization, looking specifically at BI and data warehousing.

What is Web 2.0?

Web 2.0 offers three main benefits: the do-it-yourself (DIY) Web, a richer Web user experience and lighter-weight Web development models. Technologies that support each of these areas include:

  • The DIY Web. Wikis for information publishing, blogging for expertise sharing, Web syndication for information delivery, tagging and folksonomies for information categorization and enterprise search for finding information.
  • A richer Web user experience. Rich Internet applications (RIA), AJAX and Adobe Flex.
  • Lighter-weight Web development models. Scripting languages such as PHP and Ruby, mashups and REST-style Web services.

Lighter-weight development models and a richer Web user experience will increase the power and use of Web interfaces for all types of IT applications, including BI. The DIY Web, however, is likely to have the biggest effect on BI because it provides powerful but easy-to-use tools for publishing, sharing, finding and collaborating about business information, including BI.

A Web 2.0 Example

Wikipedia is a good example of how the DIY Web can dramatically change the way information is created, published and maintained. "Wikipedia revolutionized how we think about knowledge," notes Alan Deutschman in an article in Fast Company magazine [1]. A study in the same article documents that in December 2006, Wikipedia had 164,675,000 unique visitors, while in October of 2006 more than 75,000 Wikipedians edited five or more articles. These numbers not only demonstrate the popularity of Wikipedia, but also the large audience contributing to its content.

It is interesting to compare how Wikipedia maintains information with the more traditional approach used by Britannica Online. "The Britannica maintains an editorial staff of five senior editors and nine associate editors. It has an editorial board of advisers, which currently includes 14 distinguished scholars," notes the Wikipedia entry about Encyclopedia Britannica. The 9,500-word entry goes on to say, "The 2007 print version of the Britannica boasts 4,411 contributors. Most (98 percent) contribute only a single article." The Britannica Online entry about Wikipedia, on the other hand, is very short. It contains some 737 words, of which the first 75 are free, and the rest have to be purchased. Clearly, the approaches used to manage and deliver information by the two organizations are quite different.

Wikipedia changes the way people publish and share information - it uses a bottom-up community, or social networking, approach rather than the more traditional top-down editorial approach used by Britannica. Community-based publishing increases both the richness and timeliness of information. This is why Wikipedia receives roughly 450 times more traffic than does the online version of Britannica.

Other examples of how the social networking model is improving the availability of information include news articles, information bookmarking and blogging. Although all of these examples pertain to the public Internet, the concepts and technologies behind them are equally applicable in the enterprise, and there are now both open source and commercial products that support the use of these technologies in enterprise systems.

Social networking changes not only the way information is created, but also how it is organized and accessed. Editorial-driven publishing on the Internet and in corporations organizes information around a taxonomy, which is typically defined ahead of time by editors and content administrators. Community-based publishing, on the other hand, relies on the author of the information to tag it with one or more keywords that describe its contents. The folksonomy (pool of tag names) grows as more content is published.
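The way a folksonomy grows can be illustrated with a small Python sketch. This is an illustration only, not a description of any particular product: the Folksonomy class, its method names and the sample item names are all hypothetical, and tags are simply lowercased so that "Sales" and "sales" count as the same tag.

```python
from collections import Counter, defaultdict


class Folksonomy:
    """Toy tag pool: authors attach free-form tags to the items they publish."""

    def __init__(self):
        self.items_by_tag = defaultdict(set)  # tag -> ids of items carrying it
        self.tag_counts = Counter()           # tag -> number of times it was applied

    def publish(self, item_id, tags):
        # Normalize tags (trim, lowercase) and drop empties and duplicates.
        for tag in {t.strip().lower() for t in tags if t.strip()}:
            self.items_by_tag[tag].add(item_id)
            self.tag_counts[tag] += 1

    def find(self, tag):
        # Return the ids of all items that carry the given tag.
        return sorted(self.items_by_tag.get(tag.strip().lower(), set()))

    def vocabulary(self):
        # The pool of tag names; nobody defines it ahead of time.
        return sorted(self.tag_counts)


pool = Folksonomy()
pool.publish("q3-sales-notes", ["Sales", "forecast"])
pool.publish("churn-analysis", ["customers", "forecast"])
print(pool.vocabulary())      # ['customers', 'forecast', 'sales']
print(pool.find("forecast"))  # ['churn-analysis', 'q3-sales-notes']
```

Unlike a taxonomy, the vocabulary here is never declared up front. It is whatever authors have typed so far, which is also why quality control becomes a concern.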

The issue in enterprises is that while community-based publishing and folksonomies encourage people to contribute expertise and knowledge to the pool of information available in the company, the accuracy and quality of the information may suffer. There are also privacy and security concerns. Applying rigid governance procedures to the DIY Web, however, is counterproductive. One solution is instead to have two types of information in an enterprise: managed and unmanaged. The managed information is produced and maintained by traditional methods. The unmanaged information created using Web 2.0 approaches is checked for security and privacy violations, but the quality and accuracy of its content are not guaranteed.

What Does This Have to Do With BI?

The objectives of a BI environment are to monitor and analyze business operations to give business users the information they need to make more effective business decisions and to take action to satisfy business requirements and avoid business problems.

Nearly all the information processed by BI applications comes from internal files and databases that contain structured business data. There is increasing interest in capturing and analyzing unstructured business content from both internal and external systems. Given that the vast majority (about 80 percent) of internal information is in an unstructured form, this represents a huge untapped source of valuable information. Add in Internet content and potential Web 2.0 enterprise content and the possibilities are huge. Examples of applications here include market intelligence, pricing optimization, customer complaint analysis, regulatory compliance, legal discovery and intellectual property protection.

As organizations begin to process business content and implement Web 2.0 technologies, there are three potential new sources of information for use in BI:

  1. Internal managed business content,
  2. Internal unmanaged business content, and
  3. External unmanaged business content.

Extraction Methods

There are several ways of extracting and accessing BI and knowledge from this business content. These are illustrated in Figure 1.

Figure 1: Processing Business Content in a BI Environment

  • Data integration applications that capture, transform and integrate business content into a data warehouse. These applications typically transform the content into structured data or into a semistructured format such as XML. Business content exploration and mining applications are often used to extract metadata from the unstructured content for use by data integration applications.
  • Data access applications that use federated queries to access business content. Metadata for these queries may be produced by business content exploration and mining applications.
  • Business portals and user workspaces plus enterprise search for locating and accessing business content. Compared with Internet search approaches, enterprise search adds techniques such as guided navigation, semantic search and result clustering. Some vendors provide BI portals and BI search to give users access to BI content such as metadata, queries and analyses, and BI output like reports and metrics. BI search can also supply connectors to a BI system for use by enterprise search.
  • Business content exploration and mining applications that extract metadata (facts, concepts, relationships) from business content. This extracted metadata may be used by business content categorization and taxonomy tools, enterprise search, content analytical applications, and data integration and data access applications.
  • Business content analytical applications that generate analytics about business content by accessing and analyzing the contents of a search repository or the results of search and exploration operations. Often these analytics are delivered to business users through BI dashboards. This approach enables content search and exploration vendors to compete with the traditional BI vendors for a piece of what could become a sizable marketplace.
  • Web content syndication tools that notify business users and applications via RSS and Atom Web feeds that new business content is available for viewing and processing. These feeds are XML files that can be processed by a business portal or other applications. Web feeds have a variety of uses in the enterprise. Examples include distribution of BI and other reports, publishing content to customers and other target audiences such as the media, mass communication to employees, filtering industry news and the republishing of licensed syndicated content.
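Because Web feeds are ordinary XML files, a portal or other application can process them with very little code. The Python sketch below uses only the standard library to read an RSS 2.0 feed and pick out entries that have not been seen before. The feed content, the intranet.example.com URLs and the function names read_items and new_items are made up for illustration; a real application would fetch the feed over HTTP and would handle Atom feeds as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed announcing newly published BI reports.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>BI Reports</title>
    <item>
      <title>Weekly sales dashboard</title>
      <link>http://intranet.example.com/reports/sales</link>
      <pubDate>Mon, 02 Apr 2007 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Customer complaint summary</title>
      <link>http://intranet.example.com/reports/complaints</link>
      <pubDate>Tue, 03 Apr 2007 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Parse an RSS 2.0 document and return its items as dictionaries."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.findall("./channel/item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return items


def new_items(feed_xml, seen_links):
    """Return only the items whose link is not in the set of seen links."""
    return [i for i in read_items(feed_xml) if i["link"] not in seen_links]


# Pretend the sales dashboard was already delivered to the user.
seen = {"http://intranet.example.com/reports/sales"}
for entry in new_items(SAMPLE_FEED, seen):
    print(entry["title"], "-", entry["link"])
```

Tracking seen links is only one simple way to decide what counts as new. Comparing publication dates is another.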

Unstructured business content represents a huge opportunity for BI. As Web 2.0 technologies find their way into the enterprise, the volume of this content will increase dramatically. IT organizations and the BI group need to understand the business value of this content and extend information architecture to support it. The use of business content in BI will require the BI group to work closely with those responsible for Web technologies, content management and collaboration. BI architects and developers also need to be aware of non-BI vendors that support the processing of business content.

  1. Alan Deutschman. "Why is This Man Smiling." Fast Company, April 2007.
