The focus in enterprise content management (ECM) is shifting from ending information scarcity to dealing with information overload. This dynamic explains why the disparate technologies of search, records management and analytics are now hot.

In the past, different departments managed different media types; they used separate systems and specialized equipment to perform these tasks. The underwriting department within an insurance company used an imaging/workflow system to manage the underwriting process; the regulatory affairs department within a pharmaceutical company made sure that a document management system controlled submissions to the FDA; marketing enlisted a Web content management system to manage the corporate Web site and so on (see Figure 1).

Figure 1: Early ECM Silos

However, these media-centric silos are starting to topple. Repositories are starting to manage multiple content types and enterprises are taking a more lifecycle-centric view of the problem.

This viewpoint shift is altering how both enterprises and vendors approach ECM. A stream approach is taking over, where multiple systems create and manage a content stream, a way of working enabled by standards such as XML, RSS, UIMA and JSR-170. ECM applications are increasingly being subsumed into the system infrastructure.

In this world, content management solutions perform seven main processes:

  1. Creation: The creation technologies that were the first wave of content management applications.
  2. Storage: Applications that store and manage active, unstructured content.
  3. Distribution: These systems push content to users and other applications, thereby making content pervasive and available on a wide variety of devices.
  4. Discovery: These applications help workers find relevant content by letting them query repositories or drill down into a hierarchy.
  5. Archiving: These systems store inactive content in business-friendly ways so it can be retrieved later.
  6. Analytics: These solutions serve as a feedback loop for improving content creation, distribution and discovery.
  7. Management: Management is evolving from an application that manages a specific media type - e.g., documents, Web pages - to a service that supports the entire content lifecycle.

The first six make up major product categories. The last one, management, is a foundation the others rest on (see Figure 2).

Figure 2: ECM Process Framework

The second shift is a consequence of generating so much digital content over the past 30 years. In the 1950s and 1960s, secretaries typed only important documents - it was too expensive and time-consuming to type ephemera. However, with Microsoft Word on almost every desktop, employees now type up memos as a matter of course, documenting important subjects as well as trivia.

In short, we've moved from an environment of information scarcity to one of information overload. This has had a profound impact on what content management problems enterprises now need to solve. It is driving them to focus on the last three stages in the ECM process framework: discovery, archiving and analytics (see Figure 3).

Figure 3: Information Scarcity vs. Information Overload


The technologies that aid in information discovery - search and categorization - are undergoing a rapid transformation. Historically, Web search, enterprise search and desktop search have been stovepiped, facing different enablers and challenges, and sold by different vendors (see Figure 4).

Figure 4: Search Categories

Users are beginning to question why they need to switch applications to search the Web, their company site or their desktop - when all they want to do is to find the relevant information no matter where it resides. Accordingly, vendors such as Google, FAST and X1 are starting to offer universal search.

Search is no longer only about looking for unstructured content. In June 2006, Google announced its new Google OneBox for Enterprise feature of the Google Search Appliance, which enables employees to search for information stored in operational systems, such as purchase orders within an ERP system. Meanwhile, other search companies such as Verity (now owned by Autonomy) and FAST Search have been integrating search with operational and BI applications for years.

Categorization is another technology that enables users to find what they need. Companies such as Autonomy, Endeca, InQuira and Recommind use automated categorization techniques to assign categories or topics to documents, thereby helping workers zero in on a set of related documents. It's also becoming easier for users to manually categorize, or tag, the documents. Popularized by sites such as Flickr, social tagging lets workers do the categorization. Because the resulting folksonomies reflect how users view the content, the structures morph over time as workers' views of the business change. This is in contrast to official taxonomies that can become divorced from reality if editors do not continually update them.
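
To make the idea of a folksonomy concrete, here is a minimal Python sketch of how individual tagging actions might be rolled up into a per-document tag ranking. It is an illustration only, not how any vendor named above works: the function names, the (user, document, tag) event shape and the sample data are all assumptions.

```python
from collections import Counter, defaultdict


def build_folksonomy(tag_events):
    """Aggregate (user, document, tag) events into per-document tag counts.

    Hypothetical data shape: each event is one worker applying one tag.
    Tags are normalized to lowercase, and a user applying the same tag
    to the same document twice is counted only once.
    """
    seen = set()
    folksonomy = defaultdict(Counter)
    for user, document, tag in tag_events:
        key = (user, document, tag.strip().lower())
        if key in seen:
            continue
        seen.add(key)
        folksonomy[document][key[2]] += 1
    return folksonomy


def top_tags(folksonomy, document, n=3):
    """Return the n most frequently applied tags for a document."""
    return [tag for tag, _ in folksonomy[document].most_common(n)]


# Illustrative events only.
events = [
    ("ann", "q3-report", "Finance"),
    ("bob", "q3-report", "finance"),
    ("bob", "q3-report", "forecast"),
    ("ann", "q3-report", "finance"),  # repeat by the same user, ignored
]
folksonomy = build_folksonomy(events)
print(top_tags(folksonomy, "q3-report", n=1))
```

Because the counts are rebuilt from whatever tags workers apply, the ranking shifts as their vocabulary shifts, which is the property that distinguishes a folksonomy from an editor-maintained taxonomy.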

Discovery technologies are evolving rapidly, and it behooves businesses to stay abreast of the latest capabilities. Otherwise, they are needlessly sentencing their workers to a lot of hard labor in searching.


Archiving, the act of making active content inactive, is another rapidly evolving area. Part of the reason for this is the explosion in the number of records that companies must decide to keep or discard. Thirty years ago it was easy - businesses could say, "Keep those 127 file cabinets." Today, all of those official memos and contracts have changed from paper to digital format and reside all over: on PCs, laptops, servers, USB keys and so on. Conversations never recorded years ago now turn up in emails, instant messages, blogs and wikis. This set of messages and documents is intermixed, making it difficult for companies to separate the official from the ephemeral. For example, a company may retain emails for two years, but contracts for seven. This means that if a contract is emailed, the archiving system must store the contract and the email separately so the contract does not vanish when the system destroys the email transaction.
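
The email-and-contract example can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of any real archiving product: the retention table uses the two-year and seven-year figures from the example, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# Retention periods in years, taken from the example above (assumed policy).
RETENTION_YEARS = {"email": 2, "contract": 7}


def add_years(d, years):
    """Shift a date forward by whole years, mapping Feb 29 to Feb 28 if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


@dataclass
class ArchivedItem:
    item_id: str
    kind: str            # a key in RETENTION_YEARS
    archived_on: date

    def destroy_on(self):
        """Date on which this item's own retention period ends."""
        return add_years(self.archived_on, RETENTION_YEARS[self.kind])


def archive_email(email_id, attachment_kinds, archived_on):
    """Store the message and each attachment as separately retained items."""
    items = [ArchivedItem(email_id, "email", archived_on)]
    for i, kind in enumerate(attachment_kinds):
        items.append(ArchivedItem(f"{email_id}/att{i}", kind, archived_on))
    return items


def items_due(items, today):
    """Return the items whose retention period has expired as of today."""
    return [item for item in items if item.destroy_on() <= today]


items = archive_email("msg-1", ["contract"], date(2006, 1, 15))
due_after_two_years = items_due(items, date(2008, 1, 15))
due_after_seven_years = items_due(items, date(2013, 1, 15))
```

Because the attachment carries its own record type, destroying the email after two years leaves the contract in the archive until its seven-year period ends.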

Regulators are finally realizing that businesses conduct themselves electronically. Five years ago, if a corporation received a subpoena for some emails and the company replied that it would take six months and half a million dollars to retrieve them, the courts often softened the request. No more. Regulators now hand out multimillion-dollar fines if companies do not hand over the appropriate emails quickly.

Unfortunately, records management and e-discovery solutions are far less mature than their search and categorization counterparts. This is due both to the complexity of the problem and to technological history. Different countries have very different records management laws. Consequently, multinational corporations cannot implement one records management policy - they have to implement hundreds.

The technological systems are equally fragmented. Content management systems usually excel at archiving the content they control but poorly archive other content types. Therefore, while a document management system may archive documents, it typically does not archive emails. This is a legacy left over from the media-centric stovepipes of the last several decades, and it will take some time for records management systems to become policy-centric rather than media-centric.


The most immature of these three areas is one rarely talked about: content analytics. Nevertheless, it is an absolute requirement if companies are to get better at churning out relevant content at the lowest cost. Every company today is a small publishing company, generating pages of new content that go into manuals, email newsletters and Web sites. Large corporations often field Web sites with tens of thousands of pages. Given this level of labor, enterprises need to start thinking about how to optimize this process. How much time are they spending creating content, and are employees, partners and customers actually using it? Given the corporation's finite resources, should it focus on improving this set of Web pages or that set of Web pages? Companies have optimized their manufacturing production lines for years - now they need to do the same with their high-volume content production lines.

Most companies have done well understanding how customers, partners and employees consume published content. By enlisting a mix of technologies, such as clickstream analysis, search analytics, A/B and multivariate testing, and blog analytics, corporations have gotten a feel for what content users read and what they ignore. Unfortunately, they have only a rudimentary understanding of the creation side of the equation: how much it costs to write a manual or the latest set of Web pages. Therefore, companies need to start mapping content creation costs against content consumption benefits so they can finally get a handle on the value of their content (see Figure 5). Without the necessary feedback loop, businesses may be spending money to generate content that isn't useful or relevant. After all, manufacturing companies do not spend time and money to build products that customers will not buy - or if they do, they quickly stop after seeing the dismal sales figures. Companies need to control the supply and demand process for content in the same way.

Figure 5: The Content Analytics Equation
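
One simple way to map creation costs against consumption is a cost-per-view figure for each piece of content. The Python sketch below assumes a company can attach a labor cost and a page-view count to each item. Both fields, the function names and the sample numbers are hypothetical, and real consumption benefits would likely involve more than raw views.

```python
from dataclasses import dataclass


@dataclass
class ContentItem:
    name: str
    creation_cost: float  # assumed: labor cost to produce the item, in dollars
    page_views: int       # assumed: consumption measure from Web analytics


def cost_per_view(item):
    """Creation cost divided by views; unread content is treated as infinitely costly."""
    if item.page_views == 0:
        return float("inf")
    return item.creation_cost / item.page_views


def rank_by_cost_per_view(items):
    """Order items from worst value (highest cost per view) to best."""
    return sorted(items, key=cost_per_view, reverse=True)


# Illustrative figures only.
catalog = [
    ContentItem("product manual", 5000.0, 100),
    ContentItem("support FAQ", 500.0, 10000),
    ContentItem("legacy brochure", 1000.0, 0),
]
ranked = rank_by_cost_per_view(catalog)
for item in ranked:
    print(item.name, cost_per_view(item))
```

Items at the top of the ranking are candidates for rework or retirement, while items at the bottom show where content spending is paying off.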


Enterprises developing solutions for discovery, archiving and analytics are finding that these areas are not easily conquered. Here are some lessons gleaned from early battles.

Although discovery is the easiest of the three to tackle, companies still need to be careful on two fronts. First, they need to have a clear understanding of their discovery needs, because search tasks are different - the search technologies needed for an e-commerce site are very different from those required to pull up scholarly research papers. Second, they must recognize that they will need to tune the search engine to deliver relevant results. While discovery technologies are getting better, the market has not yet reached the "drop it in and forget about it" stage.

In records management, companies that make the most progress will be those that tackle the organizational issues first. This includes tasks such as coming up with content categories and sitting down with the legal department to clarify what the company's records management policies are in all of its myriad divisions. If IT just throws up its hands and backs up everything, it can cost the company in various ways - in processing and storage costs, in breaking laws by backing up personal information illegally and in making it possible for records that should have been deleted in the course of business to be discovered by adversaries.

In content analytics, enterprises must figure out how to capture the content creation costs as well as how to integrate the numbers in an easy-to-use fashion. For example, although vendors talk about integrating information between Web content management systems and Web analytics systems, that usually means they generate some reports. They do not make the content contributor's life easy by displaying the popularity metric of the Web page within Adobe Dreamweaver, Microsoft Word or whatever tool the content contributor uses to create the page.

In short, dealing effectively with information overload is not going to be easy. But it is today's task. 
