The availability of robust and scalable portal and search tools makes it possible to implement enterprise search services. The general idea behind enterprise search is that content anywhere in the organization is accessible through a single search tool, usually as part of an enterprise portal. While a far cry from the implementation quagmires of ERP and CRM systems, enterprise search applications can fail if you do not plan properly.

To ensure a successful implementation of enterprise search, be sure to address five elements: sizing, duplication, design, control and metrics.

Precise figures are not needed when sizing your existing content, but you need to know orders of magnitude to plan for storage, network utilization and processing times. It is also important to know where the content resides. Crawling, the process of gathering content for indexing, must be scheduled so that file, Web and database servers are not overwhelmed with I/O operations. Sizing early in the implementation phases of enterprise search will also provide a baseline for monitoring growth, as estimates on information growth vary widely.
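An order-of-magnitude sizing pass can be as simple as multiplying document counts by average sizes per repository. The sketch below illustrates the idea; the repository names and figures are hypothetical, not drawn from the article.

```python
import math

# Hypothetical content inventory: repository name -> (document count, average size in KB).
# These figures are illustrative assumptions for the sketch.
inventory = {
    "file_shares": (2_000_000, 150),
    "intranet_web": (300_000, 40),
    "document_db": (500_000, 80),
}

def order_of_magnitude_gb(doc_count: int, avg_kb: float) -> int:
    """Round the total content size to the nearest power of ten, in GB."""
    total_gb = doc_count * avg_kb / 1_000_000  # KB -> GB
    return 10 ** round(math.log10(total_gb))

for repo, (count, avg_kb) in inventory.items():
    print(repo, "~", order_of_magnitude_gb(count, avg_kb), "GB")
```

Rough numbers like these are enough to plan storage and to schedule crawls so that no single server absorbs the full I/O load at once; they also give the baseline for tracking growth later.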

With a handle on the extent of enterprise content, move to the second step: minimizing duplication. Multiple identical documents in an index waste space, and variations between versions raise questions of authenticity; for example, which 401(k) plan document is accurate, the one from finance or the one from human resources? Some duplication will always exist. E-mail attachments sent to multiple recipients are duplicated, and users copy documents out of restricted directories when access controls would otherwise prevent colleagues with a legitimate reason to read them from doing so. The goal is not to resort to draconian measures to prevent duplication, but to recognize the problem and clean up duplicates when possible. Estimating how much duplicate content you can expect to eliminate is difficult. The "How Much Information?" study conducted at the University of California, Berkeley, indicates that original content constitutes roughly 20 percent of all digital content; that is probably the best general estimate available.
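One common way to find exact duplicates during index cleanup is to fingerprint each document's bytes with a cryptographic hash and group documents that share a fingerprint. This is a minimal sketch of that technique; the file paths and contents are hypothetical, and note that it catches only byte-identical copies, not revised versions.

```python
import hashlib

def content_fingerprint(data: bytes) -> str:
    """Hash the raw bytes; identical documents share a fingerprint."""
    return hashlib.sha256(data).hexdigest()

def find_duplicates(documents: dict[str, bytes]) -> dict[str, list[str]]:
    """Group document paths by fingerprint; groups with more than one path are duplicates."""
    groups: dict[str, list[str]] = {}
    for path, data in documents.items():
        groups.setdefault(content_fingerprint(data), []).append(path)
    return {fp: paths for fp, paths in groups.items() if len(paths) > 1}

docs = {
    "finance/401k-plan.doc": b"plan text v2",
    "hr/401k-plan.doc": b"plan text v2",       # exact copy: detected
    "hr/401k-plan-old.doc": b"plan text v1",   # revised version: different hash, not detected
}
print(find_duplicates(docs))
```

Because near-duplicates and stale versions hash differently, hashing answers the storage question but not the authenticity question; the latter still requires designating an authoritative source for each document.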

Third, design the enterprise search architecture to meet your infrastructure requirements. The design must account for additional storage requirements, increased network traffic (particularly during crawling) and load balancing for indexing and query response. Additional considerations include replicating indexes to improve performance, backup and recovery procedures and network security. In many ways, compiling the index for enterprise search resembles the extract, transform and load process in data warehousing: the processes often run during off hours, creating limited windows of opportunity; the initial build is much more time-consuming than incremental changes; and the processes collecting information need access to multiple systems.

Small installations can operate effectively with a single server for indexing and query processing. Midsized sites should consider separate indexing and query-processing servers. Large enterprises will require a brokered or federated architecture. Brokered systems use multiple indexing and query-processing servers with a single broker process distributing the work among them. When enterprise search is mission critical, a failover broker should be in place. Federated architectures also distribute the workload; however, they use different search engines to search different repositories, such as Lotus Notes, Open Text Livelink and Web search engines, and then combine the results for presentation to the user. One advantage of federated search is that a centralized logical index is not required. On the other hand, some of the more advanced functions related to personalization and vendor-specific features depend upon information maintained in centralized indexes.
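The federated pattern described above, fanning one query out to several repository-specific engines and merging the results, can be sketched in a few lines. The backend functions here are hypothetical stubs standing in for real connectors (a Notes connector, a Livelink connector, a web index); this is an illustration of the control flow, not any vendor's API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical backend stubs; each returns (document URI, relevance score) pairs.
def search_notes(query): return [("notes://memo-17", 0.9)]
def search_livelink(query): return [("livelink://doc-4", 0.8)]
def search_web(query): return [("http://intranet/faq", 0.95)]

BACKENDS = [search_notes, search_livelink, search_web]

def federated_search(query: str) -> list[tuple[str, float]]:
    """Fan the query out to every backend in parallel, then merge by score."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda backend: backend(query), BACKENDS))
    merged = [hit for hits in result_lists for hit in hits]
    return sorted(merged, key=lambda hit: hit[1], reverse=True)

print(federated_search("401k plan"))
```

A real merge step is harder than this sketch suggests: relevance scores from different engines are not directly comparable, which is one reason the advanced ranking and personalization features mentioned above tend to depend on a centralized index.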

Controlling search operations is the fourth element of a successful implementation. Controls begin with policies. At the very least, define policies that identify the type of content to index, the frequency of updates and the access controls on that content. Some content, such as enterprise portal content, clearly should be included in enterprise search; other content, such as confidential legal and human resources documents, clearly should not. The remaining content is not as easily classified. Should user directories be included even when the owner is the only one with access? Should indexing of such directories be restricted? The answers to these questions must balance indexing and query-processing resources against the ability to improve the way users work.
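Policies like these are easiest to enforce when expressed as data that the crawler consults before indexing anything. The sketch below shows one possible shape, a first-match list of path-prefix rules; the paths and rules are illustrative assumptions, not the article's own.

```python
# Illustrative policy: ordered (path prefix, decision) rules, first match wins.
# Prefixes and decisions are hypothetical examples.
POLICY = [
    ("/portal/", "index"),                   # enterprise portal content: always index
    ("/legal/confidential/", "exclude"),     # confidential legal documents
    ("/hr/confidential/", "exclude"),        # confidential HR documents
    ("/home/", "exclude"),                   # single-user directories: skip by default
]

def indexing_decision(path: str) -> str:
    """Return the first matching rule's decision; default to indexing."""
    for prefix, decision in POLICY:
        if path.startswith(prefix):
            return decision
    return "index"
```

Keeping the rules in one ordered table makes the policy auditable, and makes the judgment calls (such as whether single-user home directories are worth the indexing cost) explicit and easy to revise.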

Finally, establish metrics to measure enterprise search performance. Key indicators available from Web log analysis are the number of queries issued, average number of hits per query, number of queries per session and query response time. Ideally, a user should have to issue few queries per session, receive relatively few hits and receive responses quickly. More advanced analysis on terms used in queries can help guide category and taxonomy development.
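The key indicators above can all be computed from parsed search-log records. A minimal sketch, assuming hypothetical records of the form (session ID, query, hit count, response time in ms):

```python
from collections import defaultdict

# Hypothetical parsed log records: (session_id, query, hit_count, response_ms).
log = [
    ("s1", "401k plan", 42, 120),
    ("s1", "401k enrollment", 7, 95),
    ("s2", "expense policy", 3, 80),
]

def search_metrics(records):
    """Compute the key indicators named in the text from parsed log records."""
    queries_per_session = defaultdict(int)
    for session_id, _query, _hits, _ms in records:
        queries_per_session[session_id] += 1
    n = len(records)
    return {
        "queries": n,
        "avg_hits_per_query": sum(r[2] for r in records) / n,
        "avg_queries_per_session": n / len(queries_per_session),
        "avg_response_ms": sum(r[3] for r in records) / n,
    }

print(search_metrics(log))
```

Tracked over time, falling queries per session and hits per query suggest users are finding answers faster; the query strings themselves feed the term analysis used for category and taxonomy development.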

Successful enterprise search will not just happen by installing some software and letting it run. However, attention to these five key elements will get you where you want to go.
