In May 2005, a judge fined Morgan Stanley $1.5 billion for failing to properly preserve information related to active litigation. A jury awarded $800 million in punitive damages when that same firm repeatedly failed to produce electronic data in a timely manner, and another jury awarded $29 million in the largest sex discrimination verdict in U.S. history after UBS Warburg could not produce copies of relevant information. According to most industry experts, these punitive fines are just the tip of the iceberg. As of December 1, 2006, new Federal Rules of Civil Procedure (FRCP) amendments have raised the bar by increasing the possibility of such fines for noncompliance. But according to a recent ComputerWorld survey, 32 percent of IT professionals believe their companies are not prepared to comply, and an even larger number are not sure how the new rules for legal discovery will impact them. [1]


Legal discovery, or e-discovery, as it’s often termed, is defined as the act or process of finding or learning something that was previously unknown. In this case, litigators are referring to evidence related to a case. When your IT group becomes involved in a legal discovery - and it really is a matter of when and not if - you will be cordially compelled to hand over any emails, files and other data requested within a limited time frame, often within 30 days. A recent survey of 840 companies by the ePolicy Institute and the American Management Association found that one out of every five organizations has received subpoenas for such information and, according to an online article, more than 90 percent of new business records are created electronically. [2]


Now for the really bad news. Of that 90 percent, 90 percent is probably at risk for spoliation. That’s a nasty little legal term used by lawyers and courts to reference the withholding, hiding or destruction of evidence relevant to a legal proceeding. Some people go to jail for such violations while others simply lose their jobs. Either way, spoliation can and should be avoided at all costs. Doing so requires an understanding of the new legal discovery amendments and, most importantly, of the limitations that current technologies have in meeting those requirements. Prior to the advent of the new FRCP amendments, IT and legal professionals predominantly relied upon enterprise search solutions to discover and manage files related to pending litigation. But traditional search technologies, which were invented to index Internet content, are no longer adequate to fully comply with the new rules, specifically Rules 26, 34 and 37. This article will examine the limitations of current search-related technologies and recommend IT best practices for file content discovery and management using newer technologies and solutions.


Let’s start with Rule 26, which states that a premeeting must occur between the companies involved in the lawsuit - not just between the lawyers, but also between the IT departments. Each company must disclose where and how its data is stored, and show that technologies are in place to provide access to that information. Most enterprise search solutions are quite adequate at meeting this requirement.


Rule 34 requires that organizations deliver the content in the format the requestor defines. Typically, the default is the native format, because it often contains hidden metadata that is erased when files are converted to formats such as PDF. While most enterprise search solutions support a wide variety of file types, these solutions do a poor job of finding and extracting file system metadata. Also, their underlying indexing technologies are often too slow, too expensive, or require too much storage overhead when the universe of data grows beyond a few million files. Remember, slow often means late, which leads to spoliation, which can result in hefty fines, job loss and in some cases even criminal charges being filed against the IT personnel responsible.


Lastly, Rule 37 codifies the standards around legal hold. When a lawsuit is ongoing, a company must stop destroying all information related to the case, regardless of systematic destruction policies. Many companies have automated systems to delete data after a specific period of time has elapsed, usually for compliance reasons such as HIPAA, SOX or SEC regulations. However, when that data is pertinent to pending litigation, its automated destruction must be put on litigation hold. Another issue with traditional search solutions is that they do not offer file-level policy management, file tagging or the ability to facilitate the copying of the files for the opposing legal counsel’s use. Remember that finding the data is just the first step; producing it and preserving it are also required for FRCP compliance.

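The interaction between an automated retention policy and a litigation hold can be sketched in a few lines of Python. This is a minimal illustration only: the function name, the file paths, the dates and the three-year retention period are all hypothetical, and a real retention system would be driven by its own policy engine.

```python
from datetime import date, timedelta

def files_to_purge(last_modified, today, retention_days, held_paths):
    """Return paths older than the retention period that are NOT on litigation hold."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(
        path
        for path, modified in last_modified.items()
        if modified < cutoff and path not in held_paths
    )

# Hypothetical inventory: path -> last-modified date.
inventory = {
    "/mail/2003/offer.msg": date(2003, 5, 1),
    "/mail/2003/lunch.msg": date(2003, 6, 1),
    "/mail/2007/status.msg": date(2007, 6, 1),
}
on_hold = {"/mail/2003/offer.msg"}  # assumed relevant to a pending case

# Assumed three-year retention period, evaluated on July 1, 2007.
purge = files_to_purge(inventory, date(2007, 7, 1), 3 * 365, on_hold)
print(purge)
```

Both 2003 messages are past the retention period, but only the one that is not on hold is returned for deletion. The hold check sits inside the purge logic itself, so a scheduled job cannot bypass it.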

E-discovery 2.0


The Sedona Conference recently published a second edition of the iconic Sedona Principles: Best Practices Recommendations & Principles for Addressing Electronic Document Production (June 2007). Sedona Principle 5 provides that “reasonable and good faith efforts” are required to accomplish preservation, and the landmark case of Cache La Poudre Feeds, LLC v. Land O’Lakes, Inc. blends the Sedona Principles with case law in order to resolve complex challenges to a litigation hold process.


The initial version of Sedona Principle 8 provided that the primary source of information should be “active data” which is “purposely stored in a manner that anticipates future use and permits efficient searching and retrieval.” The second edition retains the focus on “active data” but incorporates the “accessibility” concept to help define the types of information that require proof. These are critical principles for IT professionals to understand because they bear directly on technology selection for legal discovery. For example, some technologies focus solely on backup data, or data that resides on tape. As seen above, Sedona Principle 8 sets its sights on “active data.” Think Tier 1 and 2 disk here. If your technology has limited or no visibility into these front-line tiers, your company could get fined.


It gets worse. Sedona Principle 12 was revised to gain a more nuanced view of the need for metadata. It now provides that the form of production should take into account “the need to produce reasonably accessible metadata that will enable the receiving party to have the same ability to access, search and display the information as the producing party where appropriate or necessary in light of the nature of the information and the needs of the case.” Essentially, that lawyer-speak translates into “thou shalt ensure searchable access and preservation of file metadata.” This includes file system metadata and custom application metadata, both of which are areas of weakness for traditional search solutions.


Now that we know what’s required by the legal beagles, it behooves us to understand how best to meet those requirements. The information sought by litigators comes in three flavors: structured (database), unstructured (files) and semistructured (emails). Unstructured information like Word, Excel, PowerPoint and PDF files (not Web information) now accounts for 80 percent of corporate data. Finding relevance in all that unstructured data is like finding needles in a haystack. The most prevalent technology applied to that problem in legal discovery is search - enterprise search, not Internet search.


While Internet search and enterprise search share the same underlying technology, their use cases are vastly different. Internet search has touched hundreds of millions of people worldwide and reshaped our expectations of how we locate information. Web search engines can return excellent results on single-word queries of a 15-terabyte universe, but fall short in an enterprise with 50+ terabytes (TB), thousands of users and dozens of locations. Why? One reason is that data stored in the enterprise universe lacks the ultra-hyperlinked nature of the Web, and most of the fancy footwork employed in that world simply doesn’t apply in the enterprise. This is especially evident when security, reliability and performance issues complicate the problem - such as the need to secure private data.


Enterprise search technologies do an excellent job of overcoming the corporate limitations of Web search, and for many years have also served us well for e-discovery mandates. But with the advent of the new FRCP rules, they now fall short in a world that some experts are calling e-discovery 2.0, and therefore may actually fuel the risk of noncompliance. Think of search as a librarian that creates a massive dictionary so you can find a half-dozen words. Building this index means cracking open each and every document, examining the contents and placing every word into a repository for future retrieval. The advantage is that you can find indexed words quickly. The disadvantages are slow indexing speed, high storage overhead and a vast majority of those words have no actual business context. The words include commonly used words such as “the,” “are,” “a,” “and,” “of” and “those,” all of which are contained in the previous sentence. Using that sentence as an example, 6 of 21 words contribute nothing to its meaning or its relevance in a search.

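The stop-word point is easy to check for ourselves. The sketch below is illustrative only: the stop-word set contains just the six words named above, and the tokenizer is a simple regular expression rather than the method of any particular search product.

```python
import re

# Stop words named in the article; a real engine's list would be much longer.
STOP_WORDS = {"the", "are", "a", "and", "of", "those"}

SENTENCE = (
    "The disadvantages are slow indexing speed, high storage overhead "
    "and a vast majority of those words have no actual business context."
)

def split_words(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def count_stop_words(text, stop_words=STOP_WORDS):
    """Return (number of stop words, total number of words)."""
    words = split_words(text)
    hits = [word for word in words if word in stop_words]
    return len(hits), len(words)

stop_count, total = count_stop_words(SENTENCE)
print(f"{stop_count} of {total} words are stop words")
```

A full-text index stores every one of these words, which is part of why index size grows so quickly relative to the value it delivers.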

Remember, Sedona 8 now shines a spotlight on active top-tier data. Many enterprise search engines suffer from limited performance scalability and can take up to two weeks to index just 10 TB of this data. If we have 50+ TB, we will either miss our discovery deadline and risk spoliation, or spend more on solutions than we would on fines. Then there’s the overhead. Most search solutions require a 50 to 300 percent index overhead. For 50 TB, that equates to 25 to 150 TB of incremental storage just to house the index. Now let’s consider our changing universe of data. If users are altering 20 percent of their files daily, our search solution must be capable of indexing 10 TB per day just to keep up. Doing so will likely carry a price tag far greater than our spoliation fines.

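The arithmetic behind these figures can be laid out as a short calculation. The sketch assumes, purely for illustration, that indexing time scales linearly with data volume; the function names are hypothetical and the rates are the ones quoted above.

```python
def index_time_weeks(data_tb, tb_per_batch=10, weeks_per_batch=2):
    """Weeks to index data_tb at the quoted rate of 10 TB per two weeks.

    Assumes linear scaling, which is a simplification.
    """
    return data_tb / tb_per_batch * weeks_per_batch

def index_overhead_tb(data_tb, low_pct=50, high_pct=300):
    """Low and high estimates (in TB) of the storage consumed by the index."""
    return data_tb * low_pct / 100, data_tb * high_pct / 100

def daily_churn_tb(data_tb, changed_pct=20):
    """TB that must be re-indexed each day if changed_pct of the data changes."""
    return data_tb * changed_pct / 100

DATA_TB = 50
print("Initial indexing (weeks):", index_time_weeks(DATA_TB))
print("Index overhead (TB):", index_overhead_tb(DATA_TB))
print("Daily re-indexing load (TB):", daily_churn_tb(DATA_TB))
```

At the quoted rate, 50 TB would take about ten weeks for the first pass alone, while 10 TB of changes arrive every day. That gap between indexing throughput and data churn is the scalability problem in a nutshell.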

Even if we decide to foot that bill, we will still need to invest in other solutions that address the Rule 37 requirements for proper management, tracking and protection of litigation-related files. While some search solutions include features to facilitate legal-hold enforcement, they operate as islands that fail to consider other data management policies. Disparate solutions for compliance, data migration, archiving and security may conflict with solutions focused solely on litigation-held data.


Law and Order


To avoid spoliation and other legal discovery pains, we need new technologies and solutions that incorporate, into a single package, all of the capabilities required to meet not only the FRCP amendments, but the Sedona Principles as well. Such solutions should have the following:


  1. Compatibility. As all data resides on storage, it’s critical that we’re compatible with file system protocols such as CIFS (Windows) and NFS (UNIX, et al). Compatibility also means the ability to extract metadata from such file systems, given that this is an FRCP and Sedona mandate.

  2. Scalability. Taking two weeks to index 10 TB is unacceptable. Spending millions to avoid fines is ill advised and may anger your CFO. Requiring index overheads of 50 to 300 percent is also not cost-efficient. A technology that offers a divisible database that scales across hundreds of terabytes and dozens of remote locations while creating limited index overhead would be optimal. Such a solution would be optimized to extract searchable metadata from targeted content rather than indexing every word in the universe.

  3. Capability. In concert with scaling and fitting in, our solution must also satisfy FRCP and Sedona mandates. We’ve collected inode metadata; now we also need to extract file metadata and store it in a searchable medium. The most efficient way to do this is to couple a metadata extraction engine with a regular expression generator and a fast database model. In this way, we not only grab surface metadata, but can also locate complex patterns inside files such as Social Security numbers, product part numbers, etc. Litigators are becoming more sophisticated and may ask for such information. If your search engine can only find words, you might get fined.

  4. Usability. All of the above can only invoke nirvana if it’s usable - and not just by the technical IT guru. The interface must support an easy way to allow legal and other users to discover what they have. This differs from search, which requires prior knowledge of what we’re looking for. A discovery engine and interface lets us find what’s there before it gets us into trouble. Litigators are clever. Their job is to locate and exploit your blemishes. If you allow them a field day on your data using search engines, you may expose your firm to undue risks. A better approach is to employ a solution that displays extracted file metadata based on key attributes, such as file types, file name words or key words within a document. Using Boolean logic, one can then discover what’s there and deliver only what has been requested by the lawyers.

  5. Protectability. This is required under Rule 37. One must be able to place policy-based legal holds on files involved in the litigation. Some solutions require copying such files to a separate repository, which introduces the risk of duplication and increased storage overhead. These solutions also do not address the “island” problem wherein data is deleted or moved by another solution for compliance, etc. A best practices methodology is to utilize a solution that can create file tags. Such tags give us the power to label files, perhaps with multiple tags. A file involved in multiple lawsuits could then be labeled as such to ensure preservation until all of the cases are closed.
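To make the compatibility and capability items above more concrete, here is a minimal Python sketch that collects file system metadata and looks for patterns inside a file. It is an illustration under stated assumptions, not a description of any product: the function names, the record layout and the part-number format (two letters, a dash, five digits) are hypothetical, and the sample file is created only for the demonstration.

```python
import os
import re
import tempfile
from datetime import datetime, timezone

# Hypothetical patterns: a U.S. Social Security number and a made-up
# part-number format (two capital letters, a dash, five digits).
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "part_number": re.compile(r"\b[A-Z]{2}-\d{5}\b"),
}

def file_system_metadata(path):
    """Collect basic file system (inode) metadata without reading the contents."""
    info = os.stat(path)
    return {
        "path": path,
        "extension": os.path.splitext(path)[1].lower(),
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }

def pattern_hits(text):
    """Return the sorted names of the patterns that occur in the text."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(text))

def catalog_entry(path):
    """Build one searchable record: file system metadata plus pattern names."""
    entry = file_system_metadata(path)
    with open(path, encoding="utf-8", errors="replace") as handle:
        entry["patterns"] = pattern_hits(handle.read())
    return entry

# Demonstration on a throwaway text file.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "Memo.TXT")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("Employee 123-45-6789 ordered part AB-12345.")
    demo = catalog_entry(sample)

print(demo["extension"], demo["size_bytes"], demo["patterns"])
```

The record stores which patterns were found rather than every word in the file, which is the difference between extracting targeted metadata and building a full-text index. Real office documents would need format-specific parsers; plain text is used here only to keep the sketch self-contained.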

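The usability and protectability items can be sketched together: a Boolean filter over extracted metadata, and tags that keep a file on hold until every related matter is closed. Everything here is hypothetical, including the record layout, the function names and the matter labels; it shows one way such tagging could work, not how any specific solution implements it.

```python
from dataclasses import dataclass, field

@dataclass
class FileRecord:
    """One catalog entry; tags hold the names of open legal matters."""
    path: str
    file_type: str
    keywords: set
    tags: set = field(default_factory=set)

def discover(records, file_type=None, all_of=(), none_of=()):
    """Boolean filter: type AND every all_of keyword AND NOT any none_of keyword."""
    found = []
    for rec in records:
        if file_type is not None and rec.file_type != file_type:
            continue
        if not set(all_of) <= rec.keywords:
            continue
        if set(none_of) & rec.keywords:
            continue
        found.append(rec)
    return found

def place_hold(records, matter):
    """Tag each record with the given legal matter."""
    for rec in records:
        rec.tags.add(matter)

def release_hold(records, matter):
    """Remove one matter's tag; other matters' tags stay in place."""
    for rec in records:
        rec.tags.discard(matter)

def may_delete(rec):
    """A file may be deleted only when no legal-hold tag remains on it."""
    return not rec.tags

catalog = [
    FileRecord("/share/q3_pricing.xls", "xls", {"pricing", "acme"}),
    FileRecord("/share/acme_contract.doc", "doc", {"contract", "acme"}),
    FileRecord("/share/picnic.doc", "doc", {"picnic"}),
]

acme_docs = discover(catalog, file_type="doc", all_of=["acme"])
place_hold(acme_docs, "matter-A")
place_hold(acme_docs, "matter-B")
release_hold(acme_docs, "matter-A")  # first case closes, second is still open

print([rec.path for rec in catalog if not may_delete(rec)])
```

Because the hold lives on the file record as a tag, any other policy (retention, migration, archiving) can consult `may_delete` before acting, which is one way to avoid the “island” problem. The contract stays protected after the first matter closes because the second matter’s tag is still attached.
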
As seen above, litigation now impacts the IT professional in ways never before imagined. Where in the past IT might have enjoyed a layer of shielding from the law’s claws, such is no longer the case. New FRCP rules and Sedona Principles require IT to stand front and center with proof of technical and practical compliance. Failure to do so can result in stiff fines, imprisonment, embarrassment and unemployment. Yesteryear’s technologies, such as enterprise search, simply do not have the stamina to go the distance when called upon for legal discovery. IT professionals must now look for and deploy solutions designed to satisfy all legal discovery requirements before the litigation battle begins.




  1. Sharon Fischer. "Companies are Not Prepared for New E-Discovery Rules." ComputerWorld, November 21, 2006.
  2. Osterman Research. "Messaging, Web and IM Security Market Trends, 2007-2010." Osterman Research, May 2007.
