As our dependence on email has grown, so has our need for efficient tools that cut through the clutter and find what's really relevant. In the Enron case alone, investigators seized over 12 million documents; printed out, those pages would form a stack more than three times taller than the Sears Tower.

Or consider the case of an inventor who recently brought a patent infringement suit against a large corporation. His lawyers had to review over 50 million pages of electronic documents for potential relevance in just a few months. Ten years ago, it would have taken an army of lawyers several years to examine all of the content and assess its relevance to the case. Today, this type of e-discovery project is much faster and more cost-effective thanks to emerging visual analytics tools, which allow legal teams to quickly and thoroughly review sensitive corporate data for litigation, regulatory requests and internal investigations.

In the patent case, a small legal team was able to use visual analytics to convert the enormous set of data into graphics organized by the documents' concepts. This helped them find the "smoking gun" piece of evidence - an Excel spreadsheet attached to an email - and secure a $500 million ruling in favor of the inventor.

In today's litigious and regulatory environment, companies are growing increasingly aware of the need to have processes and tools for finding and reviewing massive amounts of unstructured electronic documents quickly and thoroughly. The new Federal Rules of Civil Procedure (FRCP) that took effect last December are driving companies to examine how IT and legal departments can better work together to save money and reduce the risk of litigation. Those currently evaluating their e-discovery plans should seriously consider visual analytics because it is uniquely suited to handle these challenges.

Review: the Largest Addressable Cost in E-Discovery

Much of the discussion within the IT industry this year has focused on the collection of data, since in-house technologists are often asked to drop everything at a moment's notice to find terabytes (as many as 90 in some cases) of data for important legal matters. Despite the importance of collection, e-discovery early adopters in highly regulated industries such as financial services, pharma, energy and telecom point to review as the largest addressable cost in e-discovery.

Currently, corporations spend huge chunks of their legal budgets on attorneys billing $300 an hour to review thousands, if not millions, of electronic documents. In fact, a recent study estimated that U.S. corporations spend nearly $5 billion a year analyzing emails for litigation, regulatory requests and investigations. However, the productivity benefits of visual analytics are creating a paradigm shift within e-discovery.

A Forrester Research report from December of 2006 echoes these findings, noting, "Tools with visual analytics built in can make these legal professionals more efficient by determining whether or not data is relevant, is privileged, or even needs to be produced in response to a discovery request."1

Visual Analytics: Taming Unstructured Content

Most business users already use some form of visual analytics, since it's grown increasingly common in a number of applications. Leading enterprise resource planning (ERP), customer relationship management (CRM) or business intelligence (BI) tools, for example, are likely to have pie charts and other graphical representations available for users to quickly grasp a high-level overview of the data and evaluate revenue trends. The availability of visual analytics handling unstructured content, however, has lagged behind ERP and CRM solutions because it has been easier to create visual representations from numbers than from unstructured content found in email.

Sophisticated visual analytics tools that can identify the nouns and noun phrases in a series of messages, then visually cluster the documents together according to similarities in subject matter, are revolutionizing the e-discovery review process. The software can contextually assess these nouns and noun phrases to distinguish words with different meanings, e.g., when the word "diamond" is referring to baseball, as opposed to an engagement ring.
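The clustering idea can be illustrated with a minimal sketch. It is not how any particular e-discovery product works: it assumes plain word tokens stand in for extracted noun phrases, uses cosine similarity with a greedy single-pass grouping, and relies on made-up sample messages and a hypothetical similarity threshold. Even so, it shows how surrounding terms separate a baseball "diamond" from an engagement-ring "diamond".

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "for", "and", "on", "at", "is", "was", "with"}


def term_vector(text):
    """Count lowercase word tokens, skipping common stopwords.

    Real tools extract nouns and noun phrases; plain tokens are a stand-in here.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering.

    Each document joins the first cluster whose seed document is similar
    enough; otherwise it starts a new cluster. The threshold is hypothetical.
    """
    clusters = []  # list of (seed_vector, [document indices])
    for i, text in enumerate(docs):
        vec = term_vector(text)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


# Made-up sample messages: "diamond" appears in all four.
emails = [
    "The pitcher stood on the diamond waiting for the umpire at the baseball game",
    "She chose a diamond engagement ring from the jeweler",
    "Baseball game tonight, the umpire cleared the diamond after rain",
    "The jeweler resized the engagement ring and polished the diamond",
]

groups = cluster(emails)
print(groups)
```

All four messages share the word "diamond", yet the baseball messages and the jewelry messages land in separate groups because their other terms differ.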

Once the content is analyzed, the software can visually represent the key concepts included within the data by clustering documents around similar ideas. These tools can also provide a timeline and social networking view so that users can see which company employee emailed with whom and when. As analysts click on clusters and read the documents, they can quickly tag them as privileged or responsive to the case.
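The data behind a social networking and timeline view can be sketched in a few lines. The example below is an assumption-laden illustration: the sender names, dates and tuple layout are hypothetical, and a real tool would read this metadata from message headers and render it graphically rather than print it.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical message metadata: (sender, recipient, date sent).
messages = [
    ("alice", "bob", date(2007, 3, 1)),
    ("alice", "bob", date(2007, 3, 2)),
    ("bob", "carol", date(2007, 3, 2)),
    ("alice", "bob", date(2007, 3, 9)),
]


def pair_volume(msgs):
    """Count messages per (sender, recipient) pair: the edges of the network view."""
    return Counter((sender, recipient) for sender, recipient, _ in msgs)


def timeline(msgs):
    """Group (sender, recipient) pairs by the day they were sent, in date order."""
    days = defaultdict(list)
    for sender, recipient, sent in msgs:
        days[sent].append((sender, recipient))
    return dict(sorted(days.items()))


volume = pair_volume(messages)
by_day = timeline(messages)

for (sender, recipient), count in volume.most_common():
    print(f"{sender} -> {recipient}: {count} message(s)")
for day, pairs in by_day.items():
    print(day.isoformat(), pairs)
```

The pair counts answer "who emailed whom, and how often," while the date grouping answers "when."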

In minutes, visual analytics enables identification of correspondence, evaluation of its substance, an understanding of the parties involved, and metrics on the volume of their interaction. This ability makes visual analytics ideally suited for e-discovery because it addresses two of the main challenges for corporations involved in litigation today - speed and risk.

Speed: Early Case Assessments and Increased Productivity

Periodically, business cases grab headlines when embarrassing emails are entered into evidence. Consider a Massachusetts class-action suit brought a few years ago over the dangers of the diet drug fen-phen. During discovery, the court permitted admission of an email from an executive at the defendant company in which he wrote, "Do I have to look forward to spending my waning years writing checks to fat people worried about a silly lung problem?" In addition to damaging a company's brand outside of the courtroom, these email revelations can lead to runaway verdicts and stiff penalties.

For these reasons, many organizations use visual analytics to conduct pre-trial assessments (often called early case assessments), quickly reviewing the massive amount of information to determine the strength of the case and the best course of action for the company. For example, if the defendant's legal team in the fen-phen case had seen the above email during an early case assessment, it might have chosen to settle the matter quickly to avoid greater long-term harm.

The speed at which these assessments are conducted cannot be emphasized enough. While business litigation may drag out for years, the first three months are often critical, since parties have to agree on a formal procedure outlining the material to be produced, when it will be available and the form in which counsel will provide it. The ability to collect and assess the data, then develop an early strategy within the first 30 days, is a huge advantage. Given the sheer volume of data and the short timeframes involved, assessing a case within 30 days of receiving a legal order is virtually impossible without visual analytics.

Additionally, once beyond the early case assessment, legal teams often still have millions of documents to review prior to trial, and the lawyers are still billing $300 an hour. By making it easier for reviewers to quickly sort through documents, several Fortune 1000 companies using visual analytics have realized a three-fold increase in productivity over non-visual e-discovery review tools. This productivity helps to greatly decrease the cost associated with litigation.
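A back-of-the-envelope calculation shows what a three-fold productivity gain means at $300 an hour. The hourly rate and the three-fold factor come from the figures above; the document count and the baseline review speed are purely hypothetical assumptions chosen for illustration, not measured values.

```python
HOURLY_RATE = 300        # dollars per attorney hour (figure from the article)
SPEEDUP = 3              # three-fold productivity gain (figure from the article)
DOCUMENTS = 1_000_000    # hypothetical matter size
DOCS_PER_HOUR = 50       # hypothetical baseline review speed per attorney


def review_cost(documents, docs_per_hour, hourly_rate):
    """Total review cost: hours needed times the hourly billing rate."""
    return documents / docs_per_hour * hourly_rate


baseline = review_cost(DOCUMENTS, DOCS_PER_HOUR, HOURLY_RATE)
with_visual_analytics = review_cost(DOCUMENTS, DOCS_PER_HOUR * SPEEDUP, HOURLY_RATE)

print(f"Baseline review cost:       ${baseline:,.0f}")
print(f"With three-fold throughput: ${with_visual_analytics:,.0f}")
print(f"Savings:                    ${baseline - with_visual_analytics:,.0f}")
```

Whatever the real document count and review speed, tripling throughput cuts the review bill to one third, so the savings scale directly with the size of the matter.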

Once a matter actually begins, attorneys are better equipped to create a theory on which to base their case. As more information becomes available, that theory evolves into a strategy, and that strategy into a compelling story. By visualizing data, teams can develop, challenge and review various hypotheses to test persuasive messages.

Risk: Visual Analytics Identifies Patterns in an Unstructured Maze of Data

Visual analytics provides a much more thorough and scalable way to review massive amounts of data than a simple search, which assumes that the information exists and is not hidden by a misspelled keyword or a euphemism. Since lawyers often approach a new assignment without a clear sense of what is lurking behind the firewall, visual analytics allows them to see and act on relationships, some of which are not obvious at first inspection. Commonly, a set of emails might not seem relevant until viewed in the context of the connections and message frequency between those who sent and received them.
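The misspelling problem is easy to demonstrate. The sketch below uses made-up sample messages and a hypothetical similarity cutoff: an exact substring search misses a misspelled term, while approximate matching with Python's standard `difflib` module recovers it. Fuzzy matching is only a stand-in here; it does nothing for euphemisms, which is where concept-based clustering is needed.

```python
import difflib
import re


def exact_hits(docs, keyword):
    """Indices of documents containing the keyword as an exact substring."""
    return [i for i, text in enumerate(docs) if keyword in text.lower()]


def fuzzy_hits(docs, keyword, cutoff=0.8):
    """Indices of documents containing a word that closely resembles the keyword.

    The cutoff is a hypothetical choice; lower values catch more typos
    but also more false positives.
    """
    hits = []
    for i, text in enumerate(docs):
        words = re.findall(r"[a-z]+", text.lower())
        if difflib.get_close_matches(keyword, words, n=1, cutoff=cutoff):
            hits.append(i)
    return hits


# Made-up sample messages; the second one misspells the keyword.
docs = [
    "Please confirm the kickback schedule before Friday",
    "The kickbak for Q3 is ready, keep it quiet",
    "Lunch menu for the sales offsite",
]

exact = exact_hits(docs, "kickback")
fuzzy = fuzzy_hits(docs, "kickback")
print("exact:", exact)
print("fuzzy:", fuzzy)
```

The exact search returns only the first message, while the fuzzy search also surfaces the misspelled one.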

For instance, an email with the subject line "rescheduled meeting agenda" might contain items and notes that an executive plans to discuss with an outside legal team. A reviewer can quickly process the note from the way it is associated with other individuals and items sharing certain terms, immediately recognize that it is a privileged attorney-client communication, and mark the item as "privileged" so that it will not be shared with opposing counsel.

Perhaps more importantly, visual analytics can help uncover well-hidden problems. In a recent case, a company suspected two sales employees of fraud totaling $50,000. An investigative team uploaded six months of email into a visual analytics tool and in 20 minutes uncovered millions of dollars in fraudulent activity. A simple keyword search, on which many e-discovery tools rely exclusively, would have failed to find the additional money.

Corporate management of e-discovery doesn't have to include scaling paper stacks the size of the Sears Tower. By visualizing data and establishing information connections within the matter under review, corporations can greatly speed up the process of review and reduce the risk associated with e-discovery. In total, this will help alleviate one of the most expensive components of e-discovery.


1. Barry Murphy. "Believe It - eDiscovery Technology Spending To Top $4.8 Billion By 2011." Forrester Research, December 2006.
