Video and email have something else in common besides the rate at which they are being created: both are forms of unstructured or semistructured data. As opposed to structured data, which typically resides in a tightly controlled application, unstructured data is masses of (usually) computerized information that either does not have a data structure or has a data structure that is not easily readable by a machine. This latter factor has traditionally made unstructured data highly challenging to deal with in large quantities. Without the ability to automate the indexing, storage and handling of unstructured data, there is simply no effective way for computers to keep track of what is in each piece of unstructured data. For the average consumer, YouTube user or even rank-and-file employee, this does not present much of a problem because they can simply view a video or read an email and know what it is about. But today's businesses are experiencing significant heartache - and expending a great deal of money - in order to address the problem.
At roughly the same time information volumes (especially those of unstructured information) were exploding, two seemingly unrelated trends also gained critical momentum: compliance and e-discovery. Stringent statutory oversight of business was the byproduct of the morally bereft excesses of the Internet boom; names like Enron, WorldCom, HealthSouth, Adelphia and Tyco led to Sarbanes-Oxley, numerous Securities & Exchange Commission rules and many other compliance requirements. Corporations large and small are now required to institute extensive controls to ensure that (among other things) the data within their networks is known, tracked and accounted for at all times. Failure to implement and maintain such controls could lead to increased scrutiny, fines and, probably most harmful, bad publicity. Corporations now have to prove they are playing by the rules - but in order to do so, they need to be able to get a handle on the exploding volumes of data appearing on their networks every day.
Simultaneously, the nascent field of e-discovery is squeezing these same corporations from a different direction. Under the U.S. legal system, any party to litigation is obligated to share any relevant and nonprivileged information under its control with the other party or parties to litigation. When such information lived exclusively in paper format, the process was manageable; attorneys would receive and review all potentially relevant documents, decide what would be given to the other side, make hard copies of the pertinent documentation and proceed from there. This process has been referred to as "discovery" for eons. While the process itself did not change with the digitization of information, the scale of the review process grew exponentially: what was once 10,000 pages of documentation that would take a single review attorney perhaps a week to review became tens of millions of pages of documentation in myriad formats that would take a 20-person team a month to review. To make matters worse, the dollars associated with litigation eclipsed even those of compliance. Litigation alone could cost tens of millions of dollars, and an adverse judgment could exceed this number many times over.
Corporations were thus presented with a dubious choice, one that really wasn't a choice at all: attempt to get the unstructured data genie back in the bottle in favor of the old paper-based world or lean heavily on technological tools to implement an infrastructure better equipped to handle both structured and unstructured data.
Because it lacked a structure of its own and was traditionally not easily readable by a machine, unstructured data presented a particularly tough challenge for corporations trying to get their data under control for compliance and e-discovery purposes. A significant portion of the problem lay with the fact that the tools corporations tried to use to deal with this data were not particularly advanced or helpful. Categorization tools - which would extract key terminology in a piece of information while discerning its overall meaning - were rudimentary at best. And search tools - which would allow users to find specific pieces of information based on the information's content - were not easy to use and, as opposed to their cousins in the Web search world, ill-suited for meeting the scalability, security and relevance needs of enterprises.
Then something interesting happened: the search and categorization industry grew up. After a few false starts and some premature hype, search and categorization tools became easier to use and, more importantly, started delivering better results. Search and categorization tools eventually became the unifying force of information management within many enterprises and professional service firms as they could make sense of huge volumes of data in a relatively effective fashion. Furthermore, search and categorization technology began solving particularly thorny issues such as records management, compliance and e-discovery, which went a long way toward cementing the critical role that search is playing in today's enterprises. The following three brief case studies highlight the increasingly effective roles being played by search and categorization to resolve specific business issues.
Pharmaceuticals
Patents are the lifeblood of the pharmaceutical industry. Every additional month of patent protection - whether gained by filing a patent sooner than a competitor or by holding protection a month longer - can translate into literally millions of dollars in additional (or lost) revenue. Conversely, reducing a competitor's patent protection can lower that competitor's revenue by a similarly material amount. In an effort to stay on top of global patent activity, one of the largest pharmaceutical companies in the world faced a particularly challenging issue: how to keep its researchers abreast of relevant patent applications and issuances in myriad jurisdictions in near real time. Understanding a patent application and its relevance requires highly specialized training, which makes staffing a team of people to manually go through every patent filing prohibitively expensive.









