Legal and compliance teams critical to machine learning success
This Q&A with Jake Frazier is based on the first of a series of interviews I’m conducting with thought leaders who take a unified governance approach to increasing the value of information to their businesses while driving down costs.
Along with his role as senior managing director at FTI Consulting, Jake is a faculty member of CGOC, a founding member of the Electronic Discovery Reference Model (EDRM), a member of the Sedona Conference and an Advisory Cabinet Member of the Masters Conference. He has authored many articles and white papers on information governance issues and regularly addresses industry groups on the topic.
For this article, I asked Jake about the new and complex challenges around the adoption of machine learning (ML) technologies in enterprises. ML offers business users an unprecedented opportunity to take advantage of the massive amount of data they are collecting. However, ML is also increasingly important to legal and compliance teams.
First, these teams must ensure that all ML projects throughout the organization comply with evolving privacy and security regulations, while enabling proper preservation in the event of litigation. Second, ML technology can enable security and compliance teams to improve their own processes.
Information Management: In general, what are companies doing right when it comes to machine learning? What are they doing wrong?
Jake Frazier: Companies that have been successful with big data and machine learning initiatives typically start with a very narrowly defined use case that can be connected to a tangible business value or metric.
For example, a service provider wanted to leverage ML to improve customer support and reduce churn. It started by having staff review the recordings or transcripts of calls and tag phrases used by customers who then decided to switch. Next, the machine learning application was trained to look for similar indicators, so when it listened to calls in real-time, it could identify an at-risk customer and immediately escalate the call to a supervisor.
The success of this type of use case was easy to measure. Once the indicators were in use, the percentage of customers switching dropped significantly thanks to automatic identification and intervention.
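The escalation step in that example can be illustrated with a minimal sketch. It uses plain phrase matching as a stand-in for a trained model, which is a simplification: the interview does not describe the provider's actual system. The phrase list and the function name `should_escalate` are hypothetical.

```python
# Hypothetical phrases that staff might have tagged in calls from
# customers who later switched providers (illustrative only).
AT_RISK_PHRASES = [
    "cancel my account",
    "switch providers",
    "too expensive",
]


def should_escalate(transcript, phrases=AT_RISK_PHRASES):
    """Return True if the transcript contains any tagged at-risk phrase.

    Simple substring matching stands in for the trained ML model here.
    """
    text = transcript.lower()
    return any(phrase in text for phrase in phrases)


# Example: decide whether a live call should go to a supervisor.
call = "Honestly this plan is too expensive and I may switch providers."
escalate = should_escalate(call)
```

A real deployment would replace the phrase list with a classifier trained on the tagged calls, but the decision it feeds (escalate or not) would likely look similar.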
Companies run into problems with ML in a few ways. First, and most dangerous, is the failure to involve legal and compliance teams in the formulation of ML projects. With the rapid evolution of privacy regulations, it’s essential for enterprises to ensure they remain compliant.
Another common issue is when companies focus on the technology first. Companies often invest millions of dollars and perhaps years developing a machine learning platform, convinced the organization will derive numerous benefits from different departments flocking to take advantage of it. Unsurprisingly, they don’t get the adoption they expect because they didn’t present a successful use case to their internal customers.
A third critical mistake organizations make is not understanding the human part of the equation, that is, failing to adequately train the machine learning engine. It’s essential to use an iterative approach to ensure the ML engine is accurate in its analysis or identification. Failure to do this will undoubtedly lead to a high error rate.
When it comes to implementing ML technology for lines of business, IT will likely have to conduct education around the need for expert human input from the very start of the project.
IM: How are enterprises using machine learning for legal and compliance today?
Frazier: Technology Assisted Review (TAR) is the most common use case. Until recently, a government or regulatory investigation, a discovery request for civil litigation, or even a large internal investigation would require the review of hundreds of thousands or even millions of documents – each looked at by a human being – to determine if the document was relevant and should be produced. The process was long, complex and very expensive.
Today, with TAR, a machine learning-powered database sits under the review platform and is trained to do the review. When a team begins reviewing and tagging documents, the ML engine learns from each code entered. After perhaps 10 percent of the documents have been reviewed, it’s time to test what the tool has learned from the human tagging using another batch of documents.
Accuracy is measured using “Recall,” that is, the percent of relevant documents the machine actually tagged as relevant, and “Precision,” the percent of documents tagged as relevant that actually are relevant. As necessary, more human tagging is used to increase the machine’s accuracy. Once the machine reaches an acceptable level of accuracy, for example, a Recall of 90 percent, the machine can do the rest of the review, no matter how large the document pool.
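To make the two measures concrete, here is a minimal sketch of how Recall and Precision might be computed on a validation batch. It assumes each document carries a human relevance tag and a machine relevance tag. The function name, the ten-document sample and the 0.90 target are illustrative assumptions, not part of any particular review platform.

```python
def recall_and_precision(human_labels, machine_labels):
    """Return (recall, precision) for two parallel lists of booleans.

    human_labels[i] is True if a reviewer tagged document i as relevant;
    machine_labels[i] is True if the ML engine tagged it as relevant.
    """
    pairs = list(zip(human_labels, machine_labels))
    true_pos = sum(1 for h, m in pairs if h and m)       # relevant, and found
    false_neg = sum(1 for h, m in pairs if h and not m)  # relevant, but missed
    false_pos = sum(1 for h, m in pairs if not h and m)  # tagged, not relevant

    relevant = true_pos + false_neg
    tagged = true_pos + false_pos
    recall = true_pos / relevant if relevant else 0.0
    precision = true_pos / tagged if tagged else 0.0
    return recall, precision


# Hypothetical validation batch of ten documents.
human = [True, True, True, True, True, False, False, False, False, False]
machine = [True, True, True, True, False, True, False, False, False, False]

recall, precision = recall_and_precision(human, machine)

# Assumed target, matching the 90 percent Recall example in the text.
TARGET_RECALL = 0.90
needs_more_training = recall < TARGET_RECALL
```

In this made-up batch the machine finds four of the five relevant documents and wrongly tags one irrelevant document, so both measures come out at 80 percent and more human tagging would be called for before the machine takes over the review.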
Technology Assisted Review is saving organizations millions of dollars and thousands of hours. When properly trained, it is actually faster and more accurate than human review because it works 24/7 and eliminates inconsistency from one reviewer to the next.
In fact, TAR may be the perfect initial “narrowly defined use case” enterprises can use to prove the value of ML, and we have seen several instances where information governance teams have built on Legal’s success and are using ML to classify whether documents are sensitive and need to be protected, fall under legal or regulatory retention requirements, or can be disposed of.
IM: What are the legal and compliance concerns of ML projects?
Frazier: The biggest problem with machine learning projects is really attitude, not the technology. For example, legal departments, in just about any industry, tend to think in absolutes when implementing TAR for e-discovery.
When we ask what the standard should be for accuracy – the Recall percent I previously discussed – we often hear, “100 percent.” Why not? It’s technology and it’s automated, so shouldn’t we expect 100 percent? But we should remember that machine learning is analyzing the documents based on what it has learned from human training, and that will never be perfect. So I find myself frequently repeating, “Don’t let perfection be the enemy of the good.”
What I mean is that many organizations have been using a “save everything” strategy as the “safe” approach for years. This has put them in a position where they can no longer find what they need, so they end up not producing all relevant documents anyway. A solution that gets them to 90 percent Recall may actually make them more compliant with legal and regulatory requests, while producing documents much faster and at a lower cost.
Of course, there are other ML use cases beyond e-discovery, such as patient diagnosis, where 90 percent Recall would be insufficient, and therefore ML engine training would be ongoing. The key is being thoughtful about the use case so that the human training and Precision/Recall rates are appropriate.