9 best practices for taking machine learning from theory to production
In spite of the tremendous advances in artificial intelligence in recent years, many organizations still struggle to see a return on the investments they made to integrate AI and machine learning into their business strategy. One reason is that data scientists don’t always measure their models’ performance in terms of the value those models bring to users.
Theoretical AI is hard to translate into functional AI products because it can be difficult to obtain data and manage the model once it’s in production — a process often referred to as Machine Learning Lifecycle Management or Machine Learning DevOps.
Machine Learning DevOps is a crucial yet error-prone process. Here are a few pointers for avoiding the mistakes inherent in that process and for aligning ML model outputs more closely with business metrics.
Remain Flexible
When it is time to deploy a system to production, it is only natural to start with a set of assumptions about the DevOps process. However, once you convert your models into real-life data products, it becomes apparent that things are never as simple as those assumptions suggest. In eCommerce, for example, the frequency with which a model is re-trained must be adapted to the market: a model trained on ordinary traffic can quickly become stale during seasonal events such as holiday sales.
Failure to remain flexible and to embrace the dynamism of the machine learning solutions they create is one of the top reasons why data scientists don’t see their models live up to their own expectations once in production.
Continuously Monitor Inputs
A model’s accuracy as computed during testing can indicate how the model will eventually perform in reality, but that measure remains mostly theoretical. Furthermore, the model’s accuracy typically degrades over time due to data drift, and there is no good way to measure how quickly the accuracy will drop and when the time is right to re-train the model.
People wrongly assume that when a model’s performance starts dropping, the model is to blame. But DevOps for machine learning systems is challenging because ML products combine not only the model and the system on which the model is deployed, but also the data it uses for inference. When a feature powered by machine learning suddenly starts going rogue, it is reasonable to question either the system or the data rather than the model itself.
To avoid unfortunate conclusions, it is always good practice to monitor the input data fed into the models. More often than not, keeping track of the statistical signature of those inputs over time to identify data drift is sufficient to catch problems early and predict when a model might start failing. Organizations should mix human intelligence with machine learning to continuously monitor model deployments by periodically asking a network of contributors to evaluate samples of data.
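As a minimal sketch of what tracking a statistical signature might look like, the function below compares the mean of a live window of a numeric feature against a reference window from training time, flagging drift when the shift exceeds a threshold. The function names and the 0.5 threshold are illustrative assumptions; production systems typically use richer tests such as Kolmogorov-Smirnov or the Population Stability Index.

```python
import statistics

def drift_score(reference, live):
    """Shift of the live mean, in units of the reference standard deviation."""
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.pstdev(reference)
    if ref_std == 0:
        return 0.0 if statistics.fmean(live) == ref_mean else float("inf")
    return abs(statistics.fmean(live) - ref_mean) / ref_std

def check_inputs(reference, live, threshold=0.5):
    """Flag an input feature whose live distribution has drifted."""
    score = drift_score(reference, live)
    return {"score": score, "drifted": score > threshold}
```

Running this periodically on each input feature gives an early warning well before model accuracy visibly degrades.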
Optimize Data Usage
At a previous job, I found that our team was using four times more data than it could afford. Companies with limited compute power or those paying for cloud compute services are especially exposed: they don’t necessarily overuse data more often than others, but they pay a steeper price when they use more data than required.
Because data scientists are taught to optimize for model accuracy, they tend to use as much data as possible when training their models. Yet for most models, accuracy isn’t proportional to the amount of data used; instead it approaches an asymptote, so beyond a certain point, adding data yields diminishing returns. To address this, data scientists can build a learning curve to identify the optimal amount of data required to train their model without breaking the bank or monopolizing their company’s servers for longer than necessary.
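One way to build such a learning curve: train on growing fractions of the data and record validation accuracy until it plateaus. The sketch below uses a toy nearest-centroid classifier and only the standard library; all names are illustrative, and in practice you would substitute your own model and metric.

```python
import statistics

def nearest_centroid_fit(X, y):
    """Toy classifier: one centroid per class."""
    centroids = {}
    for label in set(y):
        points = [x for x, l in zip(X, y) if l == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*points)]
    return centroids

def nearest_centroid_predict(centroids, x):
    """Assign x to the class with the closest centroid (squared distance)."""
    return min(centroids,
               key=lambda l: sum((a - b) ** 2 for a, b in zip(x, centroids[l])))

def learning_curve(X_train, y_train, X_val, y_val, fractions):
    """Validation accuracy as a function of training-set size."""
    curve = []
    for frac in fractions:
        n = max(2, int(len(X_train) * frac))
        centroids = nearest_centroid_fit(X_train[:n], y_train[:n])
        correct = sum(nearest_centroid_predict(centroids, x) == label
                      for x, label in zip(X_val, y_val))
        curve.append((n, correct / len(X_val)))
    return curve
```

If accuracy at 10 percent of the data is already close to accuracy at 100 percent, the extra data is costing money without buying performance.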
Don’t Automate Too Early
Data scientists aren’t usually very excited about the topic of Machine Learning Lifecycle Management because they’d much rather work on exciting new problems than on improving old models. As a consequence, many are tempted to automate their ML product from the get-go.
We now know it is often dangerous to push a model to production with the assumption that it will perform as expected. Automating a model that isn’t well understood and tested in real-life conditions is a common mistake that is easily avoided.
Keep It Simple
Most readers will know Occam’s razor: the simplest answer is usually the correct one. Yet the data science community today tends to enjoy experimenting with the newest methods, sacrificing simplicity for the thrill of trying something new and unusual. Starting with a simple model (a minimum viable product) before gradually making the solution more complex is the way to go.
Use Problems as Opportunities to Learn
If a model is underperforming, it is often tempting to go back to the drawing board and start over. The problem with this approach is that it throws the baby out with the bathwater.
Failing models are a mine of information about corner cases, the way people use the data product, and technical limitations that must be addressed during the next iteration. Whenever we throw away a failing model and start from scratch, we lose an opportunity to understand both the business problem at hand and the system in depth, and we are likely to repeat some of the same mistakes over time.
Test Models Thoroughly Before Shipping
When training a speech recognition model, for example, we might only have access to audio recordings of native English speakers. In that case, the model would generalize poorly to people with foreign accents, creating a disappointing experience for many real-life users.
Data scientists must ensure they account for corner cases and exceptions as much as possible given the quality of the data that is available for the task at hand. As a general rule, thorough testing should happen sooner rather than later, and definitely should happen prior to the engineering QA process.
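A simple way to surface such gaps before shipping is to break evaluation down by slice (accent group, in the speech example) and flag slices that fall below an acceptable accuracy. A hypothetical sketch, where names and the 0.8 floor are assumptions:

```python
from collections import defaultdict

def evaluate_by_slice(records, min_accuracy=0.8):
    """records: (slice_name, prediction_correct) pairs.
    Returns per-slice accuracy and the slices below min_accuracy."""
    hits, totals = defaultdict(int), defaultdict(int)
    for slice_name, correct in records:
        totals[slice_name] += 1
        hits[slice_name] += int(correct)
    per_slice = {name: hits[name] / totals[name] for name in totals}
    failing = sorted(name for name, acc in per_slice.items()
                     if acc < min_accuracy)
    return per_slice, failing
```

An aggregate accuracy number can look healthy while one slice fails badly, which is exactly the kind of problem that should be caught before the engineering QA process.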
Build Things That Are Explainable
Debugging issues in production is always painful, no matter the system. But given the number of moving parts involved in the creation of a model, starting with the many different features that can come into play as inputs, finding and fixing problems might turn out to be almost impossible to do in a reasonable amount of time if the model isn’t explainable.
Explainability isn’t only a way to provide some transparency into the decisions that an algorithm makes; it is also the best way to ensure that the human who built the model, as well as others, can trace problems back to their origin. Building models that have as many interpretable elements as possible makes a huge difference when it comes to maintaining them over the long term.
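One model-agnostic way to make even a black-box model more traceable is permutation importance: shuffle one input feature at a time and measure how much the score drops. A minimal illustration, where the `score_fn` interface is an assumption rather than a standard API:

```python
import random

def permutation_importance(score_fn, X, y, n_features, seed=0):
    """Drop in score after shuffling each feature column, one at a time.
    A large drop means the model leans heavily on that feature."""
    rng = random.Random(seed)
    baseline = score_fn(X, y)
    importances = []
    for j in range(n_features):
        column = [row[j] for row in X]
        rng.shuffle(column)
        # Rebuild the dataset with only feature j permuted.
        X_permuted = [row[:j] + [value] + row[j + 1:]
                      for row, value in zip(X, column)]
        importances.append(baseline - score_fn(X_permuted, y))
    return importances
```

When a model misbehaves in production, a ranking like this narrows the search to the handful of inputs the model actually depends on.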
Use Human Input
Full automation remains a myth for now. However, the Pareto principle suggests that automating 80 percent of a system is a fairly easy thing to do, while reaching a completely autonomous system would take a tremendous amount of effort.
It is wise to take an approach similar to building an auto-pilot for an aircraft. Automate the most straightforward parts of the system while leaving a human in control whenever a difficult or sensitive task needs to be completed. Human-assisted automation has proven over and over again to be better than full automation.
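The auto-pilot analogy can be sketched as a confidence-based router: predictions the model is sure about are automated, and the rest are escalated to a human reviewer. The names and the 0.9 threshold below are illustrative assumptions:

```python
def route_prediction(label, confidence, threshold=0.9):
    """Automate confident predictions; escalate the rest to a human."""
    if confidence >= threshold:
        return {"label": label, "handled_by": "model"}
    # Keep the model's guess as a suggestion to speed up human review.
    return {"label": None, "handled_by": "human", "model_suggestion": label}
```

Tuning the threshold trades automation rate against error rate, which makes the division of labor between model and human an explicit, measurable design choice.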
Design Carefully to Reap ML-Enabled Benefits
Hopefully, these tips and best practices for Machine Learning Lifecycle Management will help you push your models to production while keeping in check most of the risks associated with deploying ML solutions. No matter how you proceed, make sure you have a well-defined long-term strategy for maintaining your model, and leave as little as possible to chance.