"All models are wrong, but some are useful." – George Box
This quote by statistician George Box is generally used to illustrate the point that models are simplified representations of reality. Some of these representations very accurately describe the way the world works. If you drop a ball from some height, for instance, kinematics predicts where that ball will be T seconds later. But the "laws" of kinematics break down at the atomic level. The model ceases to be useful.
I propose a similar pithy statement for machine learning models:
"No machine learning model is valuable, unless it’s deployed to production." – Luigi Patruno
Machine learning models can only generate value for organizations when the insights from those models are delivered to end users. Who the end user is can vary: recommender systems in e-commerce suggest products to shoppers, while advertisement click predictions feed software systems that serve ads. Either way, ML projects can only be successful after a model has been deployed and its predictions are being served.
Surprisingly, machine learning deployment is rarely discussed online. Bootcamps and grad programs don’t teach students how to deploy models. If you do a Google search, you’ll find a lot of blog posts about standing up Flask APIs on your local machine, but none of these posts go into much detail beyond writing a simple endpoint.
So I decided to write a comprehensive blog series on how to deploy ML models to production. Many of these blog posts include tutorial-style code. But my goal isn’t to code up a complete system. My goal is to educate data scientists, ML engineers, and ML product managers about the pitfalls of model deployment and describe my own model for how you can deploy your machine learning models. My model, as George Box described in so few words, is probably wrong. But I hope that it’s still useful to you.
- What Does it Mean to Deploy a Machine Learning Model? – What does it even mean to "deploy a model?" How does deployment fit into the machine learning process? What factors should you take into consideration when deciding how to deploy?
- Software Interfaces for Machine Learning Deployment – Deployment is considerably easier when you’re working with the right interfaces. This is doubly important when you’re using models across different frameworks and languages. So what’s the right interface to make deployment easier?
- Batch Inference for Machine Learning Deployment – If you can precompute and cache predictions in batch, DO IT! It’s much easier than deploying and maintaining APIs and other near real time infrastructure. Here’s how to do batch inference.
- The Challenges of Online Inference – But when you need predictions in real time, you need online inference. There are many gotchas in online inference: you need to query data from multiple sources in real time, you need A/B testing, you need rollout strategies…
- Online Inference for ML Deployment – If after learning about those challenges you decide you still need online inference, bless your heart. There are a lot of posts on Flask APIs, but that’s the easiest part. You need versioning, autoscaling, and the ability to A/B test models.
- Model Registries for ML Deployment – Where do you store all these trained models? Where do you track metadata and lineage? How do you retrieve models at inference time? That’s where you’ll need a model registry.
- Test-Driven Machine Learning Development – It’s not enough to use aggregate metrics to understand model performance. You need to know how the model does on sub-slices of data. You need machine learning unit tests.
- A/B Testing Machine Learning Models – Just because a model passes its unit tests doesn’t mean it will move the product metrics. The only way to establish causality is through online validation. Like any other feature, models need to be A/B tested.
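To make the point about sub-slices of data concrete, here is a minimal sketch of a slice-level check. It is only an illustration, not the approach from the linked post: the record layout (`label`, `prediction`, `device`), the helper names, and the 0.75 threshold are all assumptions chosen for the example.

```python
from collections import defaultdict


def accuracy_by_slice(records, slice_key):
    """Return {slice_value: accuracy} for records carrying 'label' and 'prediction'."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        slice_value = record[slice_key]
        total[slice_value] += 1
        correct[slice_value] += int(record["label"] == record["prediction"])
    return {s: correct[s] / total[s] for s in total}


def failing_slices(records, slice_key, min_accuracy):
    """Return the slices whose accuracy falls below min_accuracy."""
    per_slice = accuracy_by_slice(records, slice_key)
    return {s: acc for s, acc in per_slice.items() if acc < min_accuracy}


# Hypothetical evaluation set: the model is perfect on desktop traffic
# but right only half the time on mobile traffic.
records = (
    [{"device": "desktop", "label": 1, "prediction": 1} for _ in range(6)]
    + [{"device": "mobile", "label": 1, "prediction": 1} for _ in range(2)]
    + [{"device": "mobile", "label": 1, "prediction": 0} for _ in range(2)]
)

# The aggregate metric clears the (assumed) 0.75 bar...
overall_accuracy = sum(r["label"] == r["prediction"] for r in records) / len(records)

# ...while the per-slice view shows which segment is underperforming.
bad_slices = failing_slices(records, "device", min_accuracy=0.75)
```

In a test suite, a check like `assert not failing_slices(...)` would fail this model even though its overall accuracy looks acceptable, which is the kind of gap aggregate metrics hide.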
More to come
The series isn’t done yet! I expect to write the following posts soon. If there’s something you think I’m missing, leave a comment below.
- Roll-Out Strategies for ML Deployments – How do you deploy your models in shadow mode? How do you decide between canary and blue/green deployments?
- CI/CD for Machine Learning – How do we tie together lessons from every post in the series to enable continuous integration and continuous delivery for ML?
If you want to be notified when these posts are published, subscribe below. You’ll also receive my weekly newsletter on all aspects of production ML systems.
Hi-
There is a decision very early that is not covered here: what is the strategy for failures?
A company I know shipped a dashcam for commercial vehicle fleets: cabs, delivery, etc. The dashcam pointed a cam at the driver for various reasons. One feature was to ID the driver. Of course, the facial recognizer had a failure rate. The system was deterministic, so every shift the same guy sat down and the dashcam reported ‘I don’t know this guy’.
This facial recognition feature was shipped with no strategy for failures.
Hi Jack. Good point about the strategy for failures. The reason I didn’t cover it in this series is that the strategy should be designed at the product planning stage, not during deployment. It’s important to understand that models WILL get things wrong. Teams should plan for this BEFORE any implementation.
Thanks Luigi, I have been reading your articles and want to thank you for starting the Production in ML series. I can’t emphasize enough the importance of this aspect of the ML pipeline, which most of the industry tends not to focus on.
Building an ML model is just a small part of the ecosystem. There are many other things, including continuous deployment of models, model and data versioning, and model and data lineage, which I think should be an integral part of building any ML pipeline. Needless to say, if the business value, optimization criteria, and metrics are not defined beforehand, the project may very well never take off.
I am looking forward to the last two posts of this series, on roll-out strategies and CI/CD for ML pipelines. Thanks for sharing this information with all the ML enthusiasts around the globe.
Thanks Abhik!
I really appreciate this series, Luigi! I have almost finished the articles and am looking forward to the last two.
This series is super helpful! And it’s very friendly to us data scientists who know very little about engineering.