Top 30 ML in Production Resources (Based On Engagement Data From 17,148 Emails)

Developing the MLinProduction.com newsletter has been a great way to learn about the topics and content most relevant to the ML engineers and data scientists in my community. To prepare this post, which can also be downloaded as an eBook, I reviewed each issue from 2019 and analyzed viewership data from the 17,148 emails sent, learning which blog posts, journal articles, and case studies readers found most valuable.

The resources are organized into 5 categories: data science product management & process, metrics & evaluation, model deployment & operations, ML platforms, and company use-cases.

It’s my hope that this type of content continues to proliferate and enrich the ML community in the future. I’ve also included a highlight of each post, as well as links to full interviews I conducted with a number of the resources’ creators.



Data Science Product Management & Process Resources

  • Managing Machine Learning Projects – An extensive paper describing Amazon’s best practices for managing machine learning projects. It describes how ML projects can fit within an economic framework and offers several techniques, like risk scorecards and incremental investment approaches, for mitigating the risks these projects carry. My favorite sections discuss the dichotomy between research and production ML, how to document a project’s data catalogue and pipeline, and how to assess a project’s economic value. Highlight: This is a must-read for anyone managing data science projects in production.

  • Building machine learning products: a problem well-defined is a problem half-solved. – How do you develop the requirements for a machine learning project when you’re given a vague problem to solve? This post blends ideas from product development, design, and user experience testing to describe a process for turning a vague idea into a concrete ML product. The author imagines designing a model to organize users’ photos to illustrate his ideas. Highlight: This is an excellent read about how to create machine learning products filled with resources for blending AI and design.

ML Advice From The Creator of This Resource, Jeremy Jordan:

“I cannot understate the value of having a coach to guide you through this process. Even if you begin a project with the best intentions of following such a framework, it’s pretty easy to get sucked into the day to day operations and lose track of the overall project development. This is where it can help to have an external perspective hold you accountable.”

Check out my full interview with the creator of this ML resource here!

  • Data Science Best Practices – A startup shares its template for running an efficient machine learning infrastructure and team. Their guideline is broken down into 4 goals: all new data scientists will build, train, and deploy production models within their first week; automate the automatable and use humans for the rest; deploy models incrementally and often; and end users will never notice a model change unless it’s an improvement. Highlight: This is a particularly valuable resource because of its section on interviewing new data scientists and deploying models in "dark mode" for testing.

  • So You’re Going to Manage a Data Science Team – A fantastic piece from a data science leader on the people and processes required to build a data team that works. The author drives home the point that managing a team has very little to do with hands-on tasks such as feature engineering and model evaluation. Instead, leaders should focus on taking what their team produces and turning it into repeatable, measurable processes. Highlight: Great lessons on why the majority of your time should be spent figuring out how to make your data and models available to the rest of your company.

ML Advice From The Creator of This Resource, Rui Carmo:

“My main advice would be to accurately and succinctly communicate the benefits/results of using your solution along with your understanding of the problems it solves. By all means be enthusiastic about it, but show that you can bring more actual value than a bunch of empty words to the table.”

Check out my full interview with the creator of this ML resource here!

  • The Product/Data Fit Strategy – Data product strategy is about building products that rely on data and analytics to generate business value. Understanding the value generated by machine learning involves weighing the costs of training accurate models against the value generated from correct predictions. Highlight: This post does a great job of describing how to increase the returns from your investments in machine learning.

  • The power of the full-stack data science generalist and the perils of division of labor through function – This article argues that "data science roles need to be made more general, with broad responsibilities agnostic to technical function." The author argues that specialization in data science increases coordination costs between teammates, extends wait-times between work, and results in less rewarding work for employees. "By contrast, generalist roles provide all the things that drive job satisfaction: autonomy, mastery, and purpose." Highlight: The author does a great job explaining how teamwork can impact effectiveness in data science.

ML Advice From The Creator of This Resource, Eric Colson:

“One mistake I see companies make is to put too much effort into up-front design. There is inherent uncertainty in ML. There are often surprises in the data (in quality, in interpretation, in volume, and even in processing requirements).”

Check out my full interview with the creator of this ML resource here!

Data Science Metrics & Evaluation Resources

  • Product Driven Machine Learning (and Parking Tickets in NYC) – To derive business value from machine learning models, business goals have to be framed in statistical terms. The product manager defines business metrics and business goals. Data scientists then translate these to modeling metrics and modeling goals. Highlight: The post includes a very illustrative use-case using NYC parking tickets.

  • 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com – This fantastic paper contains a large scale study of the impact of machine learning across many different products at Booking.com and distills the learnings into 6 specific lessons covering all phases of a machine learning project (Inception, Modeling, Deployment, Monitoring and Evaluation). The authors list the challenges faced while building models and describe the techniques they used to address each challenge. Highlight: The authors found that offline evaluation metrics are poorly correlated to business gains and that an ML problem can be formulated in multiple ways, some of which are easier to solve than others.

  • Predictive Model Performance: Offline and Online Evaluations – Evaluation metrics like AUC are used to estimate a model’s predictive performance in offline settings. But often there are substantial differences between offline metrics and a model’s online performance. This paper discusses the shortcomings of offline evaluation and proposes a method to simulate the online experience using historical data in the context of online advertising. Highlight: The authors discuss a new model evaluation paradigm that simulates the online behavior of predictive models.

  • Training models with unequal economic error costs using Amazon SageMaker – It’s extremely important to understand the impact of different types of errors when building classification models. In many applications, one kind of error can be much more consequential than another. This excellent blog post demonstrates how to use a custom cost function to incorporate the cost of different types of errors in the model training process. Highlight: The advantage of the approach laid out in the blog post is that it explicitly links a model’s outcomes to the business framework for decision-making.

ML Advice From The Creator of This Resource, Veronika Megler, PhD:

“Think carefully about what you’re doing and why, and how the ML solution fits into the business process and solves the business problem. Then, question your assumptions. It’s easy to get caught up in the technology and in automating it.”

Check out my full interview with the creator of this ML resource here!
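
To make the unequal-error-cost idea concrete, here is a minimal sketch of cost-sensitive training using scikit-learn class weights rather than the SageMaker-specific approach from the post; the cost figures and class weights are invented for illustration.

```python
# Cost-sensitive classification: a minimal sketch, not the SageMaker
# approach from the post. Assumes a binary problem where a false
# negative costs 10x as much as a false positive; both figures are
# invented for illustration.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

COST_FP, COST_FN = 1.0, 10.0  # hypothetical per-error costs

X, y = make_classification(n_samples=5000, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bake the asymmetric costs into training via class weights.
model = LogisticRegression(class_weight={0: COST_FP, 1: COST_FN})
model.fit(X_train, y_train)

# Evaluate in economic terms rather than raw accuracy.
tn, fp, fn, tp = confusion_matrix(y_test, model.predict(X_test)).ravel()
print(f"Expected error cost: {COST_FP * fp + COST_FN * fn:,.1f}")
```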

  • Modeling conversion rates and saving millions of dollars using Kaplan-Meier and gamma distributions – Computing conversion rates gets complicated when a substantial delay separates lead acquisition from the conversion event. For instance, how do you measure conversion when it takes a new lead weeks or months to make a purchase? This post discusses how to model conversion rates in these situations and introduces an open source library to forecast future conversion rates for new leads. Highlight: Instead of waiting months to see how user acquisition channels are performing, this post helps you calculate the signal earlier and make business decisions faster!

ML Advice From The Creator of This Resource, Erik Bernhardsson:

“Solve for the business need first and make sure you’re building something that is valuable for the business. Then build something super incrementally. Start with a prototype that’s end to end that uses the most simple model you can think of that’s still enough to show some potential.”

Check out my full interview with the creator of this ML resource here!
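
The post introduces its own open source library for this problem; as a rough sketch of the underlying Kaplan-Meier idea, here is what the estimate looks like using the lifelines library instead, with invented lead data.

```python
# Kaplan-Meier conversion curve from censored lead data: a rough sketch
# using the lifelines library (not the library the post introduces).
# The lead data below is invented.
import pandas as pd
from lifelines import KaplanMeierFitter

# One row per lead: days observed so far, and whether it converted.
# Unconverted leads are "censored": they may still convert later.
leads = pd.DataFrame({
    "days_observed": [3, 10, 14, 30, 45, 60, 60, 60],
    "converted":     [1,  1,  0,  1,  0,  1,  0,  0],
})

kmf = KaplanMeierFitter()
kmf.fit(leads["days_observed"], event_observed=leads["converted"])

# "Survival" here means "not yet converted", so conversion = 1 - survival.
conversion_curve = 1 - kmf.survival_function_["KM_estimate"]
print(conversion_curve.tail())
```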

  • Driving Business Decisions Using Data Science and Machine Learning – LinkedIn discusses how they connect machine learning to their business challenges by answering two questions: how do you determine the right metric (KPI) for a business goal, and how do you test a new feature on the site to inform business decisions? Answering the first question often involves translating fuzzy business questions such as "what is a highly engaged customer" into simple and interpretable metrics that can be analyzed rigorously with data science. The second requires a highly robust data pipeline that enables A/B testing, model monitoring, and the ability to inspect and explain model predictions. Highlight: In one example, LinkedIn demonstrates how data science techniques and methodologies can inform companies’ marketing decisions across customer acquisition, customer engagement, and customer churn prevention.

Data Science Model Deployment & Operations Resources

  • Continuous Delivery for Machine Learning – An incredible post that walks readers through the technical components of a continuous delivery for machine learning (CD4ML) pipeline. The authors describe CD4ML as "a software engineering approach in which a cross-functional team produces machine learning applications based on code, data, and models in small and safe increments that can be reproduced and reliably released at any time, in short adaptation cycles." Highlight: Using a sample ML application, the authors explain the concepts and demonstrate how different tools can be used together to implement the full end-to-end process including model and dataset versioning, testing, and deployment.

  • How to Deploy Machine Learning Models: A Guide – This post provides a fantastic summary of the challenges involved in deploying and maintaining machine learning systems. The author dives into what makes machine learning systems hard, different systems architectures, and key principles to keep in mind when designing an ML system. Highlight: The author also provides an overview of useful tooling and hints at useful testing strategies for ML systems.

  • Machine learning requires a fundamentally different deployment approach – Most of what we know about "production" software comes from web applications running ecommerce and social media at scale. But machine learning applications differ from traditional software: ML models are evaluated against metrics rather than strict specifications. Doing this well requires rethinking ideas like version control and testing. To quote the author: "The biggest issue facing machine learning isn’t whether we will discover better algorithms … [it’s] how we’ll put ML systems into production." Highlight: ML systems require tools that can test and monitor models as well as detect whether models have become stale and need to be retrained.

  • Models and microservices should be running on the same continuous delivery stack – According to this engineer, the process for deploying models to production looks strikingly similar to the continuous deployment process for microservices. For instance, model deployment involves a build process (model training triggered by new data or a code change), unit testing (evaluating a trained model against holdout data), and a measurement of online production metrics. Since these processes resemble one another, he argues, engineers should seek to leverage existing devops infrastructure for ML problems. Highlight: Though it’s not common practice, says the writer, it makes sense to deploy your model to your staging environment before it goes to production to exercise the model deployment machinery.
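
To make the "unit testing a trained model" analogy concrete, here is a minimal pytest-style evaluation gate; the artifact paths and the AUC threshold are hypothetical placeholders. If the freshly trained model underperforms on holdout data, the test fails the CI build and the deployment stops.

```python
# A pytest-style evaluation gate, sketching the "unit test" step of a
# model deployment pipeline. The artifact paths and AUC threshold are
# hypothetical placeholders.
import joblib
import pandas as pd
from sklearn.metrics import roc_auc_score

MIN_HOLDOUT_AUC = 0.80  # hypothetical quality bar for this model

def test_model_meets_holdout_auc():
    model = joblib.load("artifacts/model.joblib")            # hypothetical path
    holdout = pd.read_parquet("artifacts/holdout.parquet")   # hypothetical path
    scores = model.predict_proba(holdout.drop(columns="label"))[:, 1]
    auc = roc_auc_score(holdout["label"], scores)
    # Failing this assertion fails the CI build, blocking deployment.
    assert auc >= MIN_HOLDOUT_AUC, f"AUC {auc:.3f} below {MIN_HOLDOUT_AUC}"
```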

  • On Being Model-driven: Metrics and Monitoring – A trained model’s predictive performance is expected to decline over time after being deployed to production. Monitoring model metrics helps identify this drift – but which metrics should you capture? This post explains which metrics to monitor and why. P.S. I did some research to find the slide deck referenced in the article. Highlight: Monitoring the predicted output distribution is one way to detect model drift even if you can’t observe the actual targets.
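
As a quick sketch of that highlight, the example below compares the prediction distribution captured at deployment time against recent live scores using a two-sample Kolmogorov-Smirnov test; the distributions and alert threshold are invented for illustration.

```python
# Detecting output-distribution drift without ground-truth labels:
# a minimal sketch using a two-sample Kolmogorov-Smirnov test. The
# distributions and alert threshold are invented for illustration.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
baseline_scores = rng.beta(2, 5, size=10_000)  # scores at deployment time
recent_scores = rng.beta(3, 4, size=10_000)    # scores from live traffic

statistic, p_value = ks_2samp(baseline_scores, recent_scores)
if p_value < 0.01:
    print(f"Possible drift: KS={statistic:.3f}, p={p_value:.2e}")
```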

  • Machine Learning Models Monitoring Architectures – Continuing with the monitoring theme, this post presents 3 system architectures for monitoring models and applying anomaly and drift detection to the streams. The post compares and contrasts the different architectures and includes high-level diagrams. Check out the follow up post for implementation details. Highlight: Model monitoring can be treated as tracking requests/responses to and from models and applying anomaly/concept drift detection to this data.
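
In the post’s request/response framing, the service-side piece of a monitoring architecture can be as simple as wrapping the prediction call and shipping each request/response pair to a stream that a separate drift-detection job consumes. Here is a minimal sketch, with an in-memory list standing in for a real queue such as Kafka.

```python
# Treating model monitoring as request/response tracking: a minimal
# sketch, not one of the post's three architectures. The in-memory
# list stands in for a real stream (Kafka, Kinesis, etc.) consumed
# by a separate anomaly/drift detection job.
import time

monitoring_stream = []  # stand-in for a message queue

def predict_and_log(model, features):
    prediction = model.predict([features])[0]
    monitoring_stream.append({
        "timestamp": time.time(),
        "features": features,      # inputs: watch for feature drift
        "prediction": prediction,  # outputs: watch for prediction drift
    })
    return prediction
```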

Resources on Machine Learning Platforms

  • Meet Michelangelo: Uber’s Machine Learning Platform – If you haven’t yet come across Uber’s ML-as-a-service platform, Michelangelo, you’re overdue. In its introductory blog post on the tool, Uber’s team describes the motivation and architecture of the end-to-end system and how it powers their ML models. A really comprehensive post that’s worth a read. While you’re at it, check out their follow-up piece, Scaling Machine Learning at Uber with Michelangelo. Highlight: Michelangelo combines online & offline feature generation and visualization with model deployment to power ML across Uber.

  • Bighead: Airbnb’s End-to-End ML Platform – Bighead is Airbnb’s end-to-end platform for building and deploying machine learning models to production. It’s a collection of tools built around open source technologies for handling all parts of the ML-in-production stack, including model management, feature engineering, online inference, and experimentation. Highlight: The slides do a great job of describing how Airbnb relies largely on Docker for reproducibility and scalability.

  • Productionizing ML with workflows at Twitter – Cortex, Twitter’s machine learning engineering team, describes how and why they developed ML Workflows to automate, schedule, and share machine learning pipelines. Before ML Workflows, data scientists at Twitter struggled to manage machine learning pipelines for processes like model retraining and hyperparameter optimization. After its adoption, teams have seen the time for retraining and deployment drop from four weeks to one. Highlight: Twitter built custom Airflow operators to run training processes, launch prediction services with APIs, and run load tests on these prediction services (the general pattern is sketched below).
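
Twitter’s own operators aren’t shown in the post, but the general shape of a custom Airflow operator looks roughly like the sketch below; the operator name and the training step are hypothetical placeholders.

```python
# The general pattern of a custom Airflow operator, roughly in the
# spirit of what the post describes (Twitter's actual operators are
# not public there). The name and training step are hypothetical.
from airflow.models import BaseOperator

class TrainModelOperator(BaseOperator):
    def __init__(self, model_name, training_data_path, **kwargs):
        super().__init__(**kwargs)
        self.model_name = model_name
        self.training_data_path = training_data_path

    def execute(self, context):
        # A real operator would launch the training job here and block
        # (or poll) until it completes, raising on failure so Airflow
        # can retry or alert.
        self.log.info("Training %s on %s", self.model_name,
                      self.training_data_path)
```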

  • Overton: A Data System for Monitoring and Improving Machine-Learned Products – Overton is an ML platform out of Apple that reimagines the way engineers interact with and build production machine learning systems. Overton automates model construction, deployment, and monitoring by having engineers focus on supervision and data without writing any code in frameworks like TensorFlow or PyTorch. Rather than manipulate code, engineers using Overton manipulate data files that specify tasks. Highlight: This resource provides what feels like a groundbreaking approach to ML platforms and model improvement.

  • Accelerating the Machine Learning Lifecycle with MLflow – MLflow is an open source ML platform that encourages data scientists to use the latest models, libraries, and frameworks while still promoting reproducibility and deployability. It includes three components that can be used together or separately: Tracking, Projects, and Models. These provide an API for experiment tracking, a format for packaging code into reusable projects, and a generic format for packaging and deploying models. Highlight: The Tracking component records experiment runs, parameters, input data, metrics, and arbitrary output files and can be queried through an API or UI.
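
To give a feel for the Tracking component, here is a minimal example of logging a run through MLflow’s Python API; the parameter and metric values are invented.

```python
# A minimal MLflow Tracking example: record a run's parameters and
# metrics so they can be queried later through the API or UI. The
# values are invented for illustration.
import mlflow

with mlflow.start_run(run_name="example-run"):
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("n_estimators", 200)
    # ...train and evaluate a model here...
    mlflow.log_metric("holdout_auc", 0.87)
```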

  • Kubeflow – Kubeflow is an ML toolkit dedicated to making the deployment of machine learning on Kubernetes simple, portable, and scalable. The goal of the project is to simplify the machine learning deployment process by leveraging Kubernetes to deploy across diverse infrastructure and scale with demand. The KFServing component enables serverless inference on Kubernetes and provides a custom resource definition for serving models from frameworks like TensorFlow, XGBoost, sklearn, and PyTorch. Check out this tutorial on using Kubeflow to deploy an XGBoost model and view other examples here. Highlight: Use KFServing to bring features like GPU autoscaling, scale-to-zero, and canary rollouts to your ML deployments.

Resources on ML Company Use-Cases

  • Maintainable ETLs: Tips for Making Your Pipelines Easier to Support and Extend – While writing ETLs is not the most enjoyable part of a data scientist’s day, this Stitch Fix post explains, a poorly constructed ETL will create obstacles to building new models and deriving insights. The author offers 4 suggestions for writing better ETLs: build a chain of simple tasks, use a workflow management tool, leverage SQL where possible, and implement data quality checks. Highlight: The less you currently care about ETLs and data pipelines, the more I suggest you read this article.
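
To ground the "chain of simple tasks" and "data quality checks" suggestions, here is a minimal pure-Python sketch with invented file and column names; in practice each function would become a task in a workflow management tool, as the post recommends.

```python
# A chain of simple ETL tasks with a data quality gate between steps:
# a minimal sketch of two of the post's suggestions. File and column
# names are invented; in practice each function would be a task in a
# workflow manager such as Airflow.
import pandas as pd

def extract(path):
    return pd.read_csv(path)

def check_quality(df):
    # Fail loudly rather than propagate bad data downstream.
    assert not df.empty, "extract produced no rows"
    assert df["user_id"].notna().all(), "null user_id values found"
    return df

def transform(df):
    return df.groupby("user_id", as_index=False)["amount"].sum()

def load(df, path):
    df.to_parquet(path)

# Each task stays small, testable, and easy to reason about in isolation.
load(transform(check_quality(extract("raw_orders.csv"))),
     "daily_order_totals.parquet")
```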

  • Applying Deep Learning to AirBnB Search – I know what you’re thinking: "A journal paper on deep learning?! I thought we were here to talk about machine learning in production…" I’ve included this paper because it details the lessons Airbnb learned by building deep learning solutions. Rather than focus on architectures, the authors describe how they iterated towards their current solutions, the common deep learning tricks they tried that failed, and the features they engineered to make their models work. Highlight: This is a wonderful applied deep learning paper.

  • How Uber Organizes Around Machine Learning – This piece summarizes major applications of machine learning at Uber and describes how Uber leverages ML through technology, process, and organizational structure. Key lessons learned at Uber are to let data scientists use the tools they want, structure your ML teams in a way that best responds to business demands, and think about all the areas needed to adopt ML at your company. Highlight: Getting the ML organizational structure right has allowed Uber’s ML projects to be owned not only by teams with multiple ML data scientists but also by teams with little to no technical expertise.

  • Scaling Machine Learning Productivity at LinkedIn – LinkedIn started the “Productive Machine Learning” (ProML) initiative to increase the effectiveness of their machine learning engineers and democratize their AI tools across the company. They broke their efforts into 6 layers: exploration, training, deployment, running (operational), health assurance, and a feature marketplace. This post provides an interesting discussion of what it means to scale a machine learning practice as the impact of ML increases within a company. Highlight: To build a machine learning organization, start with a specialized team with a high density of ML knowledge, then move towards building services and tools with best practices “baked in” to empower other engineers.

  • System Architectures for Personalization and Recommendation – Building machine learning products like Netflix’s recommender system requires a software architecture that handles large volumes of existing data, is responsive to user interactions, and makes it easy to experiment with new recommendation approaches. The Netflix architecture combines online computation, which responds quickly to events and uses the most recent data, with offline computation, which allows for more choices in algorithmic approach and has fewer limitations on the amount of data that can be used. Highlight: Nearline computation, a compromise between online and offline, removes the need to serve results immediately and opens the door to more complex processing per step.

  • Building Lyft’s Marketing Automation Platform – Lyft describes Symphony, an orchestration system that takes a business objective, predicts future user value, allocates marketing budgets, and publishes those budgets to drive new users to Lyft. Since acquiring new users through different channels like search and paid social media involves making thousands of decisions each day, their growth team decided to use machine learning to automate many of these decisions by forecasting customer lifetime value, allocating budget based on customer LTV, and then adjusting bidding strategies accordingly. Highlight: As a machine learning practitioner who works on marketing problems, I really enjoyed this piece and hope Lyft decides to publish more on the topic!

Did you enjoy the guide?

Then please do me a favor and share it on LinkedIn (@ML in Production), Twitter (@MLinProduction), or your social media of choice. Your friends will appreciate it too.

Think your resource or a resource you’ve come across should be included in this list? Then send me an email at luigi [at] mlinproduction.com with your thoughts.
