
ML in Production

Best practices for building real-world machine learning systems


Deploying Models on AWS SageMaker – Part 1: Architecture

A few weeks ago I had the chance to speak to the AI and Data Science Fellows at the Insight Data Science program here in New York City. The fellows in the audience were finishing up their machine learning final… Read More
Author: Luigi | Posted on July 8, 2019 (updated July 19, 2020) | Categories: Deployment | 1 Comment

Kubernetes Services for Machine Learning

In my previous Kubernetes for Machine Learning post, we used a Kubernetes Deployment to build a REST API to serve a trained machine learning model. In that setup, issuing requests to generate predictions was only possible from within our Kubernetes… Read More
Author: Luigi | Posted on June 24, 2019 | Categories: Deployment, Inference, Kubernetes | Leave a comment
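To give a flavor of what that post covers, here is a minimal sketch of a client calling a model's prediction endpoint once a Kubernetes Service exposes it outside the cluster. The URL, port, and payload below are illustrative placeholders, not details taken from the post.

    import requests

    # Placeholder address: a NodePort Service might expose the model API on a
    # cluster node at a high port in the 30000-32767 range.
    PREDICT_URL = "http://localhost:30080/predict"

    # Example feature payload; the real schema depends on the model being served.
    payload = {"features": [5.1, 3.5, 1.4, 0.2]}

    response = requests.post(PREDICT_URL, json=payload, timeout=5)
    response.raise_for_status()
    print(response.json())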

The Ultimate Guide to Model Retraining

Machine learning models are trained by learning a mapping between a set of input features and an output target. Typically, this mapping is learned by optimizing some cost function to minimize prediction error. Once the optimal model is found, it’s… Read More
Author: Luigi | Posted on June 10, 2019 (updated March 21, 2020) | Categories: Deployment | Tags: Monitoring, Retraining | 1 Comment
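As a toy illustration of the retraining idea (not code from the guide), retraining often amounts to re-fitting the same kind of model on a fresher window of data, optionally triggered when a monitored metric drops below a threshold. The threshold, model choice, and artifact path below are hypothetical.

    import joblib
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    ACCURACY_THRESHOLD = 0.80  # hypothetical trigger for retraining

    def maybe_retrain(model, X_recent, y_recent):
        # Re-fit on recent data only if live performance has degraded.
        current_accuracy = accuracy_score(y_recent, model.predict(X_recent))
        if current_accuracy >= ACCURACY_THRESHOLD:
            return model  # performance still acceptable; keep the current model
        retrained = LogisticRegression(max_iter=1000)  # fresh model of the same type
        retrained.fit(X_recent, y_recent)
        joblib.dump(retrained, "model.joblib")  # placeholder artifact path
        return retrained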

Kubernetes Deployments for Machine Learning

Suppose your data science team has deployed a couple of batch machine learning processes on Kubernetes. You’ve successfully used Kubernetes Jobs to deploy model training and you’ve scheduled daily batch inference tasks using CronJobs. But now you’re tasked with serving… Read More
Author: Luigi | Posted on June 3, 2019 | Categories: Deployment, Inference, Kubernetes | Leave a comment
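For context, the kind of REST API such a Deployment would run in each replica might look roughly like this Flask sketch; the model path and feature schema are illustrative placeholders rather than code from the post.

    from flask import Flask, jsonify, request
    import joblib

    app = Flask(__name__)

    # Placeholder path: a serialized model baked into or mounted into the container image.
    model = joblib.load("model.joblib")

    @app.route("/predict", methods=["POST"])
    def predict():
        # Expect a JSON body like {"features": [[...], [...]]}
        features = request.get_json()["features"]
        predictions = model.predict(features).tolist()
        return jsonify({"predictions": predictions})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)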

Kubernetes CronJobs for Machine Learning

In my previous post we discussed how to leverage Kubernetes Jobs to perform common production machine learning tasks such as model training and batch inference. Jobs allow us to reliably run batch processes in a fault tolerant way. Even if… Read More
Author: Luigi | Posted on May 27, 2019 | Categories: Deployment, Kubernetes | Leave a comment
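As a rough sketch of the kind of batch inference task a daily CronJob might run, consider a script like the one below; the file paths and model format are assumptions made for illustration, not taken from the post.

    import joblib
    import pandas as pd

    # Placeholder paths; in practice these might point at object storage or a mounted volume.
    MODEL_PATH = "model.joblib"
    INPUT_PATH = "daily_records.csv"
    OUTPUT_PATH = "daily_predictions.csv"

    def main():
        model = joblib.load(MODEL_PATH)                 # load the trained model
        records = pd.read_csv(INPUT_PATH)               # read the day's batch of records
        records["prediction"] = model.predict(records)  # score every row
        records.to_csv(OUTPUT_PATH, index=False)        # write results for downstream consumers

    if __name__ == "__main__":
        main()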

Kubernetes Jobs for Machine Learning

In my previous post I introduced Kubernetes Pods, the basic building block of the Kubernetes ecosystem. In that post I discussed what a Pod is, how it fits into the Kubernetes system, and how to create, view, and delete a… Read More
Author: Luigi | Posted on May 20, 2019 | Categories: Deployment, Inference, Kubernetes, Training | Leave a comment
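To illustrate the sort of training process a Kubernetes Job would wrap in a container, here is a minimal scikit-learn sketch; the dataset, model choice, and output path are placeholders rather than details from the post.

    import joblib
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    def main():
        # Toy dataset standing in for whatever training data the Job would pull in.
        X, y = load_iris(return_X_y=True)
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

        model = RandomForestClassifier(n_estimators=100, random_state=42)
        model.fit(X_train, y_train)
        print("test accuracy:", model.score(X_test, y_test))

        # Persist the artifact so a later step can pick it up (e.g. for serving).
        joblib.dump(model, "model.joblib")

    if __name__ == "__main__":
        main()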

How to Use Kubernetes Pods for Machine Learning

One of the best parts of being a Data Scientist is the dynamic nature of the job. You’ll likely spend a majority of your time feature engineering, building models, or running experiments. But depending on your role, you may also… Read More
Author: Luigi | Posted on May 6, 2019 | Categories: Kubernetes | Leave a comment

An Introduction to Kubernetes for Data Scientists

In the final piece of my Docker for Machine Learning blog series, I implemented a system for exposing machine learning models as a REST API running in a Docker container. With the container running, I was able to generate predictions… Read More
Author: Luigi | Posted on April 29, 2019 | Categories: Kubernetes | 2 Comments

Tracking Machine Learning Metadata with Sacred Library

A few weeks ago I wrote about why storing metadata is critical for the machine learning process. Since building machine learning models is an iterative process, often involving multiple people and a diverse set of tools, we need the ability… Read More
Author: Luigi | Posted on April 22, 2019 | Categories: Experiments, Metadata | Leave a comment
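For readers who haven't used Sacred, a minimal experiment might look like the sketch below; the experiment name, hyperparameters, storage directory, and metric value are all illustrative placeholders rather than examples from the post.

    from sacred import Experiment
    from sacred.observers import FileStorageObserver

    ex = Experiment("train_classifier")               # placeholder experiment name
    ex.observers.append(FileStorageObserver("runs"))  # persist run metadata to disk

    @ex.config
    def config():
        # Hyperparameters declared here are captured automatically as run metadata.
        learning_rate = 0.01
        n_estimators = 100

    @ex.automain
    def run(learning_rate, n_estimators):
        # Training would happen here; the returned value is recorded as the run's result.
        accuracy = 0.87  # placeholder metric, not a real measurement
        return accuracy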

How Data Leakage Impacts Machine Learning Models

The silver bullet. A feature that led to AUC increasing from .6 to .8. After working on feature engineering for several months, I thought I had finally cracked the code and created a feature that pushed my machine learning model… Read More
Author: Luigi | Posted on April 16, 2019 | Categories: Training | 2 Comments


Mission

My goal is to help data scientists, ML engineers, and AI product managers build and operate machine learning systems in production.

Learn more about why I started MLinProduction.

Enroll in my online course


Recent Posts

  • Driving Experimentation Forward through a Working Group (Experimentation Program Series: Guide 03)
  • What is an Experimentation Program and Who is Involved? (Experimentation Program Series: Guide 02)
  • Building An Effective Experimentation Program (Experimentation Program Series: Guide 01)
  • Lessons Learned from Writing Online
  • Newsletter #087