An Introduction to Kubernetes for Data Scientists


In the final piece of my Docker for Machine Learning blog series, I implemented a system for exposing machine learning models as a REST API running in a Docker container. With the container running, I was able to generate predictions with the model by making web requests to the container. However, this model was only available locally on my machine. What would we have to do to make this model available to other users over the internet? And how could we set up this infrastructure to scale with more and more users?

In this post I’d like to introduce Kubernetes, an open source container orchestration tool designed to automate deployment, scaling, and management of containerized applications. We’ll describe how data scientists can use Kubernetes to scale their machine learning workloads to new heights.

What is Kubernetes?

To understand how Kubernetes can help scale machine learning, let’s imagine a data science team that decides to adopt containers for running its machine learning workloads. What kinds of issues will the data scientists face as they begin to deploy more and more models into production?

At first the data science team has a single model in production. The model is retrained every few months when new training data is available, and batch inference is performed weekly. This is easy to handle on a single server with Docker installed. The data scientists are excited to deploy more models and decide to create a recommendation engine to serve users recommendations in real time. To do this they expose the model with a REST API. This deployment is much more complicated than the first, since the data scientists have to muck around with deploying a web application, figuring out what DNS is, and a host of other web-related concerns. But they hack some code together from a bunch of tutorials they found online. Foolproof.

Surprisingly, problems begin to emerge. The marketing team decides they need the batch training job to run more frequently and on several more MBs of training data. This causes resource contention when the training and inference jobs run at the same time. The server exposing the recommendation model sees an increase in traffic each day, which leads to slowing response times. On top of this, some smart data scientists have trained a new recommendation model and wish to A/B test it against the currently deployed model. No one has any idea how to do this.

Unsurprisingly, the data scientists aren’t able to find tutorials demonstrating how to overcome these hurdles. They piece together some shell scripts full of docker commands, but they find their production containers failing frequently. To scale their compute resources, they have to manually create new AWS instances and redeploy their applications, leading to model downtime. One data scientist thinks to himself: “Why am I spending all of my time in the AWS console? When can I train models again?”

Enter Kubernetes

To understand why Kubernetes is a useful tool for data scientists aiming to scale their machine learning efforts, let’s describe how we can use Kubernetes to solve the issues from our imagined scenario. Along the way, I’ll introduce several Kubernetes concepts. But before we dive in, it’s worth describing the Pod, the basic building block of Kubernetes.

Containers and Pods

A container is the atomic unit of the Docker ecosystem. It can be thought of as a single instance of a running application. In Kubernetes, the Pod is the basic building block and represents the smallest deployable object. Pods are composed of one or more containers that are tightly coupled together. A typical use case is a Pod that runs a single container, but Pods may also run multiple containers that work together to provide a single service. Either way, a Pod is meant to run a single instance of an application. For example, we could create a Pod that performs model training. This Pod may be a wrapper around a single container that runs the training code.
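
As a rough sketch, a Pod manifest wrapping a single training container might look something like the following. The image name and command are placeholders for whatever training container you already have:

```yaml
# pod.yaml -- hypothetical Pod wrapping one training container
apiVersion: v1
kind: Pod
metadata:
  name: model-training
spec:
  restartPolicy: Never                        # run once, don't restart on success
  containers:
  - name: train
    image: my-registry/model-training:latest  # placeholder image name
    command: ["python", "train.py"]           # placeholder training entrypoint
```

You could create this Pod with `kubectl apply -f pod.yaml`, though in practice you’ll usually wrap Pods in higher-level objects like Jobs and Deployments, as described below.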

The Batch Case

Back to our hypothetical data science team.

The team’s first model was deployed on a single server with enough resources (CPU, memory, etc.) to support model training. To support a growing training set size, the data scientists need to port the training jobs to machines with more resources. Since the jobs are already running in Docker containers, it’s easy to run the jobs on new machines that have Docker installed. But setting up that new infrastructure, including provisioning the larger machines and installing necessary dependencies, is far from trivial.

Kubernetes allows you to declare minimum and maximum compute resources for your containers so that each workload gets an appropriate share of CPU and memory. If a computation needs more resources, just update a configuration file, apply the change using the kubectl command-line client, and Kubernetes does the work of scheduling the computation on the underlying nodes. No need to manually create new instances.
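
For illustration, here is roughly how those minimums and maximums look on a container spec. Kubernetes calls them resource requests and limits; the image name and the numbers below are made up, so pick values that match your own training workload:

```yaml
# Fragment of a Pod or Job spec showing resource requests and limits
containers:
- name: train
  image: my-registry/model-training:latest  # placeholder image name
  resources:
    requests:        # minimum resources the scheduler must reserve for the container
      cpu: "2"
      memory: 8Gi
    limits:          # hard ceiling the container is not allowed to exceed
      cpu: "4"
      memory: 16Gi
```

After editing the file, `kubectl apply -f` the change and Kubernetes places the Pod on a node with enough free capacity.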

What about scheduling and running the batch training and inference jobs? Without Kubernetes, you might resort to a time-based job scheduler like cron. This might work at first, but it won’t scale. Your code will fail at some point, and without logic that restarts failed jobs and alerts developers about issues, you’ll just wind up with a fragile ecosystem of glued-together scripts.

Kubernetes provides a set of abstractions that deploy and maintain applications. If you need to run a single training or inference job, you can create a Job object that is responsible for running a specific number of Pods until completion. You can configure a Job to restart in case it fails, and even to fail completely after some number of retries. If you’d like to set up your Jobs to run on a time-based schedule, just use a CronJob.
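
As a sketch, a weekly batch-inference CronJob could look something like the manifest below. The schedule, image, and command are placeholders, and the apiVersion may differ on your cluster (recent versions use batch/v1; older ones use batch/v1beta1):

```yaml
# cronjob.yaml -- hypothetical weekly batch-inference schedule
apiVersion: batch/v1
kind: CronJob
metadata:
  name: batch-inference
spec:
  schedule: "0 6 * * 1"            # every Monday at 06:00, standard cron syntax
  jobTemplate:
    spec:
      backoffLimit: 3              # give up after three failed retries
      template:
        spec:
          restartPolicy: OnFailure # restart the Pod if the run fails
          containers:
          - name: inference
            image: my-registry/batch-inference:latest  # placeholder image name
            command: ["python", "predict.py"]          # placeholder entrypoint
```

Dropping the `schedule` and `jobTemplate` wrapper and keeping the inner spec gives you a plain one-off Job for ad hoc training runs.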

Deployments and Services for Online Inference

We mentioned that deploying models for online inference is drastically harder than the batch case. In our example, the data scientists had to spend time learning about load balancers and other web technologies. When their model began to see an increase in traffic, they didn’t know how to scale their application to support the increased load. And they certainly did not have the expertise to set up A/B testing for different models.

Kubernetes drastically simplifies the process of exposing your models for others to use. First, create a Deployment that specifies which containers to run and how many replicas of the application you wish to create. Then expose this Deployment using a Service, which defines rules for exposing Pods to each other and to the internet. Kubernetes load-balances traffic among the replicas and can even be configured to autoscale resources to meet increased demand. By using Deployments, data scientists can horizontally scale their applications (i.e., add more copies of the model) to serve more users, while Kubernetes handles the work of spreading traffic across those replicas.
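
A minimal sketch of that pair of objects, assuming a containerized model server listening on port 5000 (the names, labels, ports, and image are placeholders):

```yaml
# deployment.yaml -- hypothetical Deployment and Service for a model API
apiVersion: apps/v1
kind: Deployment
metadata:
  name: recommender
spec:
  replicas: 3                       # run three copies of the model server
  selector:
    matchLabels:
      app: recommender
  template:
    metadata:
      labels:
        app: recommender
    spec:
      containers:
      - name: api
        image: my-registry/recommender-api:latest  # placeholder image name
        ports:
        - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: recommender
spec:
  type: LoadBalancer                # expose the Pods outside the cluster
  selector:
    app: recommender                # route traffic to Pods with this label
  ports:
  - port: 80
    targetPort: 5000
```

Scaling out is then a one-liner, e.g. `kubectl scale deployment recommender --replicas=10`, and the Service keeps spreading requests across whatever replicas exist.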

Conclusion

We’ve given a high-level overview of how data scientists can use Kubernetes to scale their machine learning workloads. While Kubernetes is a relatively new technology that many data scientists aren’t too familiar with, it’s seeing widespread adoption across tech. If your company uses Kubernetes to deploy its applications, and your data science team is having trouble scaling, seriously consider adopting it for your machine learning workflows.

In subsequent posts, I’ll dive deeper into how to use Kubernetes to accomplish the feats I’ve described above. We’ll examine how to use Kubernetes to train models, perform batch inference, and expose your models through REST APIs!

If you’re interested in receiving these future posts, sign up to receive them directly in your inbox below.
