Kubernetes Deployments for Machine Learning

Suppose your data science team has deployed a couple of batch machine learning processes on Kubernetes. You’ve successfully used Kubernetes Jobs to deploy model training and you’ve scheduled daily batch inference tasks using CronJobs. But now you’re tasked with serving predictions to users in real time. You also know that your job isn’t done once you deploy the first version of the model. In the future you’ll need to update your models and A/B test different versions with little to no application downtime. How do you even begin to do this?

Enter the Kubernetes Deployment.

In this post we’ll discuss Deployments and how they are useful for machine learning applications. We’ll create a Deployment for exposing a trained machine learning model via a REST API and show how to access the API from within your cluster.

Note: This article assumes you have access to a Kubernetes cluster and have the kubectl command line client installed.
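You can sanity-check both prerequisites before continuing:

$ kubectl version
$ kubectl cluster-info

If these commands return without errors, kubectl is installed and can reach your cluster.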

What is a Deployment

A Deployment represents a set of multiple, identical Pods and is meant to manage stateless services running on a Kubernetes cluster. At a high level, Deployments allow developers to manage and upgrade a set of identical Pods in a controlled way. For example, Deployments ensure that any instances of a deployed application that fail or become unresponsive are automatically replaced without manual intervention. Developers can also update Deployments by making changes to the Deployment’s configuration. The update triggers a gradual rollout of the changes: current Pods are terminated and new Pods are created so as to gracefully transition the application without downtime. Deployments also allow developers to roll back to previous versions of the application at any time, so if a bug is accidentally deployed, reverting to a previous version of the app is seamless. Finally, Deployments can be scaled to meet increased application load.
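As a preview, each of these operations is a single kubectl command. Here is a sketch using k8-model-api, the name of the Deployment we’ll create later in this post:

$ kubectl rollout status deployment/k8-model-api   # watch a rolling update progress
$ kubectl rollout history deployment/k8-model-api  # list previous revisions
$ kubectl rollout undo deployment/k8-model-api     # roll back to the previous revision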

How are Deployments Useful for Machine Learning

Deployments are built to manage stateless applications, which makes them perfect for exposing machine learning services. Suppose you need to expose a machine learning model to external users. You’ve trained a model and built a simple REST API that accepts requests for predictions. You can then create a Deployment for managing Pods that accept requests and serve predictions in real time. The Deployment can easily be scaled up to meet increased demand, eliminating the need to manually provision and manage additional copies of the app. If a new version of a model becomes available, you can update the Deployment and Kubernetes will manage rolling over to the new version without any downtime. And if you find that your new model isn’t performing as well as it should, you can easily roll back.
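For example, suppose you’ve packaged a retrained model in an image tagged v2 (a hypothetical tag; this post only uses latest). Rolling out the new version and scaling up to meet demand are each one command:

$ kubectl set image deployment/k8-model-api model-api=lpatruno/k8-model-api:v2
$ kubectl scale deployment/k8-model-api --replicas=4

The set image command triggers the same gradual rollout described above; scale simply changes the number of replica Pods.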

It’s worth mentioning that you can also use Deployments for canary and Blue/Green deployments and for live A/B testing of models. But these will be topics for future posts.

Interacting with Deployments

We will create a Deployment that exposes a trained machine learning model for online inference. The trained model will be accessible via a REST API implemented using the Flask-RESTful Python package. In order to focus on the Kubernetes Deployment, I will be reusing the Docker and Python code from an earlier blog post, Using Docker to Generate Machine Learning Predictions in Real Time. If you haven’t already, I recommend reading that post first and then working through this one.

In order to make the Docker image from that previous post available to my local Kubernetes cluster, I created a public repository on Docker Hub called lpatruno/k8-model-api. I then built the image with docker build:

$ docker build -t k8-model-api -f Dockerfile .
Sending build context to Docker daemon  10.75kB
Step 1/10 : FROM jupyter/scipy-notebook
 ---> 2fb85d5904cc
Step 2/10 : COPY requirements.txt ./requirements.txt
 ---> Using cache
 ---> f7d3df033bb4
Step 3/10 : RUN pip install -r requirements.txt
 ---> Using cache
 ---> 53c7c84910fd
Step 4/10 : RUN mkdir model
 ---> Using cache
 ---> c6f10f206379
Step 5/10 : ENV MODEL_DIR=/home/jovyan/model
 ---> Using cache
 ---> b42d1d794ca7
Step 6/10 : ENV MODEL_FILE=clf.joblib
 ---> Using cache
 ---> b53a23eddebf
Step 7/10 : ENV METADATA_FILE=metadata.json
 ---> Using cache
 ---> 63de64423761
Step 8/10 : COPY train.py ./train.py
 ---> Using cache
 ---> a1c900a04e0e
Step 9/10 : COPY api.py ./api.py
 ---> Using cache
 ---> d7f6dc4426cb
Step 10/10 : RUN python3 train.py
 ---> Using cache
 ---> 91ce4cac6dd3
Successfully built 91ce4cac6dd3
Successfully tagged k8-model-api:latest

Next, I tagged the image with docker tag and pushed it with docker push:

$ docker tag k8-model-api:latest lpatruno/k8-model-api:latest
$ docker push lpatruno/k8-model-api:latest
The push refers to repository [docker.io/lpatruno/k8-model-api]
e886fcf38a54: Layer already exists 
cf23b32969dc: Layer already exists 
96e23aa142d2: Layer already exists 
823012148cfe: Layer already exists 
aa3beaf2eef2: Layer already exists 
e99a83b88f15: Layer already exists 
03de148dfb0a: Layer already exists 
b0f3e4f91d7b: Layer already exists 
d678676e139c: Layer already exists 
f1c34378f44b: Layer already exists 
3e989afdb948: Layer already exists 
5d8e59e8fa3d: Layer already exists 
d0fac854ebed: Layer already exists 
4e4c852921cc: Layer already exists 
6db4e45cf563: Layer already exists 
b9c6b5375a6e: Layer already exists 
ec7a5c783ba6: Layer already exists 
305d55183e3e: Layer already exists 
e4da5278aad5: Layer already exists 
88fb11447873: Layer already exists 
c3c9a296a12d: Layer already exists 
69ff1caa4c1a: Layer already exists 
e9804e687894: Layer already exists 
e8482936e318: Layer already exists 
059ad60bcacf: Layer already exists 
8db5f072feec: Layer already exists 
67885e448177: Layer already exists 
ec75999a0cb1: Layer already exists 
65bdd50ee76a: Layer already exists 
latest: digest: sha256:c5bbcdd779b39d18865be0df2c5a3bbd27bab15e6f75fc363e740008a23a87cc size: 6384

The image is now available for anyone to use by referencing lpatruno/k8-model-api:latest.
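Anyone can pull it with docker pull:

$ docker pull lpatruno/k8-model-api:latest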

Creating a Deployment

Let’s create a Deployment called k8-model-api that exposes our machine learning model as a REST API. We first create a YAML file called deployment.yaml containing the configuration of our Deployment.

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: k8-model-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: model-api
  template:
    metadata:
      labels:
        app: model-api
    spec:
      containers:
      - name: model-api
        imagePullPolicy: Always
        image: lpatruno/k8-model-api:latest
        command: ["python3",  "api.py"]
        ports:
        - containerPort: 5000

This YAML file contains four top-level keys. The apiVersion field specifies which version of the Kubernetes API to use. The kind field specifies which type of Kubernetes resource we wish to create; in this case, we are creating a Deployment object. The metadata field holds data identifying the resource, such as its name and an optional set of labels, arbitrary key-value pairs developers can attach to Kubernetes objects. The docs contain a recommended set of labels, but I would recommend appending your own machine learning-specific metadata as well. The spec field specifies the characteristics you want the resource to have. Every Kubernetes resource must contain a spec field, but the format of the object spec is different for different resources (see the Kubernetes API Reference).
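For example, here is a sketch of the kind of machine learning-specific labels you might attach; the label keys below are illustrative, not an established convention:

metadata:
  name: k8-model-api
  labels:
    app: model-api
    model-name: boston-housing  # hypothetical: the model this Deployment serves
    model-version: "1.0.0"      # hypothetical: version of the serialized model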

In this example a Deployment named k8-model-api is created, indicated by the metadata.name field. This Deployment creates 2 replicated Pods, indicated by the .spec.replicas field. The .spec.selector field defines how the Deployment finds which Pods to manage. In this case we specify a label (app: model-api) that’s listed in the Pod template.

The .spec.template field lists the configuration for the Pods we wish to create. The Pods are labeled app: model-api using the labels field. The Pod template’s specification, or .template.spec field, specifies that the Pods run a single container named model-api with the image lpatruno/k8-model-api:latest. The command to be executed in that container is python3 api.py, and port 5000 should be exposed on the container to accept and send traffic. Port 5000 is the default port used by Flask, the framework Flask-RESTful is built on.
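If you’d like documentation for any of these fields without leaving the terminal, kubectl explain prints it straight from the API server:

$ kubectl explain deployment.spec.template.spec.containers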

We can create the deployment with the kubectl create command:

$ kubectl create -f deployment.yaml
deployment.apps/k8-model-api created
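As an aside, kubectl apply offers a declarative alternative: it creates the Deployment if it doesn’t exist and rolls out your changes whenever you edit deployment.yaml and run the command again, which pairs nicely with the model updates discussed earlier.

$ kubectl apply -f deployment.yaml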

Viewing the Deployment

Let’s take a look at the Deployment we created.

$ kubectl get deployments
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
k8-model-api   2/2     2            2           111s

NAME lists the names of the Deployments in the cluster. READY displays how many replicas are currently running. UP-TO-DATE displays the number of replicas that have been updated to achieve the desired state. AVAILABLE displays how many replicas of the application are available to your users. AGE displays the amount of time that the application has been running.

It’s worth mentioning that the Deployment resource doesn’t manage Pods directly. Instead, the Deployment created a ReplicaSet whose purpose is to maintain a stable set of replica Pods at any given time. The ReplicaSet is used to guarantee the availability of a specific number of identical Pods.

Let’s take a look at the ReplicaSet our Deployment created.

$ kubectl get rs
NAME                      DESIRED   CURRENT   READY   AGE
k8-model-api-5f9cd49555   2         2         2       6m20s

Notice that the name of the ReplicaSet is formatted as [Deployment Name]-[Random String].

Let’s also examine the Pods managed by the k8-model-api-5f9cd49555 ReplicaSet.

$ kubectl get pods
NAME                            READY   STATUS    RESTARTS   AGE
k8-model-api-5f9cd49555-c899v   1/1     Running   0          17m
k8-model-api-5f9cd49555-hqnqz   1/1     Running   0          17m

Notice that the names of the Pods are formatted as [ReplicaSet Name]-[Random String].

Running Online Inference

By creating our Deployment, we have created 2 Pods that expose a trained machine learning model via a REST API. Let’s send a request to this API in order to generate predictions. At this point we should mention that our REST API is not yet exposed to users over the internet. In order to expose the model to external users, we’ll need to create a Service, which is the topic of a future blog post ;)

However, we can query our API from within the Kubernetes cluster. To do this, we can create a new Pod, open a shell into that Pod, and then issue a cURL command against the running k8-model-api Pods. First, let’s find the internal IP address of one of the running Pods by using the kubectl describe command.

$ kubectl describe pod k8-model-api-5f9cd49555-hqnqz
Name:               k8-model-api-5f9cd49555-hqnqz
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               docker-desktop/192.168.65.3
Start Time:         Sat, 01 Jun 2019 12:25:42 -0400
Labels:             app=model-api
                    pod-template-hash=5f9cd49555
Annotations:        <none>
Status:             Running
IP:                 10.1.0.94
Controlled By:      ReplicaSet/k8-model-api-5f9cd49555
Containers:
  model-api:
    Container ID:  docker://ba1ea1b4c875557db2ed3bc486c7cfc7f42c8db8e17a9281a5c04f799de604a6
    Image:         lpatruno/k8-model-api:latest
    Image ID:      docker-pullable://lpatruno/k8-model-api@sha256:c5bbcdd779b39d18865be0df2c5a3bbd27bab15e6f75fc363e740008a23a87cc
    Port:          5000/TCP
    Host Port:     0/TCP
    Command:
      python3
      api.py
    State:          Running
      Started:      Sat, 01 Jun 2019 12:25:45 -0400
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-96lvl (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-96lvl:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-96lvl
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From                     Message
  ----     ------            ----               ----                     -------
  Warning  FailedScheduling  27m (x3 over 27m)  default-scheduler        0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Normal   Scheduled         27m                default-scheduler        Successfully assigned default/k8-model-api-5f9cd49555-hqnqz to docker-desktop
  Normal   Pulling           27m                kubelet, docker-desktop  Pulling image "lpatruno/k8-model-api:latest"
  Normal   Pulled            27m                kubelet, docker-desktop  Successfully pulled image "lpatruno/k8-model-api:latest"
  Normal   Created           27m                kubelet, docker-desktop  Created container model-api
  Normal   Started           27m                kubelet, docker-desktop  Started container model-api

Here we see that the IP field is 10.1.0.94.

Next, let’s create and open a shell into a new Pod in the cluster. We can perform both of these steps with the following command (technically, this creates a new Deployment, rather than just a Pod):

$ kubectl run python3 -ti --image=python:3.6 --command=true bash
kubectl run --generator=deployment/apps.v1beta1 is DEPRECATED and will be removed in a future version. Use kubectl create instead.
If you don't see a command prompt, try pressing enter.
root@python3-5bf5ddf449-2pb7p:/ 

Now that we’re in a Pod in the cluster, let’s query our REST API using the cURL command:

root@python3-5bf5ddf449-2pb7p:/ curl -i -H "Content-Type: application/json" -X POST -d '{"CRIM": 15.02, "ZN": 0.0, "INDUS": 18.1, "CHAS": 0.0, "NOX": 0.614, "RM": 5.3, "AGE": 97.3, "DIS": 2.1, "RAD": 24.0, "TAX": 666.0, "PTRATIO": 20.2, "B": 349.48, "LSTAT": 24.9}' 10.1.0.94:5000/predict
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 41
Server: Werkzeug/0.15.0 Python/3.6.8
Date: Sat, 01 Jun 2019 16:56:24 GMT

{
    "prediction": 12.273424794987879
}

Success! We submitted a single request to one of the Pods and the Pod returned a prediction. To query the other Pod, use the kubectl describe command to retrieve the IP address for that Pod and then substitute the IP address in the above cURL command. I leave this as an exercise for the reader.
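Hint: instead of running kubectl describe once per Pod, kubectl get pods -o wide lists the IP address of every Pod in a single table:

$ kubectl get pods -o wide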

Deleting a Deployment

To delete the Deployments we created, simply run the kubectl delete command followed by the resource type and names.

$ kubectl delete deployment k8-model-api python3
deployment.extensions "k8-model-api" deleted
deployment.extensions "python3" deleted

This will also delete the associated ReplicaSets

$ kubectl get rs
No resources found.

and Pods

$ kubectl get pods
No resources found.

Conclusion

Congratulations for making it to the end of this post! We created a Deployment object that launched two identical copies of a trained machine learning model, each exposed via a REST API in its own Pod. We then created a third, separate Pod and performed online inference by sending a request to one of the exposed models with cURL.

In my next post, I’ll discuss how to use a Kubernetes Service to expose this model to users outside of the cluster. This can be used to deploy models that accept requests over a network connection.

If you’d like to be notified when that post is published, sign up below and I’ll send you an email as soon as it’s ready!

