In my previous Kubernetes for Machine Learning post, we used a Kubernetes Deployment to build a REST API to serve a trained machine learning model. In that setup, issuing requests to generate predictions was only possible from within our Kubernetes cluster. But what if we want to expose our models to traffic from outside the cluster?
To do this, we’ll need to learn about Kubernetes Services.
In this post, we’ll discuss Kubernetes Services and how they are useful for machine learning applications. We’ll create two Services: one to expose a Jupyter Notebook instance and another to expose a REST API that serves a trained machine learning model.
Note: This article assumes you have access to a Kubernetes cluster and have the kubectl command line client installed.
What is a Service?
Suppose you create a Kubernetes Pod running some application and you wish to connect to that app from outside the Pod. Although each Pod is assigned its own IP address, recall that Pods are ephemeral objects. When a Pod is deleted, either manually or due to a failure, that IP address no longer points to a running application.
The situation is a bit more complicated if we create a Kubernetes Deployment to manage multiple replicated Pods. The Deployment is responsible for ensuring that a specific number of Pods is running, and each of these Pods has its own IP address. The Deployment can create and destroy Pods dynamically, either when scaling up or in case of failure, so the set of running Pods, along with their associated IP addresses, can change at any time. Imagine inviting friends to a party at your house and telling them that your address can change at a moment’s notice. You probably shouldn’t expect too many attendees.
So how do you reliably connect to applications running in Pods? That’s where Services come in.
A Service is an abstraction that defines a set of Pods and a policy by which to access them. Services provide a stable virtual IP (VIP) address whose purpose is to forward traffic to one or more Pods. A separate process known as the kube-proxy is responsible for keeping the mapping between the VIP and the Pods up-to-date. The set of Pods referenced by a Service is specified in the Service definition using selectors. By default, Services are only reachable from within the cluster, but a Service can be configured to be exposed on an external IP address, allowing traffic from outside of a Kubernetes cluster to reach Pods.
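Once a Service exists, you can inspect this mapping yourself: the kubectl get endpoints command lists the Pod IP addresses a Service is currently forwarding traffic to.

# List the Pod IP:port pairs a Service currently routes to (replace <service-name>).
$ kubectl get endpoints <service-name>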
How are Services Useful for Machine Learning?
Since Services provide a mechanism to access Pods, they are perfect for exposing machine learning applications. Suppose you need to expose a machine learning model to external users. You’ve trained a model and built a simple REST API that accepts requests for predictions. You’ve deployed this REST API behind a Deployment, allowing you to easily scale up the number of Pods in order to meet increased demand. You can then create a Service that exposes the Deployment, allowing traffic from within (or outside of) the cluster to reach that API. Now the model is available to other software processes, or directly to users, to accept requests and generate predictions. This is what running machine learning in production is all about.
Interacting with Services
Let’s examine how to create and interact with Services. I will walk through two examples that are useful for machine learning practitioners. First, we will create a Service that exposes a Jupyter notebook instance. This notebook instance will be built using an image available on the Docker Hub. Next, we will create a Service that exposes the REST API we built in my post on Deployments. This example will demonstrate how to expose a model to users outside of a Kubernetes cluster.
Kubernetes Service for Jupyter Notebooks
Let’s create a Service that exposes a Jupyter Notebook instance. We first create a YAML file called jupyter_service.yaml containing the configuration for the Service.
apiVersion: v1
kind: Service
metadata:
  name: jupyter-service
spec:
  ports:
  - port: 8888
    targetPort: 8888
  selector:
    app: jupyter-deployment
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jupyter-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jupyter-deployment  # apps/v1 requires an explicit selector matching the Pod template labels
  template:
    metadata:
      labels:
        app: jupyter-deployment
    spec:
      containers:
      - name: jupyter-container
        imagePullPolicy: Always
        image: jupyter/base-notebook:latest
        ports:
        - containerPort: 8888
Notice that this file contains the configuration for both a Service and a Deployment. When you call the Kubernetes API and pass in this file, Kubernetes will create both resources.
Let’s briefly walk through the Deployment configuration first. We use the .metadata.name field to name the Deployment object jupyter-deployment. The .spec key specifies the characteristics of the ReplicaSet the Deployment will manage: there will be 1 Pod (.spec.replicas) running a single container (.spec.template.spec.containers). This container will be built from the jupyter/base-notebook image and will expose port 8888. We have also attached a metadata key-value pair, app: jupyter-deployment, which the Service uses to select the Pods it targets.
The Service configuration contains four top-level keys. The apiVersion field specifies which version of the Kubernetes API to use. The kind field specifies which type of Kubernetes resource we wish to create; in this case, we are creating a Service object. The metadata field names the object and can also hold labels, arbitrary key-value pairs developers can attach to Kubernetes objects. The docs contain a recommended set of labels, but I would suggest appending your own machine learning-specific metadata as well. The spec field specifies the characteristics you want the resource to have. Every Kubernetes resource must contain a spec field, but the format of the object spec differs from resource to resource (see the Kubernetes API Reference).
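For instance, a hypothetical metadata block with machine learning-specific labels might look like the following; the model-name and model-version keys are illustrative choices, not standard Kubernetes labels:

metadata:
  name: my-model-service
  labels:
    model-name: housing-regressor   # hypothetical label identifying the served model
    model-version: "1.0.2"          # hypothetical label recording the model version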
This specification creates a new Service object named jupyter-service which targets port 8888 on any Pod with the app=jupyter-deployment label. The .spec.type field lists the ServiceType. The LoadBalancer type exposes the Service to traffic from outside of the cluster.
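Not every cluster can provision an external load balancer. As an alternative sketch, the NodePort type exposes the Service on a static port on every node in the cluster; the nodePort value below is an assumption and must fall within the cluster’s NodePort range (30000-32767 by default):

spec:
  type: NodePort
  ports:
  - port: 8888
    targetPort: 8888
    nodePort: 30088   # assumed value; must lie in the cluster's NodePort range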
We can create the Service with the kubectl create command:
$ kubectl create -f jupyter_service.yaml
service/jupyter-service created
deployment.apps/jupyter-deployment created
To view the running Services, use the kubectl get command:
$ kubectl get services
NAME              TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
jupyter-service   LoadBalancer   10.98.58.16   localhost     8888:32331/TCP   24s
kubernetes        ClusterIP      10.96.0.1     <none>        443/TCP          123d
Here we see that the jupyter-service is running, along with additional metadata including the ServiceType, the in-cluster IP address, the external IP, and which port is forwarded. We can also view the created Deployment:
$ kubectl get deployments
NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
jupyter-deployment   1/1     1            1           3m15s
and Pods
$ kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
jupyter-deployment-78c4ff6446-rqqq7   1/1     Running   0          3m30s
To access the Jupyter notebook, visit localhost:8888 in a browser window. Note that you will be prompted for a password or token. You can view this token by executing the command jupyter notebook list in the running Pod. This is accomplished with the kubectl exec command:
$ kubectl exec jupyter-deployment-78c4ff6446-rqqq7 -- jupyter notebook list
Currently running servers:
http://0.0.0.0:8888/?token=6e86118d94c1ff3ee91a4c46f4e00b609a136d16d7483cc3 :: /home/jovyan
Replace jupyter-deployment-78c4ff6446-rqqq7 with the name of the Pod listed on your machine after running kubectl get pods.
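If your cluster doesn’t assign an external IP to LoadBalancer Services, one workaround is to tunnel a local port to the Service with kubectl port-forward:

# Forward local port 8888 to port 8888 on the jupyter-service Service.
$ kubectl port-forward service/jupyter-service 8888:8888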
To delete the Service and Deployment, simply run
$ kubectl delete -f jupyter_service.yaml
service "jupyter-service" deleted
deployment.apps "jupyter-deployment" deleted
Kubernetes Service for Machine Learning Model API
Let’s create another Service to expose the machine learning model API we created in my post on Deployments. If you haven’t already done so, please read that post before continuing.
We first create a YAML file called api_service.yaml containing the configuration for the Service.
apiVersion: v1
kind: Service
metadata:
  name: k8-model-api-service
spec:
  ports:
  - port: 5000
    targetPort: 5000
  selector:
    app: k8-model-api
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: k8-model-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: k8-model-api  # apps/v1 requires an explicit selector matching the Pod template labels
  template:
    metadata:
      labels:
        app: k8-model-api
    spec:
      containers:
      - name: k8-model-api
        imagePullPolicy: Always
        image: lpatruno/k8-model-api:latest
        command: ["python3", "api.py"]
        ports:
        - containerPort: 5000
This specification creates a new Service object named k8-model-api-service which targets port 5000 on any Pod with the app=k8-model-api label. We set the .spec.type field to LoadBalancer to expose the Service to traffic from outside of the cluster.
We can create the Service with the kubectl create command:
$ kubectl create -f api_service.yaml
service/k8-model-api-service created
deployment.apps/k8-model-api created
To view the running Services, use the kubectl get command:
$ kubectl get services
NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
k8-model-api-service   LoadBalancer   10.106.170.28   localhost     5000:30507/TCP   15s
kubernetes             ClusterIP      10.96.0.1       <none>        443/TCP          124d
We can also view the created Deployment:
$ kubectl get deployments
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
k8-model-api   2/2     2            2           51s
and Pods
$ kubectl get pods
NAME                           READY   STATUS    RESTARTS   AGE
k8-model-api-74b75b5c5-hlj74   1/1     Running   0          45s
k8-model-api-74b75b5c5-qnwdn   1/1     Running   0          45s
We can use the cURL command to call the API from outside of the cluster:
$ curl -i -H "Content-Type: application/json" -X POST -d '{"CRIM": 15.02, "ZN": 0.0, "INDUS": 18.1, "CHAS": 0.0, "NOX": 0.614, "RM": 5.3, "AGE": 97.3, "DIS": 2.1, "RAD": 24.0, "TAX": 666.0, "PTRATIO": 20.2, "B": 349.48, "LSTAT": 24.9}' localhost:5000/predict
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 41
Server: Werkzeug/0.15.0 Python/3.6.8
Date: Sun, 23 Jun 2019 14:23:53 GMT
{
"prediction": 12.273424794987879
}
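If you’d rather call the API from Python, here is a minimal sketch using the requests library (assuming it is installed, e.g. via pip install requests):

import requests

# Feature values for a single observation, matching the cURL example above.
features = {
    "CRIM": 15.02, "ZN": 0.0, "INDUS": 18.1, "CHAS": 0.0, "NOX": 0.614,
    "RM": 5.3, "AGE": 97.3, "DIS": 2.1, "RAD": 24.0, "TAX": 666.0,
    "PTRATIO": 20.2, "B": 349.48, "LSTAT": 24.9,
}

# POST the features to the /predict endpoint exposed by the Service.
response = requests.post("http://localhost:5000/predict", json=features)
print(response.json())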
We can also view the logs from the Pods to see the POST request:
$ kubectl logs -f k8-model-api-74b75b5c5-qnwdn
Loading model from: /home/jovyan/model/clf.joblib
* Serving Flask app "api" (lazy loading)
* Environment: production
WARNING: Do not use the development server in a production environment.
Use a production WSGI server instead.
* Debug mode: on
* Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
* Restarting with stat
* Debugger is active!
* Debugger PIN: 317-431-036
192.168.65.3 - - [23/Jun/2019 14:23:53] "POST /predict HTTP/1.1" 200 -
Note that since we requested 2 replica Pods in our Deployment definition, we have to view the logs from both Pods to determine where the request was routed.
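Rather than inspecting each Pod by name, you can fetch the logs from every Pod matching the Deployment’s label in a single command:

# Fetch logs from all Pods carrying the app=k8-model-api label.
$ kubectl logs -l app=k8-model-api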
Conclusion
In this post we’ve demonstrated how to use Kubernetes Services to expose your machine learning applications. We first created a Service that served a Jupyter Notebook instance using the jupyter/base-notebook Docker image. Next, we created a Service to expose a machine learning model we trained in a previous post. The Deployment behind this Service managed 2 replicated Pods that served the model, but we could easily scale this up to serve even more users.
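For instance, scaling the Deployment is a single command; the replica count below is just an example:

# Scale the model-serving Deployment from 2 to 5 Pods.
$ kubectl scale deployment k8-model-api --replicas=5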
In future posts, we will explore several advanced Kubernetes features that allow us to A/B test our live models in production. If you’d like to be notified when those posts are published, sign up below and I’ll send you an email as soon as they’re ready!