In my previous Kubernetes for Machine Learning post, we used a Kubernetes Deployment to build a REST API to serve a trained machine learning model. In that setup, issuing requests to generate predictions was only possible from within our Kubernetes cluster. But what if we want to expose our models to traffic from outside the cluster?
To do this, we’ll need to learn about Kubernetes Services.
In this post, we’ll discuss Kubernetes Services and how they are useful for machine learning applications. We’ll create two Services: one to expose a Jupyter Notebook instance and another to expose a REST API that serves a trained machine learning model.
Note: This article assumes you have access to a Kubernetes cluster and have the kubectl command line client installed.
What is a Service?
Suppose you create a Kubernetes Pod running some application and you wish to connect to that app from outside the Pod. Although each Pod is assigned its own IP address, recall that Pods are ephemeral objects. When a Pod is deleted, either manually or due to a failure, that IP address no longer points to a running application.
The situation is a bit more complicated if we create a Kubernetes Deployment to manage multiple replicated Pods. The Deployment is responsible for ensuring that a specific number of Pods is running, and each of these Pods has its own IP address. The Deployment can create and destroy Pods dynamically, either when scaling up or in case of failure, so the set of running Pods, along with their associated IP addresses, can change at any time. Imagine inviting friends to a party at your house and telling them that your address can change at a moment’s notice. You probably shouldn’t expect too many attendees.
So how do you reliably connect to applications running in Pods? That’s where Services come in.
A Service is an abstraction that defines a set of Pods and a policy by which to access them. Services provide a stable virtual IP (VIP) address whose purpose is to forward traffic to one or more Pods. A separate process known as the kube-proxy is responsible for keeping the mapping between the VIP and the Pods up-to-date. The set of Pods referenced by a Service is specified in the Service definition using selectors. By default, Services are only reachable from within the cluster, but a Service can be configured to be exposed on an external IP address, allowing traffic from outside of a Kubernetes cluster to reach Pods.
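Once a Service exists, you can inspect this mapping yourself: the kubectl get endpoints command lists the Pod IP addresses a Service is currently forwarding traffic to.

# List the Pod IP:port pairs a Service currently routes to (replace <service-name>).
$ kubectl get endpoints <service-name>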
How are Services Useful for Machine Learning?
Since Services provide a mechanism to access Pods, they are perfect for exposing machine learning applications. Suppose you need to expose a machine learning model to external users. You’ve trained a model and built a simple REST API that accepts requests for predictions. You’ve deployed this REST API behind a Deployment, allowing you to easily scale up the number of Pods in order to meet increased demand. You can then create a Service that exposes the Deployment, allowing traffic from within (or outside of) the cluster to reach that API. Now the model is available to other software processes, or directly to users, to accept requests and generate predictions. This is what running machine learning in production is all about.
Interacting with Services
Let’s examine how to create and interact with Services. I will walk through two examples that are useful for machine learning practitioners. First, we will create a Service that exposes a Jupyter notebook instance. This notebook instance will be built using an image available on the Docker Hub. Next, we will create a Service that exposes the REST API we built in my post on Deployments. This example will demonstrate how to expose a model to users outside of a Kubernetes cluster.
Kubernetes Service for Jupyter Notebooks
Let’s create a Service that exposes a Jupyter Notebook instance. We first create a YAML file called jupyter_service.yaml containing the configuration for the Service.
apiVersion: v1
kind: Service
metadata:
  name: jupyter-service
spec:
  ports:
  - port: 8888
    targetPort: 8888
  selector:
    app: jupyter-deployment
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jupyter-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jupyter-deployment  # apps/v1 requires an explicit selector matching the Pod template labels
  template:
    metadata:
      labels:
        app: jupyter-deployment
    spec:
      containers:
      - name: jupyter-container
        imagePullPolicy: Always
        image: jupyter/base-notebook:latest
        ports:
        - containerPort: 8888
Notice that this file contains the configuration for both a Service and a Deployment. When you call the Kubernetes API and pass in this file, Kubernetes will create both resources.
Let’s briefly walk through the Deployment configuration first. We use the .metadata.name field to name the Deployment object jupyter-deployment. The .spec key specifies the characteristics of the ReplicaSet the Deployment will manage: there will be 1 Pod (.spec.replicas) running a single container (.spec.template.spec.containers). This container will be built from the jupyter/base-notebook image and will expose port 8888. We have also attached a metadata key-value pair, app: jupyter-deployment, which the Service uses to select the Pods it targets.
The Service configuration contains four top-level keys. The apiVersion field specifies which version of the Kubernetes API to use. The kind field specifies which type of Kubernetes resource we wish to create; in this case, we are creating a Service object. The metadata field names the object and can also hold labels, arbitrary key-value pairs developers can attach to Kubernetes objects. The docs contain a recommended set of labels, but I would suggest appending your own machine learning-specific metadata as well. The spec field specifies the characteristics you want the resource to have. Every Kubernetes resource must contain a spec field, but the format of the object spec differs from resource to resource (see the Kubernetes API Reference).
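For instance, a hypothetical metadata block with machine learning-specific labels might look like the following; the model-name and model-version keys are illustrative choices, not standard Kubernetes labels:

metadata:
  name: my-model-service
  labels:
    model-name: housing-regressor   # hypothetical label identifying the served model
    model-version: "1.0.2"          # hypothetical label recording the model version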
This specification creates a new Service object named jupyter-service which targets port 8888 on any Pod with the app=jupyter-deployment label. The .spec.type field lists the ServiceType. The LoadBalancer type exposes the Service to traffic from outside of the cluster.
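Not every cluster can provision an external load balancer. As an alternative sketch, the NodePort type exposes the Service on a static port on every node in the cluster; the nodePort value below is an assumption and must fall within the cluster’s NodePort range (30000-32767 by default):

spec:
  type: NodePort
  ports:
  - port: 8888
    targetPort: 8888
    nodePort: 30088   # assumed value; must lie in the cluster's NodePort range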
We can create the Service with the kubectl create command:
$ kubectl create -f jupyter_service.yaml
service/jupyter-service created
deployment.apps/jupyter-deployment created
To view the running Services, use the kubectl get command:
$ kubectl get services
NAME              TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
jupyter-service   LoadBalancer   10.98.58.16   localhost     8888:32331/TCP   24s
kubernetes        ClusterIP      10.96.0.1     <none>        443/TCP          123d
Here we see that the jupyter-service is running, along with additional metadata including the ServiceType, the in-cluster IP address, the external IP, and which port is forwarded. We can also view the created Deployment:
$ kubectl get deployments
NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
jupyter-deployment   1/1     1            1           3m15s
and Pods
$ kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
jupyter-deployment-78c4ff6446-rqqq7   1/1     Running   0          3m30s
To access the Jupyter notebook, visit localhost:8888 in a browser window. Note that you will be prompted for a password or token. You can view this token by executing the command jupyter notebook list in the running Pod. This is accomplished with the kubectl exec command:
$ kubectl exec jupyter-deployment-78c4ff6446-rqqq7 -- jupyter notebook list
Currently running servers:
http://0.0.0.0:8888/?token=6e86118d94c1ff3ee91a4c46f4e00b609a136d16d7483cc3 :: /home/jovyan
Replace jupyter-deployment-78c4ff6446-rqqq7 with the name of the Pod listed on your machine after running kubectl get pods.
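If your cluster doesn’t assign an external IP to LoadBalancer Services, one workaround is to tunnel a local port to the Service with kubectl port-forward:

# Forward local port 8888 to port 8888 on the jupyter-service Service.
$ kubectl port-forward service/jupyter-service 8888:8888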
To delete the Service and Deployment, simply run
$ kubectl delete -f jupyter_service.yaml
service "jupyter-service" deleted
deployment.apps "jupyter-deployment" deleted
Kubernetes Service for Machine Learning Model API
Let’s create another Service to expose the machine learning model API we created in my post on Deployments. If you haven’t already done so, please read that post before continuing.
We first create a YAML file called api_service.yaml containing the configuration for the Service.
apiVersion: v1
kind: Service
metadata:
  name: k8-model-api-service
spec:
  ports:
  - port: 5000
    targetPort: 5000
  selector:
    app: k8-model-api
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: k8-model-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: k8-model-api  # apps/v1 requires an explicit selector matching the Pod template labels
  template:
    metadata:
      labels:
        app: k8-model-api
    spec:
      containers:
      - name: k8-model-api
        imagePullPolicy: Always
        image: lpatruno/k8-model-api:latest
        command: ["python3", "api.py"]
        ports:
        - containerPort: 5000
This specification creates a new Service object named k8-model-api-service which targets port 5000 on any Pod with the app=k8-model-api label. We set the .spec.type field to LoadBalancer to expose the Service to traffic from outside of the cluster.
We can create the Service with the kubectl create command:
$ kubectl create -f api_service.yaml
service/k8-model-api-service created
deployment.apps/k8-model-api created
To view the running Services, use the kubectl get command:
$ kubectl get services
NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
k8-model-api-service   LoadBalancer   10.106.170.28   localhost     5000:30507/TCP   15s
kubernetes             ClusterIP      10.96.0.1       <none>        443/TCP          124d
We can also view the created Deployment:
$ kubectl get deployments
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
k8-model-api   2/2     2            2           51s
and Pods
$ kubectl get pods
NAME                           READY   STATUS    RESTARTS   AGE
k8-model-api-74b75b5c5-hlj74   1/1     Running   0          45s
k8-model-api-74b75b5c5-qnwdn   1/1     Running   0          45s
We can use the cURL command to call the API from outside of the cluster:
$ curl -i -H "Content-Type: application/json" -X POST -d '{"CRIM": 15.02, "ZN": 0.0, "INDUS": 18.1, "CHAS": 0.0, "NOX": 0.614, "RM": 5.3, "AGE": 97.3, "DIS": 2.1, "RAD": 24.0, "TAX": 666.0, "PTRATIO": 20.2, "B": 349.48, "LSTAT": 24.9}' localhost:5000/predict
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 41
Server: Werkzeug/0.15.0 Python/3.6.8
Date: Sun, 23 Jun 2019 14:23:53 GMT
{
"prediction": 12.273424794987879
}
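If you’d rather call the API from Python, here is a minimal sketch using the requests library (assuming it is installed, e.g. via pip install requests):

import requests

# Feature values for a single observation, matching the cURL example above.
features = {
    "CRIM": 15.02, "ZN": 0.0, "INDUS": 18.1, "CHAS": 0.0, "NOX": 0.614,
    "RM": 5.3, "AGE": 97.3, "DIS": 2.1, "RAD": 24.0, "TAX": 666.0,
    "PTRATIO": 20.2, "B": 349.48, "LSTAT": 24.9,
}

# POST the features to the /predict endpoint exposed by the Service.
response = requests.post("http://localhost:5000/predict", json=features)
print(response.json())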
We can also view the logs from the Pods to see the POST request:
$ kubectl logs -f k8-model-api-74b75b5c5-qnwdn
Loading model from: /home/jovyan/model/clf.joblib
* Serving Flask app "api" (lazy loading)
* Environment: production
WARNING: Do not use the development server in a production environment.
Use a production WSGI server instead.
* Debug mode: on
* Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
* Restarting with stat
* Debugger is active!
* Debugger PIN: 317-431-036
192.168.65.3 - - [23/Jun/2019 14:23:53] "POST /predict HTTP/1.1" 200 -
Note that since we requested 2 replica Pods in our Deployment definition, we have to view the logs from both Pods to determine where the request was routed.
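Rather than inspecting each Pod by name, you can fetch the logs from every Pod matching the Deployment’s label in a single command:

# Fetch logs from all Pods carrying the app=k8-model-api label.
$ kubectl logs -l app=k8-model-api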
Conclusion
In this post we’ve demonstrated how to use Kubernetes Services to expose your machine learning applications. We first created a Service that served a Jupyter Notebook instance using the jupyter/base-notebook Docker image. Next, we created a Service to expose a machine learning model we trained in a previous post. The Deployment behind this Service managed 2 replicated Pods that served the model, but we could easily scale this up to serve even more users.
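For instance, scaling the Deployment is a single command; the replica count below is just an example:

# Scale the model-serving Deployment from 2 to 5 Pods.
$ kubectl scale deployment k8-model-api --replicas=5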
In future posts, we will explore several advanced Kubernetes features that allow us to A/B test our live models in production. If you’d like to be notified when those posts are published, sign up below and I’ll send you an email as soon as they’re ready!