In my previous post, we discussed how to leverage Kubernetes Jobs to perform common production machine learning tasks such as model training and batch inference. Jobs allow us to reliably run batch processes in a fault-tolerant way. Even if an underlying node in the cluster fails, Kubernetes will ensure that the Job is rescheduled on a new node.
One limitation of using Jobs for machine learning workloads is that Job objects need to be created manually. What if we want Jobs to run at specific times? Or what if we want to run some machine learning Jobs periodically on a recurring schedule? For these cases, Kubernetes offers us the CronJob.
What is a CronJob?
A CronJob creates Jobs on a time-based schedule similar to cron tasks on a Linux system. CronJobs are useful when you wish to create recurring tasks or run jobs at specific times. For example, you may wish to run recurring batch processes during periods of low activity. It’s important to note that CronJobs do not create Pods directly. Instead, a CronJob is only responsible for creating Jobs based on its schedule. The created Jobs are responsible for managing Pods that perform application logic.
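As a quick reference, the schedule string uses the standard five-field cron format: minute, hour, day of month, month, and day of week. The snippet below is only an illustration of how a schedule value maps onto those fields; the schedule we'll actually use appears in the config later in this post.

# The five cron fields, left to right:
# minute (0-59)  hour (0-23)  day-of-month (1-31)  month (1-12)  day-of-week (0-6, Sunday=0)
schedule: "0 12 * * *"   # illustrative example: fire once per day at 12:00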
How are CronJobs Useful for Machine Learning?
CronJobs are quite useful in machine learning workflows. Suppose you’re building a feature store and need to generate features every hour from an operational data store. One way of producing these features is to use an hourly CronJob that reads from the data store, creates the features, and stores them in the feature store. As another example, consider a lead scoring model that performs batch inference each night. This can be deployed as a daily CronJob that loads a pretrained model, fetches new input data, performs inference, and persists the predictions.
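As a rough sketch of how those two workloads might be scheduled, the schedule values below are illustrative (any sensible hourly or nightly time would do):

# hourly feature generation: run at the top of every hour
schedule: "0 * * * *"

# nightly batch inference for the lead scoring model: run at 02:00 each night
schedule: "0 2 * * *"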
Interacting with CronJobs
In this post we will be using the lpatruno/k8-model Docker image and inference.py Python script from my previous post on Kubernetes Jobs. If you haven’t already, I recommend reading that post before continuing. Using those files, we will create a CronJob that loads a pretrained model and performs batch inference on a recurring schedule. This is a common production machine learning pattern.
Note: This section assumes you have access to a Kubernetes cluster and have the kubectl command line client installed.
Creating a CronJob
To create a CronJob, we need to create a YAML file containing the configuration data. Let’s walk through inference.yaml, the config file for our batch inference CronJob:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: inference-cronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: inference-container
            imagePullPolicy: Always
            image: lpatruno/k8-model:latest
            command: ["python3", "inference.py"]
            env:
            - name: AWS_ACCESS_KEY_ID
              value: ""
            - name: AWS_SECRET_ACCESS_KEY
              value: ""
          restartPolicy: Never
      backoffLimit: 0
This YAML file contains four top-level keys. The apiVersion specifies which version of the Kubernetes API to use. The kind field specifies which type of Kubernetes resource we wish to create; in this case, we are creating a CronJob object. The metadata field holds identifying data such as the resource name and can also carry labels, arbitrary key-value attributes that developers attach to Kubernetes objects. The docs contain a recommended set of labels, but I would recommend appending your own machine-learning-specific metadata as well (see the sketch just below). The spec field specifies the characteristics you want the resource to have. Every Kubernetes resource must contain a spec field, but the format of the object spec differs between object types (see the Kubernetes API Reference).
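Here’s what such a metadata block might look like: it combines a couple of the recommended app.kubernetes.io labels with ML-specific keys. The model-name and model-version labels are purely illustrative conventions, not anything Kubernetes requires.

metadata:
  name: inference-cronjob
  labels:
    app.kubernetes.io/name: inference-cronjob
    app.kubernetes.io/component: batch-inference
    model-name: lead-scoring      # illustrative ML-specific label
    model-version: "1.0.3"        # illustrative ML-specific label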
The .spec field for the CronJob resource above contains two fields. The .spec.schedule field contains the cron-formatted string that specifies when the CronJob should run. In my example, the schedule "*/1 * * * *" creates a new Job resource every minute, which makes it easy to watch the CronJob in action without waiting around. The .spec.jobTemplate field contains the same fields that would appear in a Job spec field. In fact, I simply reused the .spec field from my previous post.
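Beyond schedule and jobTemplate, the CronJob spec supports a handful of optional fields that are worth knowing about for production workloads. A minimal sketch, with illustrative values:

spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Forbid        # don't start a new Job while the previous run is still active
  startingDeadlineSeconds: 120     # skip a run that can't start within 2 minutes of its scheduled time
  successfulJobsHistoryLimit: 3    # keep the 3 most recent successful Jobs
  failedJobsHistoryLimit: 1        # keep only the most recent failed Job
  # jobTemplate: same as in inference.yaml above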
To create the CronJob, simply run:
$ kubectl create -f inference.yaml
cronjob.batch/inference-cronjob created
Viewing CronJobs
We can view the scheduled CronJobs by running the kubectl get cronjobs command:
$ kubectl get cronjobs
NAME                SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
inference-cronjob   */1 * * * *   False     0        <none>          29s
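The SUSPEND column corresponds to the optional .spec.suspend field. Setting it to true tells the controller to skip upcoming runs without deleting the CronJob, which is handy when you need to pause a pipeline; a minimal sketch:

spec:
  suspend: true    # pause scheduling of new Jobs; Jobs that have already started are unaffected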
Since CronJobs create Job resources, we can monitor the Jobs that are created and look for instances of inference-cronjob:
$ kubectl get jobs --watch
NAME                           COMPLETIONS   DURATION   AGE
inference-cronjob-1558532220   0/1                      0s
inference-cronjob-1558532220   0/1           0s         0s
inference-cronjob-1558532220   1/1           7s         7s
The --watch flag in the command above watches for any changes to Job resources. Our CronJob has created a Job object named inference-cronjob-1558532220.
In order to view the logs from a Job created by a CronJob, we need to retrieve the Pod resource associated with that Job. To do that, we can run the kubectl get pods command and select on the name of the Job object:
$ kubectl get pods --selector=job-name=inference-cronjob-1558532220
NAME                                 READY   STATUS      RESTARTS   AGE
inference-cronjob-1558532220-ddqsr   0/1     Completed   0          22s
We see that the Pod has completed successfully. With the name of the Pod in hand, we can view the logs from that Pod:
$ kubectl logs inference-cronjob-1558532220-ddqsr
Running inference...
Loading data...
Loading model from: /home/jovyan/model/clf.joblib
Scoring observations...
[ 15.32448686 27.68741572 24.17609927 31.94786177 10.40786467
34.38871141 22.05210667 11.58265489 13.21049075 42.87157933
33.03218733 15.77635169 23.93521876 19.79260258 25.43466604
20.55132127 13.67733317 47.48979635 17.70069362 21.51806638
22.57388848 16.97645106 16.25503893 20.57862843 14.57438158
11.81385445 24.78353556 37.64333157 30.29062179 19.67713185
23.19310437 25.06569372 18.65459129 30.26701253 8.97905481
13.8130382 14.21123728 17.3840622 19.83840166 23.83861108
20.44820805 15.32433651 25.8157052 16.47533793 19.2214524
19.86928427 21.47113681 21.56443118 24.64517965 22.43665872
22.25160648]
Success! Our batch inference is complete.
You can delete a CronJob with the kubectl delete command. This will also delete any Jobs and Pods created by the CronJob.
$ kubectl delete -f inference.yaml
cronjob.batch "inference-cronjob" deleted
Conclusion
Let’s briefly review our work. First, we created a CronJob configuration file called inference.yaml. This config specifies that the inference.py script should be run every minute. Each time the schedule fires, the CronJob is triggered and Kubernetes creates a Job object. This Job then creates a Pod object, which runs the container that executes the Python script. We can use the same pattern to schedule recurring model training jobs. I’ll leave that as an exercise for the reader : )
So far in our Kubernetes series we’ve covered how to create Pod, Job, and CronJob resources. Jobs and CronJobs are great for running batch processes. But what if we need to perform online inference? In that case we’ll need to deploy an API that accepts incoming requests, performs inference, and returns the result. In our next post I’ll demonstrate how to use Kubernetes Deployments to serve models for online inference.
If you’d like to be notified when that post is published, sign up below and I’ll send you an email as soon as it’s ready!