This is post 2 in my Ultimate Guide to Deploying Machine Learning Models. You can find the other posts in the series here.
In our previous post on machine learning deployment we introduced what it means to deploy a machine learning model. We learned that making the predictions of a trained model available to users and other software systems requires considering a number of factors, including how frequently predictions should be generated and whether predictions should be generated for a single sample of data or a batch of samples at a time. In this post we’ll begin to examine how to implement the deployment process.
While many blog posts rush directly to implementing Flask APIs or using workflow schedulers, we’re going to start at a more fundamental level. We’ll begin by discussing software interfaces, which can be thought of as the boundaries between pieces of software. By analogy, a piece of software is a puzzle piece, and an entire software system is the completed puzzle. When properly designed, interfaces allow you to connect many different software components, enabling large and complex projects.
Let’s get started!
What’s an Interface?
Imagine a manager who assigns an employee the task of creating a report. A good manager might say: "I need you to produce a report with the following charts and figures. To produce that report, use customer transaction data." The manager has explicitly defined the desired outcome (the report) and hinted at a methodology (use of the customer transaction data).
In contrast, a bad manager might do any of the following:
- Not specify the input – Ask for the report but not specify which data to use or hint at who the employee should speak with to discover appropriate datasets.
- Not make the deliverable clear – Give the employee a bunch of data but not tell the employee what should be produced.
- Micromanage – Tell the employee what tools to use to produce the report, what steps to follow, and promise that any deviation from this plan will be met with swift and firm punishment.
Software interfaces are like managers. A good interface explicitly states the necessary inputs and the output it produces. For example, an interface implemented as a function will list all required arguments and what’s returned by the function. Interfaces can be thought of as the "boundaries" between separate chunks of software that define how different pieces of software communicate with one another. When interfaces are constructed well, different pieces of software, even software written by developers on separate teams or at different companies, can communicate and work in tandem.
Software engineers are taught to focus on the interfaces they develop rather than on how those interfaces are implemented. The implementation is important, but it can always be updated. It’s significantly harder to update an interface after it’s released, especially if the interface is external-facing. Therefore, time invested in defining an interface is time well spent.
A Basic Interface for Machine Learning Models
How would a software engineer think about what a machine learning model actually does? Abstractly speaking, a model accepts data, acts on that data in some way, and then returns a result. It’s really that simple. How the model acts on the data could be incredibly involved, like a forward pass of a convolutional neural network applying convolutions to tensors of image data, but these are implementation details.
The boundary of a machine learning model is made up of the inputs to the model, i.e. the features, and the output(s) the model predicts. Therefore a well-constructed interface must be built with both the input features and the predicted outputs in mind. To illustrate, let’s define this interface with a simple function:
def predict(model, input_features):
    '''
    Function that accepts a model and input data and returns a prediction.

    Args:
    -----
    model: A machine learning model.
    input_features: Features required by the model to generate a
        prediction. Numpy array of shape (1, n) where n is the dimension
        of the feature vector.

    Returns:
    --------
    prediction: Prediction of the model. Numpy array of shape (1,).
    '''
This function takes as its input a model and a set of input_features and returns a prediction. Notice that we haven’t implemented the function, i.e. we haven’t written how the function combines the model and the features to generate the prediction. We’ve simply created a contract, or a promise: we guarantee the function will return a prediction if the caller provides a model and input_features.
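For intuition, here’s one possible implementation of this contract, a minimal sketch assuming a scikit-learn style model (we’ll see later in this post why hard-coding assumptions about one library into this function causes trouble):

def predict(model, input_features):
    # Sketch: assumes a scikit-learn style model. input_features already
    # has shape (1, n), which sklearn estimators accept directly, and the
    # result is a numpy array of shape (1,), as the contract promises.
    return model.predict(input_features)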
Multiple Interfaces for Machine Learning Models
The predict() method we defined accepts a single feature vector and returns a single prediction. How do we know this? The documentation states that input_features is a numpy array of shape (1, n), where n is the dimension of the feature vector. This is great if your model is expected to predict a single instance at a time, but not so great if the model is also expected to predict on batches of samples. You could work around this by writing for-loops, but a loop is unlikely to be very efficient. Instead, we should define another method that handles the batch case directly. Let’s call it predict_batch:
def predict_batch(model, batch_input_features):
    '''
    Function that predicts a batch of samples.

    Args:
    -----
    model: A machine learning model.
    batch_input_features: A batch of features required by the model to
        generate predictions. Numpy array of shape (m, n) where m is the
        number of instances and n is the dimension of the feature vector.

    Returns:
    --------
    predictions: Predictions of the model. Numpy array of shape (m,).
    '''
This method defines a contract whereby it promises to return a batch of predictions if a model and a batch of input features are provided. Again, we haven’t implemented the method; that’s left to the developer. The developer may choose to use a loop and call predict over and over, or may do something else entirely. For the purposes of deployment this is irrelevant. What matters is that we have two interfaces: one that predicts a single sample and another that predicts a batch of samples.
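To make the loop idea concrete, here’s a naive sketch that fulfills the batch contract by delegating to the predict function above (assuming it has been implemented). It’s correct, but it gives up the efficiency of a model’s native vectorized prediction:

import numpy as np

def predict_batch(model, batch_input_features):
    # Call predict once per row of the (m, n) batch and stack the
    # single-sample results into an array of shape (m,).
    predictions = [
        predict(model, features.reshape(1, -1))[0]
        for features in batch_input_features
    ]
    return np.array(predictions)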
Machine Learning Object Oriented Programming – MLOOP
So far we’ve ignored the model parameter required by both the predict and predict_batch methods. Let me explain why this is problematic for machine learning.
Most engineers developing machine learning models today want to use the best tool available. If the engineer is building a classic model, like logistic regression or a random forest, the engineer might choose scikit-learn. For deep learning that engineer might choose TensorFlow or PyTorch. Even within classical ML the engineer may opt for the XGBoost implementation of gradient boosted trees. The model objects from each library have slightly different APIs, and we can’t predict what APIs future ML libraries will implement. This would make the implementations of our interfaces very messy. For instance, we DO NOT want our implementation to look like this:
def predict(model, input_features):
    ...
    if isinstance(model, sklearn.base.BaseEstimator):
        ...
    elif isinstance(model, xgboost.core.Booster):
        ...
    elif isinstance(model, tensorflow.keras.Model):
        ...
    elif isinstance(model, torch.nn.Module):
        ...
    ...
This implementation would be hard to maintain and would make runtime errors difficult to debug. Also, imagine what would happen if we wanted to pass additional parameters to predict for one kind of model but not another, say, parameters that apply only when predicting with an sklearn model. The function’s argument list would grow, yet those parameters would be useless for non-sklearn models. How would we describe that in the documentation? These are just a few reasons why object-oriented programming, creating classes and objects, is preferred.
Our interface is composed of two methods: predict and predict_batch. Let’s define a base class with these two methods:
class Model:
    def __init__(self, model):
        self.model = model

    def predict(self, input_features):
        '''
        Function that accepts input data and returns a prediction.

        Args:
        -----
        input_features: Features required by the model to generate a
            prediction. Numpy array of shape (1, n) where n is the
            dimension of the feature vector.

        Returns:
        --------
        prediction: Prediction of the model. Numpy array of shape (1,).
        '''
        raise NotImplementedError

    def predict_batch(self, batch_input_features):
        '''
        Function that predicts a batch of samples.

        Args:
        -----
        batch_input_features: A batch of features required by the model
            to generate predictions. Numpy array of shape (m, n) where m
            is the number of instances and n is the dimension of the
            feature vector.

        Returns:
        --------
        predictions: Predictions of the model. Numpy array of shape (m,).
        '''
        raise NotImplementedError
This base class acts as a template for our data science team. If a data scientist wants to use scikit-learn models, he just needs to subclass the Model class and implement the necessary methods. If another data scientist wants to use TensorFlow, no problem: just create a TensorFlow subclass! To illustrate, let’s create the sklearn subclass:
class SklearnModel(Model):
    def __init__(self, model):
        super().__init__(model)

    def predict(self, input_features):
        # sklearn estimators expect 2D input, so reshape the single
        # sample to shape (1, n) before predicting.
        y = self.model.predict(input_features.reshape(1, -1))
        return y

    def predict_batch(self, batch_input_features):
        # The batch is already a 2D array of shape (m, n), so it can be
        # passed to the sklearn estimator directly.
        ys = self.model.predict(batch_input_features)
        return ys
Since sklearn Predictors expect 2D input, we reshaped the input_features argument in the predict method. This is a key benefit of the object-oriented approach: we can define the interface that is relevant for the types of problems we’re solving AND take advantage of excellent 3rd-party machine learning libraries!
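In the same spirit, here’s a sketch of what a TensorFlow subclass might look like. It assumes a tf.keras model with a single output whose predict method accepts a 2D numpy array; the exact reshaping will depend on your network’s input and output shapes:

class TensorflowModel(Model):
    def __init__(self, model):
        super().__init__(model)

    def predict(self, input_features):
        # Keras models also expect batched 2D input, so wrap the single
        # sample in a batch of size 1 and flatten the result to shape (1,).
        y = self.model.predict(input_features.reshape(1, -1))
        return y.flatten()

    def predict_batch(self, batch_input_features):
        # Keras predicts on batches natively; flatten the (m, 1) output
        # to the (m,) shape promised by the interface.
        ys = self.model.predict(batch_input_features)
        return ys.flatten()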
And the benefits don’t stop there. We can add additional methods that simplify our ML workflows. For example, once a model has been trained we typically need a way to serialize the model and then deserialize it at inference time. Hence, we can add two methods, serialize() and deserialize(), to our interface. We can even provide default implementations of these methods in the base Model class and create library-specific implementations in the subclasses.
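As a sketch of what those defaults might look like, here’s a pickle-based version added to the base Model class. Pickle happens to work for many sklearn models, while a TensorFlow subclass would likely override these methods with the library’s own saving utilities; the method names and signatures here are just one possible design:

import pickle

class Model:
    # ... __init__, predict, and predict_batch as defined above ...

    def serialize(self, path):
        # Default implementation: pickle the wrapped model to disk.
        with open(path, 'wb') as f:
            pickle.dump(self.model, f)

    @classmethod
    def deserialize(cls, path):
        # Load a previously pickled model and wrap it in this class.
        with open(path, 'rb') as f:
            return cls(pickle.load(f))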
Additional examples of useful interface methods include moving serialized models from a local filesystem to some model store or remote filesystem like S3. There’s no limit to the methods you can add.
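For instance, a hypothetical to_remote() method could push a serialized model file to S3 using boto3, assuming AWS credentials are configured in the environment:

import boto3

class Model:
    # ... other methods as above ...

    def to_remote(self, local_path, bucket, key):
        # Upload a serialized model file to the given S3 bucket and key.
        s3 = boto3.client('s3')
        s3.upload_file(local_path, bucket, key)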
Conclusion
In this post we’ve defined software interfaces and created a basic ML model interface. I’ve mentioned that this interface will help us deploy our models, but I haven’t described how. In our next post, we will discuss how to use this interface to implement batch inference, which involves predicting a batch of samples. Batch inference is the go-to strategy when a batch of predictions needs to be generated on a regularly recurring schedule.
Finally, I’ve created a template for a Model base class with all the interfaces you’ll need to deploy your models. Download it here.