Deploying Models on AWS SageMaker – Part 1 Architecture

A few weeks ago I had the chance to speak to the AI and Data Science Fellows at the Insight Data Science program here in New York City. The fellows in the audience were finishing up their machine learning final projects. Some fellows were using Amazon SageMaker to build and deploy their models, but were a bit confused about how to perform certain tasks with the machine learning service. And I don’t blame them; SageMaker is a great tool if you’re familiar with the AWS ecosystem, but it can be downright confusing if you’ve never used AWS before.

To help them out, I decided to try and explain SageMaker as simply as I could. We broke down the different SageMaker components and walked through a tutorial of how to build and deploy a custom algorithm using the service. Since I got good feedback on the talk, I’ve decided to turn my talk into a series of blog posts. In this first part, I’ll provide a high-level architectural overview of SageMaker. Let’s start by discussing the problems SageMaker seeks to solve.

What is SageMaker?

SageMaker is Amazon’s fully-managed service for building and deploying machine learning models in production. Developers can use SageMaker to label and prepare data, choose an algorithm, train, tune and optimize models, and deploy them to production. At this point you’re probably asking: "ok, but I already do all of these things. And I do most of them on AWS. Why would I use another AWS service?"


Let’s walk through the benefits of using SageMaker (as far as I see them):

  • Simplicity – SageMaker is composed of a few different AWS services but provides an API that simplifies several machine learning tasks. For instance, suppose you need to train and persist a model but you first need to launch an EC2 instance with lots of RAM and CPU. Historically you would have to manually create each of the compute resources, configure them to access other services like S3, and then manually serialize the model object once training completed. SageMaker handles all of this for you. Just make an API call using the AWS CLI or the Python SDK, and SageMaker will launch the EC2 instances, run model training, persist the training artifacts to S3, and then shut down the EC2 instances automatically (see the sketch after this list). If you decide to deploy this model for online inference, simply make another API call, and SageMaker will take care of creating the EC2 instances and the networking rules for accessing the model over the internet. That’s a lot of time saved.
  • Cost Savings – Have you ever left an expensive EC2 instance running for weeks at a time? It’s an efficient way of burning a hole through your wallet. SageMaker automatically shuts down your training and batch inference instances when the jobs complete so that you only pay for the time used.
  • Elastic Scaling – Endpoints can be configured with automatic scaling, so SageMaker adds or removes the compute instances serving online inference as demand changes. This lets you respond to traffic in a cost-effective manner. One less thing to worry about.
  • Monitoring – SageMaker automatically monitors your AWS resources and tracks metrics and log outputs. You can even visualize these metrics to gain a quick understanding of how accurate your models are or how long model training takes.
  • Security – SageMaker relies on the AWS IAM user authorization system for authentication and permissioning. If you’re already using AWS and have IAM roles configured, you can take advantage of the security system you’ve already set up.
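
To make the Simplicity bullet concrete, here is a rough sketch of a training-and-deployment call using the SageMaker Python SDK. The image URI, role ARN, bucket paths, and instance types below are placeholders, not values from this post:

import sagemaker
from sagemaker.estimator import Estimator

# Placeholders: substitute your own role ARN, image URI, and S3 paths.
session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/MySageMakerRole"

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-algorithm:latest",
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/model-artifacts/",
    sagemaker_session=session,
)

# fit() launches the training instances, runs the container, persists the
# model artifacts to S3, and shuts the instances down when training ends.
estimator.fit({"train": "s3://my-bucket/training-data/"})

# A second call stands up instances and networking for a real-time endpoint.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")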

So you may already be using AWS for model training and inference, but SageMaker simplifies the process and provides additional benefits like centralized monitoring and security. Let’s discuss how SageMaker uses different AWS services to accomplish this.

Overall Architecture

SageMaker architecture with S3 buckets and elastic container registry

SageMaker is composed of several different AWS services. These services are "bundled" together through an API that coordinates the creation and management of different machine learning resources and artifacts. Let’s explore the AWS services that SageMaker uses and discuss how these are utilized to create the machine learning infrastructure solution.

Docker Images

Although Docker is not an AWS-specific service, an understanding of the SageMaker architecture is incomplete without a discussion of the container software. SageMaker relies extensively on Docker to execute model training and inference logic, but how much a developer interacts with Docker varies depending on which SageMaker features they use. For instance, users of SageMaker’s built-in machine learning models do not have to interact with Docker at all, while those who intend to develop and deploy their own custom machine learning solutions will need to build and publish their own Docker images. For a Docker image to be compatible with SageMaker, the image needs to define several environment variables, create a particular directory structure, and contain the model training and inference logic in expected locations.

When SageMaker trains a model, it creates a number of files in the container’s /opt/ml directory:

/opt/ml
├── input
│   ├── config
│   │   ├── hyperparameters.json
│   │   └── resourceConfig.json
│   └── data
│       └── <channel_name>
│           └── <input data>
├── model
│ 
├── code
│   └── <script files>
│
└── output
    └── failure

The /opt/ml/input directory contains JSON files that configure the hyperparameters for the algorithm, the network layout for distributed training, and information specifying how to access the input data (more on that in the section on S3). The /opt/ml/code directory contains the training and inference scripts, and /opt/ml/model contains the serialized model generated by the training algorithm. Finally, information about why a training job failed is stored in /opt/ml/output. When the job completes, SageMaker compresses the model and output directories into tar archives and stores them on S3.
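
To give a feel for how a custom container uses this layout, here is a minimal sketch of a training entry point that reads from and writes to the paths above. The channel name ("train"), the model file name, and the pickle serialization are assumptions made for illustration:

import json
import os
import pickle
import sys
import traceback

PREFIX = "/opt/ml"
PARAM_PATH = os.path.join(PREFIX, "input/config/hyperparameters.json")
TRAIN_PATH = os.path.join(PREFIX, "input/data/train")   # "train" channel is an assumption
MODEL_PATH = os.path.join(PREFIX, "model")
FAILURE_PATH = os.path.join(PREFIX, "output/failure")

def train():
    # Hyperparameters arrive as strings in hyperparameters.json.
    with open(PARAM_PATH) as f:
        hyperparameters = json.load(f)

    # ... load the files under TRAIN_PATH and fit a model here ...
    model = {"example": hyperparameters}  # stand-in for a real model object

    # Anything written to /opt/ml/model is archived and uploaded to S3.
    with open(os.path.join(MODEL_PATH, "model.pkl"), "wb") as f:
        pickle.dump(model, f)

if __name__ == "__main__":
    try:
        train()
        sys.exit(0)
    except Exception:
        # The failure file explains why the training job failed.
        with open(FAILURE_PATH, "w") as f:
            f.write(traceback.format_exc())
        sys.exit(1)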

Rather than wiring up this directory structure and the entry-point handling yourself, you can install the sagemaker-containers library into your Docker image. The library defines the locations for storing code and other resources, making it much easier to create SageMaker-compatible Docker images.

Elastic Container Registry (ECR)

Elastic Container Registry (ECR) is Amazon’s fully-managed Docker container registry used to store, manage, and deploy container images. When a developer runs a training or inference job, SageMaker retrieves a specific Docker image from ECR and then uses this image to run containers to perform the logic. Amazon maintains a set of pre-built Docker images on ECR. Developers planning on deploying custom solutions will first need to publish their images to ECR and make these available to SageMaker.
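
For a custom image, that typically means creating an ECR repository and pushing your image to it before referencing the image URI in a SageMaker job. A hedged sketch of the repository step with boto3 (the repository name and region are placeholders; the build and push themselves happen with the Docker CLI):

import boto3

# Placeholder repository name and region.
ecr = boto3.client("ecr", region_name="us-east-1")

response = ecr.create_repository(repositoryName="my-algorithm")
repository_uri = response["repository"]["repositoryUri"]

# After `docker build`, `docker tag`, and `docker push`, this is the image
# URI you hand to SageMaker when creating a training or inference job.
image_uri = f"{repository_uri}:latest"
print(image_uri)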

Simple Storage Service (S3)

Simple Storage Service (S3) is Amazon’s object storage service. SageMaker utilizes S3 to store the input data and artifacts from the model training process. As described in the section on Docker images, model training jobs create a number of files in the /opt/ml directory of a running container. When the training job completes, the model artifacts in that directory are compressed into tar archives and stored on S3.
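
As a quick illustration, once a training job finishes you can pull that artifact back down from S3 and unpack it. The bucket and key below are placeholders; the exact location is reported by the training job itself:

import tarfile
import boto3

# Placeholder bucket and key; SageMaker reports the real location in the
# training job's ModelArtifacts output.
s3 = boto3.client("s3")
s3.download_file("my-bucket", "model-artifacts/my-job/output/model.tar.gz", "model.tar.gz")

# The archive contains whatever the training container wrote to /opt/ml/model.
with tarfile.open("model.tar.gz") as tar:
    tar.extractall(path="model")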

Elastic Compute (EC2)

Whenever developers create a job on SageMaker, either for training or inference, SageMaker launches EC2 instances to perform the work. Developers are free to specify the type of compute resources based on their needs (RAM/CPU/GPU/etc.). For training and batch inference jobs, the SageMaker API call takes care of launching the EC2 instances, running containers from the specified Docker images, and terminating the instances when the jobs complete. When you create an endpoint for real-time inference, SageMaker launches the number of compute instances you specify but does not terminate them, since the instances need to keep running to accept requests. Instead, you can configure the endpoint to scale the deployed instances elastically.
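
Once an endpoint is up, its long-running instances accept requests over HTTPS. Here is a minimal sketch of invoking one with boto3; the endpoint name and CSV payload are assumptions that depend on how your inference container parses requests:

import boto3

# Placeholder endpoint name and payload.
runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-endpoint",
    ContentType="text/csv",
    Body="5.1,3.5,1.4,0.2",
)
print(response["Body"].read())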

CloudWatch

In order to maintain the reliability, availability, and performance of your machine learning jobs, Amazon CloudWatch monitors your AWS resources and stores log files. CloudWatch collects raw data and processes it into readable, near real-time metrics (see here for a list of the metrics captured). To help you debug your training jobs, endpoints, transform jobs, and notebook instance lifecycle configurations, anything an algorithm container, a model container, or a notebook instance lifecycle configuration sends to stdout or stderr is also sent to CloudWatch Logs. See here for a list of the logs provided by SageMaker.
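
For example, you can read a training job's container output back out of CloudWatch Logs with boto3. The log group below is the one SageMaker uses for training jobs; the job name is a placeholder:

import boto3

logs = boto3.client("logs")

# Each training instance gets its own log stream, prefixed with the job name.
streams = logs.describe_log_streams(
    logGroupName="/aws/sagemaker/TrainingJobs",
    logStreamNamePrefix="my-training-job",
)

for stream in streams["logStreams"]:
    events = logs.get_log_events(
        logGroupName="/aws/sagemaker/TrainingJobs",
        logStreamName=stream["logStreamName"],
    )
    for event in events["events"]:
        print(event["message"])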

IAM

As with all other AWS services, SageMaker relies on Identity and Access Management (IAM) users and roles for authentication and access control. The identities making SageMaker calls must have permission to access resources such as SageMaker notebook instances, EC2 instances, and S3 buckets.

Interacting with SageMaker

Creating and managing resources in SageMaker involves making HTTP requests to the SageMaker API. This can be done in several different ways.

Notebook Instances

SageMaker offers Notebook Instances: fully managed ML compute instances that run Jupyter notebooks. You can use a notebook to prepare and process data, write code to train models, and deploy models for inference. To make SageMaker API calls from the notebook, you can use the Python SDK.
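
A typical first cell in a notebook instance looks something like this sketch, which picks up the execution role attached to the instance and a default S3 bucket for staging data and artifacts:

import sagemaker
from sagemaker import get_execution_role

# On a notebook instance, get_execution_role() returns the IAM role the
# instance was created with; elsewhere you would pass a role ARN explicitly.
role = get_execution_role()
session = sagemaker.Session()

# A default bucket SageMaker can use for input data and model artifacts.
bucket = session.default_bucket()
print(role, bucket)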

Python SDK

The Python SDK is an open source library for training and deploying machine learning models on SageMaker. You can use the SDK to train models using prebuilt algorithms and Docker images as well as to deploy custom models and code. See the documentation for an overview of the major classes available in the SDK.

Note that SageMaker API calls can also be made using the boto3 library.
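
boto3’s low-level sagemaker client maps directly onto the API operations. A short sketch (the training job name is a placeholder):

import boto3

sm = boto3.client("sagemaker")

# List the most recent training jobs and inspect one of them.
jobs = sm.list_training_jobs(MaxResults=5, SortBy="CreationTime", SortOrder="Descending")
for job in jobs["TrainingJobSummaries"]:
    print(job["TrainingJobName"], job["TrainingJobStatus"])

detail = sm.describe_training_job(TrainingJobName="my-training-job")
print(detail["ModelArtifacts"]["S3ModelArtifacts"])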

API and Command Line Interface

Finally, the SageMaker API is directly available from the AWS CLI.

Conclusion

In this post we’ve discussed the SageMaker architecture. We’ve looked at the AWS services that compose SageMaker and how these services are tied together. We also examined various ways to call the SageMaker API including the Python SDK, the boto3 library, and the command line interface. The purpose of this post was to provide a high level overview of SageMaker before we dive into using the service for model training and inference.

In our next few posts we’ll cover how to use SageMaker to perform model training and inference. Our examples will focus on building our own custom algorithms rather than using the built-in algorithms that SageMaker provides.

I’ve put together a short video describing how SageMaker works with other AWS services like EC2 and S3 to enable you to perform model training and deployment. The video goes a bit deeper than this post to describe how the components fit together for model training, online inference, and batch inference. Download it below!
