A few weeks ago I wrote about why storing metadata is critical for the machine learning process. Since building machine learning models is an iterative process, often involving multiple people and a diverse set of tools, we need the ability to compare results across multiple experiments over time. We also need the ability to reproduce our analyses, especially when it comes time to productionize our models.
In that post I also covered what types of metadata and artifacts to store. We want to capture references to training and test data sets, the types of models we use, hyperparameter values, and feature preprocessing steps. But we also need to store metrics and additional context such as source code and dependency information.
Today I’d like to show you how to capture this metadata. I’ll present a simple example of how to augment a typical machine learning task with code that automatically configures, extracts, and serializes metadata. We’ll examine the source code and the artifacts generated through the process. This will provide a template that you can adapt to all of your machine learning metadata needs.
Sacred library
Sacred is a Python library that helps researchers and developers configure, organize, log, and reproduce experiments. It keeps track of all the parameters of your experiment, captures important metadata and artifacts, and stores it all in a number of different formats. Sacred provides an easy-to-use, extensible command-line interface and a basic API that makes it trivial to extract and store important metadata so that data scientists and machine learning engineers can stay focused on building and deploying models. The library is also incredibly well documented.
Storing Machine Learning Metadata with Sacred
In order to demonstrate how to store metadata with Sacred, let’s examine a Python script that performs a common machine learning task. The script will load a dataset, split the data into training and test sets, fit a model, and compute evaluation metrics on the training and test sets. We’ll also serialize the model so it can be used for inference at a later time. If you’ve been following my posts, you’ll notice I used a similar script in my Docker for machine learning series.
Writing an Experiment using Sacred
Let’s walk through this script. In lines 1-11 we import the necessary libraries. In line 13, I instantiate a Sacred Experiment object and give it the name "train_experiment". Experiment is the main class of the Sacred framework; it manages the configuration, the main function, and all other internals responsible for capturing and storing experiment metadata. The class exposes a number of methods and decorators that we use to store configuration, metrics, and experiment artifacts.
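The walkthrough refers to line numbers in the train.py script shown in the original post. In case that file isn't in front of you, the experiment setup at the top of the script looks roughly like this (a minimal sketch, not the exact source):

from sacred import Experiment

# Create the Experiment object; the name appears in the logs and in the stored metadata
ex = Experiment("train_experiment")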
In lines 15-23 we define a function config and decorate it with @ex.config. In Sacred this is known as a Config Scope, and it is executed just before running the experiment. All local variables defined within the function that are JSON serializable are stored as configuration entries of the experiment and are made available to certain functions within the script. Keep in mind that functions used as config scopes must not contain any return or yield statements.
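The exact config scope isn't reproduced here, but judging from the configuration that print_config reports later in this post, it presumably looks something like the following sketch (paths and hyperparameter values taken from that output):

import os

@ex.config
def config():
    MODEL_DIR = "/home/jovyan/model"
    MODEL_FILE = "clf.joblib"
    MODEL_PATH = os.path.join(MODEL_DIR, MODEL_FILE)  # Path to persist serialized model object
    params = {  # Hyperparameter settings for GradientBoostingRegressor
        "n_estimators": 500,
        "max_depth": 4,
        "min_samples_split": 2,
        "learning_rate": 0.01,
        "loss": "ls",
    }

Because every local variable here is JSON serializable, each one becomes a configuration entry of the experiment.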
In lines 25-55 we define the main function of the script, train_model, and decorate it with @ex.automain. The function decorated with ex.automain is the main function of the experiment. Sacred provides a command-line interface for each experiment and will automatically run the decorated function when you execute the file. We define the train_model method to accept the parameters MODEL_DIR, MODEL_FILE, MODEL_PATH, and params. These are the same variables we defined locally within the config method. When the experiment is run, Sacred automatically fills in the missing parameters of the train_model method with the configuration values. In addition to the main function decorated with ex.automain, Sacred will also inject configuration values into captured functions, i.e. any function decorated with @ex.capture.
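To make the injection concrete, here is a small, purely illustrative captured function (it is not part of the original script). When called with no arguments, Sacred fills in MODEL_PATH from the configuration:

@ex.capture
def report_model_path(MODEL_PATH):
    # MODEL_PATH is injected from the experiment configuration
    print("Model will be serialized to:", MODEL_PATH)

Calling report_model_path() inside the experiment prints the configured path without the caller passing anything explicitly.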
In lines 28-44 we load the Boston housing data, split the data into training and testing sets, and fit a gradient boosting regressor using the hyperparameters defined in the configuration. We then compute the evaluation metrics on the train/test sets and serialize the model in lines 46-50. In lines 52-55 we use the Sacred API to capture the evaluation metrics and the serialized model artifact. The log_scalar method takes a metric name and a value and stores both. The add_artifact method accepts a filename and adds the file as an artifact of the run.
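Putting those pieces together, the body of train_model plausibly looks something like the sketch below. It is reconstructed from the walkthrough, the console output, and the captured metrics, so details may differ from the original script:

import os

import joblib
from sklearn.datasets import load_boston
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


@ex.automain
def train_model(MODEL_DIR, MODEL_FILE, MODEL_PATH, params):
    # MODEL_FILE is unused here but kept to mirror the original signature
    print("Loading data...")
    X, y = load_boston(return_X_y=True)

    print("Splitting data...")
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    print("Fitting model...")
    clf = GradientBoostingRegressor(**params)
    clf.fit(X_train, y_train)

    train_mse = mean_squared_error(y_train, clf.predict(X_train))
    test_mse = mean_squared_error(y_test, clf.predict(X_test))

    print("Serializing model to: {}".format(MODEL_PATH))
    os.makedirs(MODEL_DIR, exist_ok=True)
    joblib.dump(clf, MODEL_PATH)

    # Capture metrics and the serialized model with the Sacred API
    ex.log_scalar("training.mean_square_error", train_mse)
    ex.log_scalar("testing.mean_square_error", test_mse)
    ex.add_artifact(MODEL_PATH)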
Running a Sacred Experiment
Sacred provides a command-line interface for running experiments. For example, we can use the print_config command to view the configuration of the experiment.
$ python3 train.py print_config
INFO - train_experiment - Running command 'print_config'
INFO - train_experiment - Started
INFO - train_experiment - Completed after 0:00:00
Configuration (modified, added, typechanged, doc):
MODEL_DIR = '/home/jovyan/model'
MODEL_FILE = 'clf.joblib'
MODEL_PATH = '/home/jovyan/model/clf.joblib' # Path to persist serialized model object
seed = 950345841 # the random seed for this experiment
params: # Hyperparmater settings for GradientBoostedRegressor
  learning_rate = 0.01
  loss = 'ls'
  max_depth = 4
  min_samples_split = 2
  n_estimators = 500
There are a few things to observe here. First, Sacred pretty prints the configuration values and indents nested dictionaries. Sacred also picks up the inline comments used in the config scope, which makes the printed configuration easier to read. To ensure reproducibility of results, Sacred automatically generates a random seed and sets the global seed of random and numpy.random to that value.
To view the versioned dependencies of the experiment, we can use the print_dependencies command:
$ python3 train.py print_dependencies
INFO - train_experiment - Running command 'print_dependencies'
INFO - train_experiment - Started
INFO - train_experiment - Completed after 0:00:00
Dependencies:
joblib == 0.13.2
matplotlib == 2.2.3
numpy == 1.13.3
sacred == 0.7.4
scikit-learn == 0.20.2
Sources:
train.py 226c0c1cf4529ab399a4ffba04ad04a2
Sacred allows users to create custom commands and also to update configuration from the command line.
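For example, a custom command can be added with the @ex.command decorator (this particular command is hypothetical, not part of the original script):

@ex.command
def describe(params):
    # Run with: python3 train.py describe
    print("Current hyperparameters:", params)

Configuration values can also be overridden at launch time with the with keyword, e.g. something like python3 train.py with params.n_estimators=1000 params.learning_rate=0.05, which runs the experiment with the updated hyperparameters while leaving everything else unchanged.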
In order to run the experiment, we just need to execute the script:
$ python3 train.py
WARNING - train_experiment - No observers have been added to this run
INFO - train_experiment - Running command 'train_model'
INFO - train_experiment - Started
INFO - train_experiment - Completed after 0:00:00
Loading data...
Splitting data...
Fitting model...
Serializing model to: /home/jovyan/model/clf.joblib
However, the real magic doesn’t happen unless we attach an Observer. Observers allow us to persist the experiment metadata that is generated so that we can analyze our results and reproduce them if needed. Sacred ships with observers that store information in MongoDB, TinyDB, SQL databases, and the local filesystem. For the purposes of our tutorial, we’ll persist our results to the local filesystem. To do that, we just have to add the command-line argument --file_storage=BASEDIR. Here is the output of that command:
$ python3 train.py --file_storage=/home/jovyan/experiments
INFO - train_experiment - Running command 'train_model'
INFO - train_experiment - Started run with ID "1"
INFO - train_experiment - Completed after 0:00:00
Loading data...
Splitting data...
Fitting model...
Serializing model to: /home/jovyan/model/clf.joblib
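If you prefer to configure persistence in code rather than on the command line, an observer can also be attached inside train.py. With the Sacred version used in this post (0.7.4), that would look roughly like this:

from sacred.observers import FileStorageObserver

# Equivalent to passing --file_storage=/home/jovyan/experiments on the command line
ex.observers.append(FileStorageObserver.create("/home/jovyan/experiments"))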
Sacred Experiment Outputs
Running the python3 train.py --file_storage=/home/jovyan/experiments command results in the creation of an experiments/ directory containing the extracted metadata. Here is the generated directory structure:
experiments/
|--- 1
|    |--- clf.joblib
|    |--- config.json
|    |--- cout.txt
|    |--- metrics.json
|    |--- run.json
|--- _sources
|    |--- train_226c0c1cf4529ab399a4ffba04ad04a2.py
Let’s walk through each of these files.
The _sources/ subdirectory simply contains a copy of the source files of the experiment. By doing this, the version of the code used for running the experiment is always available with the run. Hooray for reproducibility.
1/ contains the artifacts from the experiment, including metrics, configuration, and outputs of the run. Note that 1/ corresponds to the first run of the script. Subsequent runs would be stored in 2/, 3/, etc.
1/clf.joblib is the serialized model file. This file was generated by the ex.add_artifact(MODEL_PATH) call in train.py.
1/config.json contains the experiment configuration:
$ cat config.json
{
"MODEL_DIR": "/home/jovyan/model",
"MODEL_FILE": "clf.joblib",
"MODEL_PATH": "/home/jovyan/model/clf.joblib",
"params": {
"learning_rate": 0.01,
"loss": "ls",
"max_depth": 4,
"min_samples_split": 2,
"n_estimators": 500
},
"seed": 234670872
1/cout.txt contains the printed output of the experiment:
$ cat cout.txt
INFO - train_experiment - Running command 'train_model'
INFO - train_experiment - Started run with ID "1"
Loading data...
Splitting data...
Fitting model...
Serializing model to: /home/jovyan/model/clf.joblib
INFO - train_experiment - Completed after 0:00:00
1/metrics.json contains the metrics we captured with ex.log_scalar:
$ cat metrics.json
{
"testing.mean_square_error": {
"steps": [
0
],
"timestamps": [
"2019-04-20T18:11:47.381008"
],
"values": [
6.430425857111316
]
},
"training.mean_square_error": {
"steps": [
0
],
"timestamps": [
"2019-04-20T18:11:47.380960"
],
"values": [
1.7677391462344387
]
}
}
Finally, 1/run.json contains all other captured metadata including dependencies, host information, and status information.
$ cat run.json
{
"artifacts": [
"clf.joblib"
],
"command": "train_model",
"experiment": {
"base_dir": "/home/jovyan",
"dependencies": [
"joblib==0.13.2",
"matplotlib==2.2.3",
"numpy==1.13.3",
"sacred==0.7.4",
"scikit-learn==0.20.2"
],
"mainfile": "train.py",
"name": "train_experiment",
"repositories": [],
"sources": [
[
"train.py",
"_sources/train_226c0c1cf4529ab399a4ffba04ad04a2.py"
]
]
},
"heartbeat": "2019-04-20T18:11:47.385457",
"host": {
"ENV": {},
"cpu": "Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz",
"hostname": "346552d26f09",
"os": [
"Linux",
"Linux-4.9.125-linuxkit-x86_64-with-debian-buster-sid"
],
"python_version": "3.6.8"
},
"meta": {
"command": "train_model",
"options": {
"--beat_interval": null,
"--capture": null,
"--comment": null,
"--debug": false,
"--enforce_clean": false,
"--file_storage": "/home/jovyan/experiments",
"--force": false,
"--help": false,
"--loglevel": null,
"--mongo_db": null,
"--name": null,
"--pdb": false,
"--print_config": false,
"--priority": null,
"--queue": false,
"--sql": null,
"--tiny_db": null,
"--unobserved": false,
"COMMAND": null,
"UPDATE": [],
"help": false,
"with": false
}
},
"resources": [],
"result": null,
"start_time": "2019-04-20T18:11:47.050053",
"status": "COMPLETED",
"stop_time": "2019-04-20T18:11:47.383521"
}
Conclusion
Sacred is a powerful and simple tool that allows data scientists to configure, organize, log, and reproduce experiments. We looked at a simple example of using Sacred to store metadata and artifacts during a machine learning training job. While we only ran a single experiment, you could use the library to compare the results of multiple experiments by leveraging its metrics API. You could take this a step further by building visualizations on top of the generated output files.
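As a starting point, here is a rough sketch (not from the original post) of how you might aggregate the test-set error across runs from the files Sacred writes, assuming the experiments/ layout shown above:

import glob
import json
import os

# Print the final test MSE recorded in each run directory (1/, 2/, 3/, ...)
run_dirs = glob.glob("/home/jovyan/experiments/[0-9]*")
for run_dir in sorted(run_dirs, key=lambda p: int(os.path.basename(p))):
    with open(os.path.join(run_dir, "metrics.json")) as f:
        metrics = json.load(f)
    test_mse = metrics["testing.mean_square_error"]["values"][-1]
    print("run {}: test MSE = {:.3f}".format(os.path.basename(run_dir), test_mse))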
Although we used Sacred in this post, there are other libraries that do similar things. Check out Flor as another example.
If you found this tutorial helpful, please share it on LinkedIn, Twitter, or Facebook!