You're reading from Machine Learning Engineering with MLflow

Product type: Book
Published in: Aug 2021
Publisher: Packt
ISBN-13: 9781800560796
Edition: 1st Edition
Author (1)
Natu Lauchande

Natu Lauchande is a principal data engineer in the fintech space currently tackling problems at the intersection of machine learning, data engineering, and distributed systems. He has worked in diverse industries, including biomedical/pharma research, cloud, fintech, and e-commerce/mobile. Along the way, he had the opportunity to be granted a patent (as co-inventor) in distributed systems, publish in a top academic journal, and contribute to open source software. He has also been very active as a speaker at machine learning/tech conferences and meetups.

Chapter 5: Managing Models with MLflow

In this chapter, you will learn about the model management features of MLflow. We will introduce the model life cycle alongside the Model Registry feature of MLflow, explain how to integrate it with your regular development workflow, and show how to create custom models not available in MLflow.

Specifically, we will look at the following sections in this chapter:

  • Understanding models in MLflow
  • Exploring model flavors in MLflow
  • Managing model signatures and schemas
  • Managing the life cycle with a model registry

From a workbench perspective, we would like to use MLflow to manage our models and implement a clear model life cycle. Adding managed model features to our workbench with MLflow will improve the quality and operations of our machine learning engineering solution.

Technical requirements

For this chapter, you will need the following:

  • The latest version of Docker installed on your machine. If you don’t already have it installed, please follow the instructions at https://docs.docker.com/get-docker/.
  • The latest version of docker-compose installed. Please follow the instructions at https://docs.docker.com/compose/install/.
  • Git installed and available on the command line, as described at https://git-scm.com/book/en/v2/Getting-Started-Installing-Git.
  • Access to a Bash terminal (Linux or Windows).
  • Access to a browser.
  • Python 3.5+ installed.
  • The latest version of your machine learning workbench installed locally, as described in Chapter 3, Your Data Science Workbench.

Understanding models in MLflow

On the MLflow platform, you have two main components available to manage models:

  • Models: This module manages the format, libraries, and standards enforcement for models on the platform. It supports a variety of the most widely used machine learning libraries: sklearn, XGBoost, TensorFlow, H2O, fastai, and others. It has features to manage the input and output schemas of models and to facilitate deployment.
  • Model Registry: This module handles the model life cycle, from registering a model to tagging its metadata so that it can be retrieved by relevant systems. It supports models in different states, for instance, live development, testing, and production.

An MLflow model is, at its core, a packaging format for models. The main goal of MLflow model packaging is to decouple the model type from the environment that executes the model. A good analogy is that an MLflow model is a bit like a Dockerfile for a model, where you describe metadata of the model, and...

Exploring model flavors in MLflow

Model flavors in MLflow are the model types from different libraries that MLflow supports. This functionality allows MLflow to handle each model type with its native library and to support some of the native functionality of those models. The following list presents a selection of representative flavors to describe and illustrate the support available in MLflow:

  • mlflow.tensorflow: TensorFlow is one of the most widely used libraries, particularly geared toward deep learning. MLflow integrates natively with the model format and with the monitoring abilities by saving logs in TensorBoard format. Auto-logging is supported in MLflow for TensorFlow models. The Keras model in Figure 5.5 is a good example of TensorFlow support in MLflow.
  • mlflow.h2o: H2O is a complete machine learning platform geared toward the automation of models and with some overlapping features with MLflow. MLflow provides the ability to load (load_model...

Managing model signatures and schemas

An important feature of MLflow is that it provides an abstraction for the input and output schemas of models, along with the ability to validate model data during prediction and training.

MLflow throws an error if your input does not match the schema and signature of the model during prediction:

  1. Next, we will look at a code listing for a simple digit classification model (the details of the dataset are available here: https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits). The following code flattens the image into a pandas DataFrame and fits a model to the dataset:
    from sklearn import datasets, svm, metrics
    from sklearn.model_selection import train_test_split
    import mlflow
    digits = datasets.load_digits()
    n_samples = len(digits.images)
    data = digits.images.reshape((n_samples, -1))
    clf = svm.SVC(gamma=0.001)
    X_train, X_test, y_train, y_test = train_test_split(
        data, digits.target, test_size=0.5, shuffle...

Introducing Model Registry

MLflow Model Registry is a module in MLflow that comprises a centralized store for models and an API for managing the life cycle of a model in a registry.

A typical workflow for a machine learning model developer is to acquire training data, clean and process it, train models, and from there hand over to a system or person that deploys the models. In very small settings, where one person is responsible for all of this, it is quite trivial. Challenges and friction start to arise when the variety and quantity of models in a team start to scale. A selection of common friction points raised by machine learning developers with regard to storing and retrieving models follows:

  • Collaboration in larger teams
  • Phasing out stale models in production
  • The provenance of a model
  • A lack of documentation for models
  • Identifying the correct version of a model
  • How to integrate the model with deployment tools

The main...

Managing the model development life cycle

Managing the model life cycle is important when more than one model developer works in a team. It is common for multiple model developers to try different models within the same project, so it matters that a reviewer decides which model ends up going to production:

Figure 5.13 – Example of a model development life cycle

A model in its life cycle can undergo the following stages if using a life cycle similar to the one represented in Figure 5.13:

  • Development: The state where the model developer is still exploring and trying out different approaches and is still trying to find a reasonable solution to their machine learning problem.
  • Staging: The state where the model can be tested automatically with production-type traffic.
  • Production: When the model is ready to handle real-life production traffic.
  • Archive: When the model no longer serves the business...

Summary

In this chapter, we first introduced the Models module in MLflow and its support for different algorithms, from tree-based to linear to neural. We also covered the support for logging models and their metrics and for creating custom metrics.

In the last two sections, we introduced the Model Registry module and showed how to use it to implement a model life cycle to manage our models.

In the following chapters and the next section of the book, we will focus on applying the concepts learned so far to real-life systems, and we will architect a machine learning system for production environments.
