Chapter 3: Tracking Models, Parameters, and Metrics

Given that MLflow can support multiple scenarios throughout the life cycle of DL models, it is common to adopt MLflow's capabilities incrementally. Usually, people start with MLflow tracking since it is easy to use and covers many scenarios for reproducibility, provenance tracking, and auditing purposes. In addition, tracking a model's history from cradle to sunset extends beyond the data science experiment management domain: it is also important for model governance in the enterprise, where the business and regulatory risks of using models in production need to be managed. While the precise business value of tracking models in production is still evolving, the need to track a model's entire life cycle is unquestionable and growing. To be able to do this, we will begin this chapter by setting up a full-fledged local MLflow tracking server.

We will then take a deep dive into how we can track a model...

Technical requirements

The following are the requirements you will need in order to follow the instructions in this chapter:

Setting up a full-fledged local MLflow tracking server

In Chapter 2, Getting Started with MLflow for Deep Learning, we gained hands-on experience working with a local filesystem-based MLflow tracking server and inspecting the components of an MLflow experiment. However, the default local filesystem-based MLflow server has a limitation: the model registry functionality is not available. The benefit of having a model registry is that we can register models, version-control them, and prepare them for deployment into production. Thus, the model registry bridges the gap between offline experimentation and online production deployment. To track the complete life cycle of a model, we therefore need a full-fledged MLflow tracking server with the following stores (a configuration sketch follows the list):

  • Backend store: A relational database backend is needed to support MLflow's storage of metadata (metrics, parameters, and many others) about the experiment. This also allows the query...
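As a rough sketch of how such a server can be launched and used (the MySQL credentials, database name, MinIO endpoint, and experiment name below are illustrative placeholders, not the book's exact values), the client side looks like this:

    import os
    import mlflow

    # Hypothetical server launch command (run in a shell); the backend
    # store is MySQL and the artifact root is a MinIO bucket exposed
    # through the S3 API:
    #   mlflow server \
    #     --backend-store-uri mysql+pymysql://user:password@localhost:3306/mlflow \
    #     --default-artifact-root s3://mlflow/ \
    #     --host 0.0.0.0 --port 5000

    # Point MLflow's S3 client at the MinIO object store instead of AWS
    # (placeholder endpoint and credentials).
    os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://localhost:9000"
    os.environ["AWS_ACCESS_KEY_ID"] = "minio"
    os.environ["AWS_SECRET_ACCESS_KEY"] = "minio123"

    # Direct all subsequent logging calls to the full-fledged server.
    mlflow.set_tracking_uri("http://localhost:5000")
    mlflow.set_experiment("dl_model_tracking")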

Tracking model provenance

Provenance tracking for digital artifacts has long been studied in the literature. For example, when a piece of patient diagnosis data is used in the biomedical industry, people usually want to know where it came from, what kind of processing and cleaning has been done to it, who owns it, and other history and lineage information about it. The rise of ML/DL models in industrial and business production scenarios makes provenance tracking a required functionality. Different granularities of provenance tracking are critical for operationalizing and managing not just offline data science experimentation, but also the phases before, during, and after a model is deployed in production. So, what needs to be tracked for provenance?

Understanding the open provenance tracking framework

Let's look at a general provenance tracking framework to understand the big picture of why provenance tracking is a major effort. The following diagram...

Tracking model metrics

The default metric for the text classification model in the PyTorch Lightning Flash package is Accuracy. If we want to change the metric to the F1 score (the harmonic mean of precision and recall), which is a very common metric for measuring a classifier's performance, we need to change the configuration of the classifier model before we start the model training process. Let's learn how to make this change and then use MLflow's non-auto-logging API to log the metrics (a logging sketch follows the steps below):

  1. When defining the classifier variable, instead of using the default metric, we will pass an instance of torchmetrics.F1 via the metrics parameter, as follows:
    classifier_model = TextClassifier(
        backbone="prajjwal1/bert-tiny",
        num_classes=datamodule.num_classes,
        metrics=torchmetrics.F1(datamodule.num_classes))

This uses torchmetrics' built-in F1 module, passing the number of classes in the data we need to classify as a parameter. This...
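To make the manual logging step concrete, here is a minimal sketch of logging an F1 value with MLflow's non-auto-logging API; the run name and the metric key returned by trainer.test are assumptions for illustration, since the exact key depends on how the model names its metrics:

    import mlflow

    with mlflow.start_run(run_name="manual_metric_logging"):
        # Fine-tune and evaluate as usual; trainer and datamodule are
        # assumed to be defined as in the earlier steps of this chapter.
        trainer.finetune(classifier_model, datamodule=datamodule, strategy="freeze")
        test_results = trainer.test(classifier_model, datamodule=datamodule)
        # trainer.test returns a list of metric dictionaries; "test_f1"
        # is a hypothetical key name here.
        f1_value = test_results[0].get("test_f1")
        if f1_value is not None:
            mlflow.log_metric("f1_score", f1_value)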

Tracking model parameters

As we have already seen, there are lots of benefits of using auto-logging in MLflow, but if we want to track additional model parameters, we can either use MLflow to log additional parameters on top of what auto-logging records, or directly use MLflow to log all the parameters we want without using auto-logging at all.

Let's walk through a notebook that does not use MLflow auto-logging. If we want full control over which parameters are logged by MLflow, we can use two APIs: mlflow.log_param and mlflow.log_params. The first logs a single key-value parameter pair, while the second logs an entire dictionary of key-value parameters. So, what kind of parameters might we be interested in tracking? The following list answers this (a short sketch follows the list):

  • Model hyperparameters: Hyperparameters are defined before the learning process begins, which means they control how the learning process learns. These parameters can be tuned and can directly affect how well...
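As a minimal sketch of these two APIs (the run name and the specific hyperparameter names and values are illustrative, not the book's exact choices):

    import mlflow

    # Illustrative hyperparameters to record for the run.
    params = {
        "learning_rate": 2e-5,
        "batch_size": 16,
        "max_epochs": 3,
    }

    with mlflow.start_run(run_name="manual_param_logging"):
        # Log a single key-value pair...
        mlflow.log_param("backbone", "prajjwal1/bert-tiny")
        # ...or an entire dictionary of parameters at once.
        mlflow.log_params(params)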

Summary

In this chapter, we set up a local MLflow development environment with full support for backend storage and artifact storage, using MySQL and the MinIO object store. This will be very useful for us when we develop MLflow-supported DL models in this book. We started by presenting the open provenance tracking framework and posing the model provenance tracking questions that are of interest. We addressed the limitations of auto-logging, registered a trained model, and loaded it back from a logged MLflow model for prediction using the mlflow.pytorch.load_model API. We also experimented with directly using MLflow's log_metrics, log_params, and log_model APIs without auto-logging, which gives us more control and flexibility over how we log additional or customized metrics and parameters. We were able to answer many of the provenance questions by performing model provenance tracking, as well as by providing a couple of the questions that require...

Further reading

To learn more about the topics that were covered in this chapter, take a look at the following resources:
