You're reading from Machine Learning Engineering with MLflow
Book · Published Aug 2021 · Packt · ISBN-13: 9781800560796 · 1st Edition
Author: Natu Lauchande

Natu Lauchande is a principal data engineer in the fintech space currently tackling problems at the intersection of machine learning, data engineering, and distributed systems. He has worked in diverse industries, including biomedical/pharma research, cloud, fintech, and e-commerce/mobile. Along the way, he had the opportunity to be granted a patent (as co-inventor) in distributed systems, publish in a top academic journal, and contribute to open source software. He has also been very active as a speaker at machine learning/tech conferences and meetups.

Chapter 4: Experiment Management in MLflow

In this chapter, we will gain practical experience with stock prediction by creating different models and comparing the metrics of different runs in MLflow. You will be guided through using MLflow's experiments feature so that different machine learning practitioners can share metrics and improve on the same model.

Specifically, we will look at the following topics in this chapter:

  • Getting started with the experiments module
  • Defining the experiment
  • Adding experiments
  • Comparing different models
  • Tuning your model with hyperparameter optimization

At this stage, we have a baseline pipeline that acts on a naïve heuristic. In this chapter, we will add to our skill set the ability to experiment with multiple models and to tune one specific model using MLflow.

We will be delving into our Psystock company use case of a stock trading machine learning platform introduced in...

Technical requirements

For this chapter, you will need the following prerequisites:

Getting started with the experiments module

To get started with the technical modules, you will need the environment prepared for this chapter, available in the following folder: https://github.com/PacktPublishing/Machine-Learning-Engineering-with-MLflow/tree/master/Chapter04

At this stage, you should be able to execute the make command to build your workbench with the dependencies needed to follow along with this chapter. Next, type the following command to move to the right directory:

$ cd Chapter04/gradflow/

To start the environment, you need to run the following command:

$ make

The entry point to start managing experimentation in MLflow is the experiments interface illustrated in Figure 4.1:

Figure 4.1 – The Experiments interface in MLflow

On the left pane (1), you can manage and create experiments, and on the right (2), you can query details of a specific...

Defining the experiment

Using the machine learning problem framing methodology, we will now define the main components of our stock price prediction problem for this chapter:

Table 4.1 – Machine learning problem framing recap

The F-score metric in machine learning is a measure of a binary classifier's accuracy, computed as the harmonic mean of precision and recall; it provides a balanced trade-off between the two kinds of misclassification (false positives and false negatives). Further details can be found on the Wikipedia page: https://en.wikipedia.org/wiki/F-score.
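As a quick illustration (not from the chapter's notebooks), the F1 variant of the F-score can be computed by hand from the confusion counts:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# A classifier with 8 true positives, 2 false positives, and 4 false negatives:
# precision = 0.8, recall ~= 0.667, so F1 ~= 0.727
print(f1_score(8, 2, 4))
```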

Exploring the dataset

As specified in our machine learning problem framing, we will use as input data the market observations for the period January-December 2020, as provided by the Yahoo data API.

The following code excerpt, which uses the pandas_datareader module available in our workbench, allows us to easily retrieve the data that we want. The complete working notebook is available at https://github.com...
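As a sketch of the idea (the actual retrieval and feature engineering live in the chapter's notebook), a binary classification target (did the price rise the next day?) can be derived from a list of closing prices:

```python
# Toy daily closing prices; the real data would come from the Yahoo data API
# via pandas_datareader, as described above.
closes = [120.0, 121.5, 121.0, 123.2, 122.8, 124.0]

# Label day t as 1 if the close rises on day t+1, else 0.
labels = [1 if nxt > cur else 0 for cur, nxt in zip(closes, closes[1:])]
print(labels)  # -> [1, 0, 1, 0, 1]
```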

Adding experiments

In this section, we will use the experiments module in MLflow to track the runs of different models and post them to our workbench database so that the performance results can be compared side by side.

The experiments can be carried out by different model developers, as long as they all point to a shared MLflow infrastructure.

To create our first experiment, we will pick a set of model families and evaluate our problem with each of them. Broadly speaking, the major families for classification are tree-based models, linear models, and neural networks. By comparing the metric across these cases, we can then direct our tuning effort to the best-performing model and use it as our initial model in production.

Our choice for this section includes the following:

  • Logistic Classifier: Part of the family of linear models and a commonly used baseline.
  • Xgboost: This belongs to the family of tree boosting algorithms where many weak...
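To make the comparison concrete, the following sketch trains one model from each family on synthetic data and computes an F1 score per run. It is an assumption-laden stand-in for the chapter's notebooks: it uses scikit-learn's GradientBoostingClassifier in place of XGBoost, synthetic data in place of the stock features, and indicates in a comment where the workbench would log the metric to MLflow:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the stock dataset.
X, y = make_classification(n_samples=400, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),  # linear family
    "gradient_boosting": GradientBoostingClassifier(),         # tree-boosting family
    "neural_network": MLPClassifier(max_iter=500),             # neural network family
}

scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = f1_score(y_te, model.predict(X_te))
    # In the workbench, this is where the run would be recorded, e.g.:
    # mlflow.log_metric("f1_experiment_score", scores[name])
    print(f"{name}: f1={scores[name]:.3f}")
```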

Comparing different models

We have run the experiments in this section for each of the models covered and verified all the different artifacts. Just by looking at our baseline experiment table, and by selecting our common custom metric, f1_experiment_score, we can see that the best-performing model is the logistic regression-based model, with an F-score of 0.66:

Figure 4.10 – Comparing different model performance in terms of the goal metric

Metrics can also be compared side by side, as shown in the excerpt in Figure 4.11. On the left side, we have the Sklearn model and, on the right, the XGBoost model, each with the custom metric f1_experiment_score. The metrics natively logged by the two models differ, which is exactly why we need a custom metric when comparing different models:

Figure 4.11 – Metrics of the Sklearn model

After comparing the metrics, it becomes clear that the best model is logistic regression. To improve...

Tuning your model with hyperparameter optimization

Machine learning models have many parameters that allow the developer to improve performance and control the model being used, providing leverage to fit the data and production use cases better. Hyperparameter optimization is the systematic, automated process of identifying the optimal parameters for your machine learning model, and it is critical to the successful deployment of such a system.

In the previous section, we identified the best model family (namely, LogisticRegression) for our problem, so now it's time to identify the right parameters for our model with MLflow. You can follow along in the following notebook in the project repository: Chapter04/gradflow/notebooks/hyperopt_optimization_logistic_regression_mlflow.ipynb:

  1. Importing dependencies: We will use the hyperopt library, which contains multiple algorithms to help us carry out model tuning:
    from hyperopt import tpe
    from hyperopt import...

Summary

In this chapter, we introduced the experiments component of MLflow. We learned how to log metrics and artifacts in MLflow and detailed the steps for tracking experiments.

In the final sections, we explored the use case of hyperparameter optimization using the concepts learned in the chapter.

In the next chapter, we will focus on managing models with MLflow using the models developed in this chapter.

Further reading

To consolidate your knowledge further, you can consult the documentation available at the following links:
