You're reading from Machine Learning Engineering with MLflow
Book · Published Aug 2021 · Packt · ISBN-13: 9781800560796 · 1st Edition
Author: Natu Lauchande

Natu Lauchande is a principal data engineer in the fintech space currently tackling problems at the intersection of machine learning, data engineering, and distributed systems. He has worked in diverse industries, including biomedical/pharma research, cloud, fintech, and e-commerce/mobile. Along the way, he had the opportunity to be granted a patent (as co-inventor) in distributed systems, publish in a top academic journal, and contribute to open source software. He has also been very active as a speaker at machine learning/tech conferences and meetups.

Chapter 4: Experiment Management in MLflow

In this chapter, we will gain practical experience with stock prediction by creating different models and comparing the metrics of different runs in MLflow. You will be guided through using MLflow's experiments feature so that different machine learning practitioners can share metrics and improve on the same model.

Specifically, we will look at the following topics in this chapter:

  • Getting started with the experiments module
  • Defining the experiment
  • Adding experiments
  • Comparing different models
  • Tuning your model with hyperparameter optimization

At this stage, we have a baseline pipeline that acts on a naïve heuristic. In this chapter, we will add to our skill set the ability to experiment with multiple models and to tune one specific model using MLflow.

We will be delving into our Psystock company use case of a stock trading machine learning platform introduced in...

Technical requirements

For this chapter, you will need the following prerequisites:

Getting started with the experiments module

To get started with the technical modules, you will need the environment prepared for this chapter, available in the following folder: https://github.com/PacktPublishing/Machine-Learning-Engineering-with-MLflow/tree/master/Chapter04

At this stage, you should be able to execute the make command to build your workbench with the dependencies needed to follow along with this chapter. Next, type the following command to move to the right directory:

$ cd Chapter04/gradflow/

To start the environment, you need to run the following command:

$ make

The entry point to start managing experimentation in MLflow is the experiments interface illustrated in Figure 4.1:

Figure 4.1 – The Experiments interface in MLflow

On the left pane (1), you can manage and create experiments, and on the right (2), you can query details of a specific...

Defining the experiment

Using the machine learning problem framing methodology, we will now define the main components of our stock price prediction problem for this chapter:

Table 4.1 – Machine learning problem framing recap

The F-score metric in machine learning is a measure of a binary classifier's accuracy, computed as the harmonic mean of precision and recall; it provides a balanced trade-off between the two kinds of misclassification (false positives and false negatives). Further details can be found on the Wikipedia page: https://en.wikipedia.org/wiki/F-score.
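As a quick illustration (not from the chapter's notebooks), the F1 variant of the F-score can be computed by hand from the confusion counts:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# A classifier with 8 true positives, 2 false positives, and 4 false negatives:
# precision = 0.8, recall ~= 0.667, so F1 ~= 0.727
print(f1_score(8, 2, 4))
```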

Exploring the dataset

As specified in our machine learning problem framing, we will use as input data the market observations for the period January-December 2020, as provided by the Yahoo data API.

The following code excerpt, which uses the pandas_datareader module available in our workbench, allows us to easily retrieve the data that we want. The complete working notebook is available at https://github.com...
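As a sketch of the idea (the actual retrieval and feature engineering live in the chapter's notebook), a binary classification target (did the price rise the next day?) can be derived from a list of closing prices:

```python
# Toy daily closing prices; the real data would come from the Yahoo data API
# via pandas_datareader, as described above.
closes = [120.0, 121.5, 121.0, 123.2, 122.8, 124.0]

# Label day t as 1 if the close rises on day t+1, else 0.
labels = [1 if nxt > cur else 0 for cur, nxt in zip(closes, closes[1:])]
print(labels)  # -> [1, 0, 1, 0, 1]
```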

Adding experiments

In this section, we will use the experiments module in MLflow to track the runs of different models and post them to our workbench database so that the performance results can be compared side by side.

The experiments can be carried out by different model developers, as long as they all point to a shared MLflow infrastructure.

To create our first experiment, we will pick a set of model families and evaluate our problem with each of them. Broadly speaking, the major families for classification are tree-based models, linear models, and neural networks. By comparing the metric across these cases, we can then direct our tuning effort to the best-performing model and use it as our initial model in production.

Our choice for this section includes the following:

  • Logistic Classifier: Part of the family of linear models and a commonly used baseline.
  • Xgboost: This belongs to the family of tree boosting algorithms where many weak...
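To make the comparison concrete, the following sketch trains one model from each family on synthetic data and computes an F1 score per run. It is an assumption-laden stand-in for the chapter's notebooks: it uses scikit-learn's GradientBoostingClassifier in place of XGBoost, synthetic data in place of the stock features, and indicates in a comment where the workbench would log the metric to MLflow:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the stock dataset.
X, y = make_classification(n_samples=400, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),  # linear family
    "gradient_boosting": GradientBoostingClassifier(),         # tree-boosting family
    "neural_network": MLPClassifier(max_iter=500),             # neural network family
}

scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = f1_score(y_te, model.predict(X_te))
    # In the workbench, this is where the run would be recorded, e.g.:
    # mlflow.log_metric("f1_experiment_score", scores[name])
    print(f"{name}: f1={scores[name]:.3f}")
```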

Comparing different models

We have run the experiments in this section for each of the models covered and verified all the different artifacts. Just by looking at our baseline experiment table, and by selecting our common custom metric, f1_experiment_score, we can see that the best-performing model is the logistic regression-based model, with an F-score of 0.66:

Figure 4.10 – Comparing different model performance in terms of the goal metric

Metrics can also be compared side by side, as shown in the excerpt in Figure 4.11. On the left side, we have the Sklearn model and, on the right, the XGBoost model, each with the custom metric f1_experiment_score. The metrics natively logged by the two models differ, which is exactly why we need a custom metric when comparing different models:

Figure 4.11 – Metrics of the Sklearn model

After comparing the metrics, it becomes clear that the best model is logistic regression. To improve...

Tuning your model with hyperparameter optimization

Machine learning models have many parameters that allow the developer to improve performance and control the model being used, providing leverage to fit the data and production use cases better. Hyperparameter optimization is the systematic, automated process of identifying the optimal parameters for your machine learning model, and it is critical to the successful deployment of such a system.

In the previous section, we identified the best model family (namely, LogisticRegression) for our problem, so now it's time to identify the right parameters for our model with MLflow. You can follow along in the following notebook in the project repository: Chapter04/gradflow/notebooks/hyperopt_optimization_logistic_regression_mlflow.ipynb:

  1. Importing dependencies: We will use the hyperopt library, which contains multiple algorithms to help us carry out model tuning:
    from hyperopt import tpe
    from hyperopt import...

Summary

In this chapter, we introduced the experiments component of MLflow. We learned how to log metrics and artifacts in MLflow and detailed the steps for tracking experiments.

In the final sections, we explored the use case of hyperparameter optimization using the concepts learned in the chapter.

In the next chapter, we will focus on managing models with MLflow using the models developed in this chapter.

Further reading

To consolidate your knowledge further, you can consult the documentation available at the following links:
