
You're reading from Practical Deep Learning at Scale with MLflow

Product type: Book
Published in: Jul 2022
Publisher: Packt
ISBN-13: 9781803241333
Edition: 1st Edition
Author: Yong Liu

Yong Liu has been working in big data science, machine learning, and optimization since his doctoral student years at the University of Illinois at Urbana-Champaign (UIUC) and later as a senior research scientist and principal investigator at the National Center for Supercomputing Applications (NCSA), where he led data science R&D projects funded by the National Science Foundation and Microsoft Research. He then joined Microsoft and AI/ML start-ups in the industry. He has shipped ML and DL models to production and has been a speaker at the Spark/Data+AI summit and NLP summit. He has recently published peer-reviewed papers on deep learning, linked data, and knowledge-infused learning at various ACM/IEEE conferences and journals.

Chapter 2: Getting Started with MLflow for Deep Learning

One of MLflow's key capabilities is Machine Learning (ML) experiment management. This is critical because data science requires reproducibility and traceability, so that a Deep Learning (DL) model can be reproduced with the same data, code, and execution environment. This chapter will help us get started with implementing DL experiment management quickly. We will learn about MLflow's experiment management concepts and capabilities, set up an MLflow development environment, and complete our first DL experiment using MLflow. By the end of this chapter, we will have a working MLflow tracking server showing our first DL experiment results.

In this chapter, we're going to cover the following main topics:

  • Setting up MLflow
  • Implementing our first MLflow logging-enabled DL experiment
  • Exploring MLflow's components and usage patterns

Technical requirements

To complete the experiment in this chapter, we will need the following tools, libraries, and GitHub repositories installed or checked out on our computer:

Setting up MLflow

MLflow is an open source tool written primarily in Python, with over 10,000 stars on its GitHub repository (https://github.com/mlflow/mlflow). The benefits of using MLflow are numerous, but we can illustrate one with the following scenario: suppose you are starting a new ML project and evaluating different algorithms and model parameters. Within a few days, you run hundreds of experiments with many code changes across different ML/DL libraries, producing models with different parameters and accuracies. You need to compare which model works best and also allow your team members to reproduce the results for model review. Do you prepare a spreadsheet and write down the model names, parameters, accuracies, and model locations? How would someone else rerun your code, or use your trained model with a different evaluation dataset? This can quickly become unmanageable when you have lots of iterations...
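As a minimal sketch of a local setup (the `mlflow` PyPI package name and the default port 5000 are MLflow's standard ones; adjust versions and ports to your environment):

```shell
# Install MLflow into the current Python environment.
pip install mlflow

# Verify the installation.
mlflow --version

# To browse experiments in a local tracking UI, you would then run:
#     mlflow ui --port 5000
# and open http://localhost:5000; runs are stored under ./mlruns by default.
```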

Implementing our first DL experiment with MLflow autologging

Let's use the DL sentiment classifier we built in Chapter 1, Deep Learning Life Cycle and MLOps Challenges, and add MLflow autologging to it to explore MLflow's tracking capabilities:

  1. First, we need to import the MLflow module:
    import mlflow

This will provide MLflow Application Programming Interfaces (APIs) for logging and loading models.

  2. Just before we run the training code, we need to set the active experiment for the current running code using mlflow.set_experiment:
    EXPERIMENT_NAME = "dl_model_chapter02"
    mlflow.set_experiment(EXPERIMENT_NAME)
    experiment = mlflow.get_experiment_by_name(EXPERIMENT_NAME)
    print("experiment_id:", experiment.experiment_id)

This sets an experiment named dl_model_chapter02 to be the current active experiment. If this experiment does not exist in your current tracking server, it will be created automatically.

Environment Variable

...

Exploring MLflow's components and usage patterns

Let's use the working example implemented in the previous section to explore MLflow's central concepts, components, and usage patterns. These include experiments, runs, metadata about experiments, artifacts for experiments, models, and code.

Exploring experiments and runs in MLflow

An experiment is a first-class entity in the MLflow APIs. This makes sense, as data scientists and ML engineers need to run many experiments in order to build a working model that meets the requirements. However, the idea of an experiment goes beyond the model development stage and extends to the entire life cycle of ML/DL development and deployment. This means that when we retrain a model, or train a production version of it, we need to treat those runs as production-quality experiments. This unified view of experiments builds a bridge between the offline and online production environments. Each experiment consists...

Summary

In this chapter, we learned how to set up MLflow to work with either a local or a remote MLflow tracking server. We then implemented our first DL model with MLflow autologging enabled. This allowed us to explore MLflow in a hands-on way and understand a few central concepts and foundational components, such as experiments, runs, metadata about experiments and runs, code tracking, model logging, and model flavors. The knowledge and first-round experience gained in this chapter will help us learn MLflow's tracking APIs in more depth in the next chapter.

Further reading

To further your knowledge, you can consult the following resources and documentation:

