Chapter 10: Scaling Up Your Machine Learning Workflow

In this chapter, you will learn about diverse techniques and patterns for scaling your machine learning (ML) workflow along different scalability dimensions. We will look at using a managed Databricks environment to scale your MLflow development capabilities, adding Apache Spark for cases where you have larger datasets. We will explore NVIDIA RAPIDS and graphics processing unit (GPU) support, and the Ray distributed framework, to accelerate your ML workloads. Each topic in this chapter takes the form of a small proof of concept on a defined canonical dataset, demonstrating a technique and its toolchain.

Specifically, we will look at the following sections in this chapter: 

  • Developing models with a Databricks Community Edition environment
  • Integrating MLflow with Apache Spark
  • Integrating MLflow with NVIDIA RAPIDS (GPU)
  • Integrating MLflow with the Ray platform

This chapter will require researching the appropriate...

Technical requirements

For this chapter, you will need the following prerequisites: 

Developing models with a Databricks Community Edition environment

In many small teams and companies, setting up a centralized ML environment can be a costly, resource-intensive, upfront investment. Being able to scale quickly and get a team up to speed is critical to unlocking the value of ML in an organization. Managed services are very relevant in these cases: they let you start prototyping systems and assessing the viability of ML at a lower cost.

A very popular managed ML and data platform is Databricks, developed by the same company that created MLflow. In this section, we will use Databricks Community Edition, a version and license targeted at students and personal use.

In order to explore the Databricks platform to develop and share models, you need to execute the following steps:

  1. Sign up for Databricks Community Edition at https://community.cloud.databricks.com/ and create an account.
  2. Log in to your...
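
The end state of these steps is a notebook that trains a model and logs it to the workspace's built-in MLflow tracking server. The following is a minimal sketch of such a notebook cell; the dataset and estimator are illustrative stand-ins, not the chapter's actual bitpred model:

    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Illustrative stand-in dataset; any tabular classification data works here
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # On Databricks, runs are logged to the workspace tracking server with no
    # extra tracking URI configuration
    with mlflow.start_run():
        model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        mlflow.log_metric("test_accuracy", model.score(X_test, y_test))
        mlflow.sklearn.log_model(model, "model")

The run, its metric, and the logged model then show up in the notebook's Experiment sidebar, from where the model can be registered and shared.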

Integrating MLflow with Apache Spark

Apache Spark is a very scalable and popular big data framework that allows data processing at a large scale. For more details and documentation, please go to https://spark.apache.org/. As a big data tool, it can speed up parts of your ML workflow, since it can be applied at either the training or the inference level.

In this particular case, we will illustrate how to use the model developed in the previous section on the Databricks environment, scaling the batch-inference job to larger amounts of data.

In order to explore the Spark integration with MLflow, we will execute the following steps:

  1. Create a new Python notebook named inference_job_spark, attached to the running cluster where the bitpred_poc.ipynb notebook was just created.
  2. Upload your data to DBFS (the Databricks File System) via the File/Upload data link in the environment.
  3. Execute the following script in a cell of the notebook, changing the logged_model and df filenames for the ones...
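
The script follows a common MLflow-on-Spark pattern: load the logged model as a Spark user-defined function (UDF) with mlflow.pyfunc.spark_udf, then apply it across the DataFrame's columns so that scoring is distributed over the cluster. A minimal sketch, with a placeholder model URI and input path that you would replace with your own, looks like this:

    import mlflow.pyfunc

    # Placeholders: substitute the run ID and input path from your own workspace
    logged_model = "runs:/<run_id>/model"
    # `spark` is the SparkSession predefined in Databricks notebooks
    df = spark.read.csv("/FileStore/tables/input_data.csv",
                        header=True, inferSchema=True)

    # Wrap the logged MLflow model as a Spark UDF so inference runs on the cluster
    predict_udf = mlflow.pyfunc.spark_udf(spark, model_uri=logged_model)

    # Apply the UDF to the feature columns and persist the scored output
    scored = df.withColumn("prediction", predict_udf(*df.columns))
    scored.write.mode("overwrite").parquet("/FileStore/tables/predictions")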

Integrating MLflow with NVIDIA RAPIDS (GPU)

Training and tuning ML models are long, computationally expensive operations, and among those that benefit most from parallel processing. In this section, we will explore integrating your MLflow training jobs, including hyperparameter optimization, with the NVIDIA RAPIDS framework.

To integrate the NVIDIA RAPIDS library, follow these steps:

  1. Install RAPIDS in the most convenient way for your environment, outlined as follows:

    a. https://rapids.ai/start.html contains detailed information on deployment options.

    b. https://developer.nvidia.com/blog/run-rapids-on-google-colab/ details how to run RAPIDS on Google Colaboratory (Google Colab).

  2. Install MLflow in your environment.
  3. Import the needed libraries, as follows:
    import argparse
    from functools import partial
    import mlflow
    import mlflow.sklearn
    from cuml.metrics.accuracy import accuracy_score
    from cuml.preprocessing.model_selection import train_test_split...
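
These imports mirror their scikit-learn equivalents, which is the main appeal of RAPIDS: the cuML API is largely a drop-in replacement that executes on the GPU. A minimal sketch of how the rest of such a training script might continue, with a hypothetical dataset path, target column, and hyperparameters, is shown here:

    import cudf
    from cuml.ensemble import RandomForestClassifier  # GPU-accelerated estimator

    # Hypothetical input: any CSV with numeric features and a binary "target" column
    df = cudf.read_csv("train.csv")
    X = df.drop(columns=["target"]).astype("float32")  # cuML generally expects float32
    y = df["target"].astype("int32")
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    with mlflow.start_run():
        params = {"n_estimators": 100, "max_depth": 8}
        mlflow.log_params(params)
        model = RandomForestClassifier(**params).fit(X_train, y_train)
        accuracy = accuracy_score(y_test, model.predict(X_test))
        mlflow.log_metric("accuracy", float(accuracy))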

Integrating MLflow with the Ray platform

The Ray framework (https://docs.ray.io/en/master/) is a distributed computing platform that allows you to quickly scale your deployment infrastructure.

With Ray, you can add arbitrary logic alongside your model serving and have it scale in the same way; its Serve library exposes model deployments over HTTP, behaving much like a web framework.

We preloaded the model and contents that will be used into the following folder of the repository: https://github.com/PacktPublishing/Machine-Learning-Engineering-with-MLflow/tree/master/Chapter10/mlflow-ray-serve-integration.

In order to serve your model on Ray, execute the following steps:

  1. Install the Ray package by running the following command:
    pip install -U ray
  2. Install MLflow in your environment.
  3. Import the needed libraries, as follows:
    import ray
    from ray import serve
    import mlflow.pyfunc
  4. Implement the model backend, which basically means wrapping up the model-serving function into your Ray serving environment...
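
The Serve API has changed considerably across Ray releases, so the following is a sketch against the Ray 1.x deployment API rather than the repository's exact code; the model URI is a hypothetical placeholder:

    import pandas as pd

    @serve.deployment(route_prefix="/predict")
    class MLflowModelBackend:
        def __init__(self, model_uri):
            # Each replica loads the logged model once at startup
            self.model = mlflow.pyfunc.load_model(model_uri)

        async def __call__(self, request):
            payload = await request.json()  # expects a JSON list of feature records
            predictions = self.model.predict(pd.DataFrame(payload))
            return {"predictions": [float(p) for p in predictions]}  # assumes numeric output

    ray.init()
    serve.start()
    # Hypothetical model URI; point this at the model logged earlier in the chapter
    MLflowModelBackend.deploy("models:/bitpred_model/1")

Once deployed, the backend answers HTTP requests on the /predict route, and scaling out becomes a matter of passing num_replicas to the serve.deployment decorator.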

Summary

In this chapter, we focused on scaling your ability to develop, run, and distribute models using a Databricks environment. We also looked at integrating Apache Spark into our batch-inference workflows to handle scenarios where we have access to large datasets.

We concluded the chapter with two approaches to scaling hyperparameter optimization and application programming interface (API) serving: the NVIDIA RAPIDS framework and the Ray distributed framework, respectively.

In the next chapter and in further sections of the book, we will focus on the observability and performance monitoring of ML models.
