Chapter 8: Deploying a DL Inference Pipeline at Scale

Deploying a deep learning (DL) inference pipeline for production usage is both exciting and challenging. The exciting part is that, finally, the DL model pipeline can be used for prediction on real-world production data, providing real value to business scenarios. The challenging part is that there are many different DL model serving platforms and host environments, and it is not easy to choose the right framework for the right model serving scenario, one that minimizes deployment complexity while providing the best model serving experience in a scalable and cost-effective way. This chapter starts with an overview of different deployment scenarios and host environments, and then provides hands-on guidance on how to deploy to different environments, including local and remote cloud environments, using MLflow deployment tools. By the end of this chapter, you should be able to confidently deploy an MLflow DL...

Technical requirements

The following items are required for this chapter's learning:

Understanding different deployment tools and host environments

The MLOps technology stack includes a variety of deployment tools, each targeting different use cases and host environments for deploying model inference pipelines. In Chapter 7, Multi-Step Deep Learning Inference Pipeline, we learned about the different inference scenarios and requirements and implemented a multi-step DL inference pipeline that can be deployed into a model hosting/serving environment. Now, we will learn how to deploy such a model to a few specific model hosting and serving environments. This is visualized in Figure 8.1 as follows:

Figure 8.1 – Using model deployment tools to deploy a model inference pipeline to a model hosting and serving environment

As can be seen from Figure 8.1, different deployment tools target different model hosting and serving environments. Here, we list three typical scenarios:

  • Batch inference at scale: If we...

Deploying locally for batch and web service inference

For development and testing purposes, we usually need to deploy our model locally to verify it works as expected. Let's see how to do it for two scenarios: batch inference and web service inference.

Batch inference

For batch inference, follow these instructions:

  1. Make sure you have completed Chapter 7, Multi-Step Deep Learning Inference Pipeline. This produces a logged MLflow pyfunc DL inference pipeline model that can be loaded using standard MLflow Python functions. The logged model can be uniquely identified by its run_id and model name as follows:
    logged_model = 'runs:/37b5b4dd7bc04213a35db646520ec404/inference_pipeline_model'

The model can also be identified by the model name and version number using the model registry as follows:

logged_model = 'models:/inference_pipeline_model/6'
  2. Follow the instructions under the Batch inference at-scale using PySpark UDF function section... (a minimal sketch of this pattern follows below).
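As referenced in step 2, the core pattern is to wrap the logged pipeline as a PySpark UDF and apply it to a column of a Spark DataFrame. The following is a minimal sketch, assuming a hypothetical input CSV file and a text column named text (both placeholders, not values from this chapter):

# A minimal sketch of the PySpark UDF batch-inference pattern referenced above;
# the model URI is the registry URI shown earlier, while the input file path and
# the 'text' column name are hypothetical placeholders.
import mlflow.pyfunc
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

logged_model = 'models:/inference_pipeline_model/6'
# Wrap the logged MLflow pyfunc model as a Spark UDF
loaded_model = mlflow.pyfunc.spark_udf(spark, model_uri=logged_model, result_type='string')

# Score a batch of records; 'data/test.csv' and the 'text' column are assumptions
df = spark.read.option('header', 'true').csv('data/test.csv')
scored_df = df.withColumn('predictions', loaded_model('text'))
scored_df.show()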

Deploying using Ray Serve and MLflow deployment plugins

A more generic way to do deployment is to use a framework such as Ray Serve (https://docs.ray.io/en/latest/serve/index.html). Ray Serve has several advantages: it is agnostic to DL model frameworks, offers native Python support, and supports complex model composition inference patterns. Ray Serve works with all major DL frameworks and arbitrary business logic. So, can we leverage both Ray Serve and MLflow for model deployment and serving? The good news is that we can, using the MLflow deployment plugin provided by Ray Serve. Let's walk through how to use the mlflow-ray-serve plugin (https://github.com/ray-project/mlflow-ray-serve) to do MLflow model deployment with Ray Serve. Before we begin, we need to install the mlflow-ray-serve package:

pip install mlflow-ray-serve

Then, we need to start a single-node Ray cluster locally using the following two commands:

ray start --head
serve start

This will...
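Once the local Ray cluster and the Serve instance are running, the plugin exposes Ray Serve as an MLflow deployment target. The following is a hedged sketch using MLflow's deployments client; the ray-serve target name and the num_replicas option follow the plugin's documentation, and the deployment name, replica count, and input schema are assumptions rather than values from this chapter:

# A minimal sketch of creating a Ray Serve deployment through the MLflow
# deployments client; the 'ray-serve' target comes from the mlflow-ray-serve
# plugin, and the deployment name, replica count, and input column are assumptions.
from mlflow.deployments import get_deploy_client
import pandas as pd

client = get_deploy_client('ray-serve')
client.create_deployment(
    name='dl-inference-pipeline',                      # hypothetical deployment name
    model_uri='models:/inference_pipeline_model/6',
    config={'num_replicas': 1},
)

# Query the deployment with a pandas DataFrame (the 'text' input column is an assumption)
print(client.predict('dl-inference-pipeline', pd.DataFrame({'text': ['what a great movie']})))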

Deploying to AWS SageMaker – a complete end-to-end guide

AWS SageMaker is a cloud-hosted model serving platform managed by AWS. We will use it as an example to show how to deploy to a remote cloud provider for hosted web services that can serve real production traffic. SageMaker offers a suite of ML/DL-related services, including support for data annotation, model training, and much more. Here, we show how to bring your own model (BYOM) for deployment; that is, you have a model inference pipeline trained outside of AWS SageMaker and now just need to deploy it to SageMaker for hosting. Follow the next steps to prepare and deploy a DL sentiment model. A few prerequisites are required:

  • You must have Docker Desktop running in your local environment.
  • You must have an AWS account. You can create a free AWS account easily through the free signup website at https://aws.amazon.com/free/.

Once you have met these requirements, activate the dl-model-chapter08 conda...
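Once the prerequisites are in place, the deployment itself can be driven from Python. The following is a minimal sketch, assuming MLflow 1.x's SageMaker API and an MLflow pyfunc container image already pushed to Amazon ECR; the endpoint name, IAM role ARN, image URL, and region below are placeholders, not values from this chapter:

# A hedged sketch of deploying the inference pipeline to a SageMaker endpoint with
# mlflow.sagemaker (MLflow 1.x style API); every AWS-specific value below is a placeholder.
import mlflow.sagemaker

mlflow.sagemaker.deploy(
    app_name='dl-sentiment-inference',                                   # hypothetical endpoint name
    model_uri='models:/inference_pipeline_model/6',
    execution_role_arn='arn:aws:iam::123456789012:role/SageMakerRole',   # placeholder IAM role
    image_url='123456789012.dkr.ecr.us-west-2.amazonaws.com/mlflow-pyfunc:latest',  # placeholder ECR image
    region_name='us-west-2',                                             # placeholder region
    mode='create',
)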

Summary

In this chapter, we learned different ways to deploy an MLflow inference pipeline model for both batch inference and online real-time inference. We started with a brief survey of different model serving scenarios (batch, streaming, and on-device) and looked at three different categories of tools for MLflow model deployment (the MLflow built-in deployment tool, MLflow deployment plugins, and generic model inference serving frameworks that can work with MLflow inference models). Then, we covered several local deployment scenarios, using a PySpark UDF for batch inference and MLflow's local deployment for web service inference. Afterward, we learned how to use Ray Serve in conjunction with the mlflow-ray-serve plugin to deploy an MLflow Python inference pipeline model into a local Ray cluster. This opens the door to deploying to any cloud platform, such as AWS, Azure ML, or GCP, as long as we can set up a Ray cluster in the cloud. Finally, we provided a complete end-to-end...

Further reading
