SageMaker Deployment Solutions

After training our machine learning (ML) model, we can proceed with deploying it to a web API. This API can then be invoked by other applications (for example, a mobile application) to perform a “prediction” or inference. For example, the ML model we trained in Chapter 1, Introduction to ML Engineering on AWS, can be deployed to a web API and used to predict the likelihood of customers canceling their reservations, given a set of inputs. Deploying the ML model to a web API makes it accessible to different applications and systems.

A few years ago, ML practitioners had to spend time building a custom backend API to host and deploy a model from scratch. If you were given this requirement, you might have used a Python framework such as Flask, Pyramid, or Django to deploy the ML model. Building a custom API to serve as an inference endpoint can take about a week or so since most of the application logic needs...

Technical requirements

Before we start, it is important to have the following ready:

  • A web browser (preferably Chrome or Firefox)
  • Access to the AWS account and SageMaker Studio domain used in the first chapter of the book

The Jupyter notebooks, source code, and other files used for each chapter are available in this repository: https://github.com/PacktPublishing/Machine-Learning-Engineering-on-AWS.

Important Note

It is recommended to use an IAM user with limited permissions instead of the root account when running the examples in this book. We will discuss this, along with other security best practices, in detail in Chapter 9, Security, Governance, and Compliance Strategies. If you are just getting started with AWS, you may use the root account in the meantime.

Getting started with model deployments in SageMaker

In Chapter 6, SageMaker Training and Debugging Solutions, we trained and deployed an image classification model using the SageMaker Python SDK. We made use of a built-in algorithm while working on the hands-on solutions in that chapter. When using a built-in algorithm, we just need to prepare the training dataset and specify a few configuration parameters, and we are good to go! If we want to train a custom model using our favorite ML framework (such as TensorFlow or PyTorch), we can prepare our own scripts and make them work in SageMaker using script mode. This gives us more flexibility since a custom script lets us control how SageMaker interfaces with our model and use different libraries and frameworks when training it. If we want the highest level of flexibility over the environment where the training scripts will run, then we can opt to use our own custom container image...
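
To illustrate the script mode approach, the following is a minimal sketch of how a custom training script could be handed to a SageMaker PyTorch estimator. The script name, instance type, S3 path, and framework versions here are placeholder assumptions and not the exact values used in this book:

from sagemaker import get_execution_role
from sagemaker.pytorch import PyTorch

# Minimal script mode sketch (illustrative values only)
estimator = PyTorch(
    entry_point="train.py",        # hypothetical custom training script
    source_dir="scripts",          # directory containing the script and its dependencies
    role=get_execution_role(),
    instance_count=1,
    instance_type="ml.m5.xlarge",
    framework_version="1.10",
    py_version="py38",
)

estimator.fit({"train": "s3://<INSERT BUCKET NAME>/path/to/training-data"})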

Preparing the pre-trained model artifacts

In Chapter 6, SageMaker Training and Debugging Solutions, we created a new folder named CH06 along with a new Notebook (using the Data Science image) inside it. In this section, we will create a new folder named CH07 along with a new Notebook inside it. Instead of the Data Science image, we will use the PyTorch 1.10 Python 3.8 CPU Optimized image for this Notebook since we will download the model artifacts of a pre-trained PyTorch model using the Hugging Face transformers library. Once the Notebook is ready, we will use the transformers library to download a pre-trained model that can be used for sentiment analysis. Finally, we will zip the model artifacts into a model.tar.gz file and upload it to an S3 bucket. A rough sketch of these steps appears after the following note.

Note

Make sure that you have completed the hands-on solutions in the Getting started with SageMaker and SageMaker Studio section of Chapter 1, Introduction to ML Engineering...
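
The following is a minimal sketch of the preparation steps described above. The model name (distilbert-base-uncased-finetuned-sst-2-english), local paths, and S3 key prefix are assumptions made for illustration and may differ from the exact values used in the notebook:

import tarfile
import sagemaker
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Download a pre-trained sentiment analysis model (assumed model name)
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Save the model and tokenizer artifacts into a local directory
tokenizer.save_pretrained("model")
model.save_pretrained("model")

# Package the artifacts into model.tar.gz
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("model", arcname=".")

# Upload the archive to the default S3 bucket of the SageMaker session
session = sagemaker.Session()
model_data = session.upload_data("model.tar.gz", key_prefix="chapter07/model")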

Preparing the SageMaker script mode prerequisites

In this chapter, we will be preparing a custom script to use a pre-trained model for predictions. Before we can proceed with using the SageMaker Python SDK to deploy our pre-trained model to an inference endpoint, we’ll need to ensure that all the script mode prerequisites are ready.

Figure 7.4 – The desired file and folder structure

In Figure 7.4, we can see that there are three prerequisites we’ll need to prepare:

  • inference.py
  • requirements.txt
  • setup.py

We will store these prerequisites inside the scripts directory. We’ll discuss these prerequisites in detail in the succeeding pages of this chapter. Without further ado, let’s start by preparing the inference.py script file!

Preparing the inference.py file

In this section, we will prepare a custom Python script that will be used by SageMaker when processing inference requests. Here, we can influence...
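
As a rough illustration, an inference.py script for the SageMaker PyTorch serving container typically defines handler functions such as model_fn, input_fn, predict_fn, and output_fn. The sketch below assumes the pre-trained Hugging Face sentiment analysis model prepared earlier and is not necessarily identical to the script in the book's repository:

import json
from transformers import pipeline

def model_fn(model_dir):
    # Load the extracted model artifacts into a sentiment analysis pipeline
    return pipeline("sentiment-analysis", model=model_dir, tokenizer=model_dir)

def input_fn(request_body, content_type):
    # Accept JSON payloads such as {"text": "..."}
    if content_type == "application/json":
        return json.loads(request_body)["text"]
    raise ValueError(f"Unsupported content type: {content_type}")

def predict_fn(input_data, model):
    # Run the pipeline and return the predicted label ("POSITIVE" or "NEGATIVE")
    return model(input_data)[0]["label"]

def output_fn(prediction, accept):
    # Serialize the prediction before it is returned as the response
    return json.dumps({"label": prediction})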

Deploying a pre-trained model to a real-time inference endpoint

In this section, we will use the SageMaker Python SDK to deploy a pre-trained model to a real-time inference endpoint. As the name suggests, a real-time inference endpoint can process input payloads and perform predictions in real time. If you have built an API endpoint before (one that can process GET and POST requests, for example), then you can think of an inference endpoint as an API endpoint that accepts an input request and returns a prediction as part of a response. How are predictions made? The inference endpoint simply loads the model into memory and uses it to process the input payload. This yields an output that is returned as a response. For example, if we have a pre-trained sentiment analysis ML model deployed in a real-time inference endpoint, then it would return a response of either "POSITIVE" or "NEGATIVE" depending on the input string payload provided in the request...
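
As a minimal sketch (assuming the model.tar.gz artifacts and the scripts directory prepared earlier in this chapter), deploying to a real-time inference endpoint with the SageMaker Python SDK could look like the following; the instance type and framework versions are illustrative:

from sagemaker import get_execution_role
from sagemaker.pytorch.model import PyTorchModel
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

model = PyTorchModel(
    model_data=model_data,          # S3 path of the model.tar.gz uploaded earlier
    role=get_execution_role(),
    source_dir="scripts",
    entry_point="inference.py",
    framework_version="1.10",
    py_version="py38",
)

# Provision a real-time inference endpoint backed by a dedicated instance
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

predictor.predict({"text": "I love reading the book MLE on AWS!"})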

Deploying a pre-trained model to a serverless inference endpoint

In the initial chapters of this book, we’ve worked with several serverless services that allow us to manage and reduce costs. If you are wondering whether there’s a serverless option when deploying ML models in SageMaker, then the answer to that would be a sweet yes. When you are dealing with intermittent and unpredictable traffic, using serverless inference endpoints to host your ML model can be a more cost-effective option. Let’s say that we can tolerate cold starts (where a request takes longer to process after periods of inactivity) and we only expect a few requests per day – then, we can make use of a serverless inference endpoint instead of the real-time option. Real-time inference endpoints are best used when we can maximize the utilization of the inference endpoint. If you’re expecting your endpoint to be utilized most of the time, then the real-time option may do the trick.
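
A minimal sketch of the serverless option, assuming the same PyTorchModel object from the previous section (the memory size and concurrency values are illustrative):

from sagemaker.serverless import ServerlessInferenceConfig

serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=4096,   # memory allocated to each invocation
    max_concurrency=5,        # maximum number of concurrent invocations
)

# No instance type or count is specified; SageMaker provisions compute on demand
predictor = model.deploy(serverless_inference_config=serverless_config)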

...

Deploying a pre-trained model to an asynchronous inference endpoint

In addition to real-time and serverless inference endpoints, SageMaker also offers a third option when deploying models – asynchronous inference endpoints. Why is it called asynchronous? For one thing, instead of expecting the results to be available immediately, requests are queued, and results are made available asynchronously. This works for ML requirements that involve one or more of the following:

  • Large input payloads (up to 1 GB)
  • A long prediction processing duration (up to 15 minutes)

A good use case for asynchronous inference endpoints would be for ML models that are used to detect objects in large video files (which may take more than 60 seconds to complete). In this case, an inference may take a few minutes instead of a few seconds.

How do we use asynchronous inference endpoints? To invoke an asynchronous inference endpoint, we do the following:

  1. The request payload is...
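
As an illustration, the sketch below deploys the same model behind an asynchronous inference endpoint and submits a request; the S3 output location, instance type, and payload are assumptions:

from sagemaker.async_inference import AsyncInferenceConfig
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

async_config = AsyncInferenceConfig(
    output_path="s3://<INSERT BUCKET NAME>/async-output"   # where results are written once ready
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    async_inference_config=async_config,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

# The call returns immediately; the prediction is written to the S3 output path
response = predictor.predict_async(data={"text": "I love reading the book MLE on AWS!"})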

Cleaning up

Now that we have completed working on the hands-on solutions of this chapter, it is time for us to clean up and turn off any resources we will no longer use. In the next set of steps, we will locate and turn off any remaining running instances in SageMaker Studio:

  1. Click the Running Instances and Kernels icon in the sidebar, as highlighted in Figure 7.21:

Figure 7.21 – Turning off the running instance

Clicking the Running Instances and Kernels icon should open and show the running instances, apps, and terminals in SageMaker Studio.

  2. Turn off all running instances under RUNNING INSTANCES by clicking the Shut down button for each of the instances, as highlighted in Figure 7.21. Clicking the Shut down button will open a pop-up window verifying the instance shutdown operation. Click the Shut down all button to proceed.
  3. Make sure to check for and delete all the running inference endpoints under SageMaker resources as well...
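
If the Predictor objects from earlier in this chapter are no longer available in the notebook, a remaining endpoint can also be deleted programmatically; here is a minimal sketch where the endpoint name is a placeholder:

from sagemaker.predictor import Predictor

# Replace with the name of an inference endpoint that is still running
predictor = Predictor(endpoint_name="<INSERT NAME OF EXISTING ENDPOINT>")
predictor.delete_endpoint()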

Deployment strategies and best practices

In this section, we will discuss the relevant deployment strategies and best practices when using the SageMaker hosting services. Let’s start by talking about the different ways we can invoke an existing SageMaker inference endpoint. The approach we’ve been using so far involves using the SageMaker Python SDK to invoke an existing endpoint:

from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer
endpoint_name = "<INSERT NAME OF EXISTING ENDPOINT>"
predictor = Predictor(endpoint_name=endpoint_name)
predictor.serializer = JSONSerializer() 
predictor.deserializer = JSONDeserializer()
payload = {
    "text": "I love reading the book MLE on AWS!"
}
predictor.predict(payload)

Here, we initialize a Predictor object and point it to an existing inference endpoint during the initialization step...
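
Another common way to invoke an existing endpoint (for example, from an application or AWS Lambda function that does not have the SageMaker Python SDK installed) is through the sagemaker-runtime client of boto3. A minimal sketch, assuming the same JSON payload format used above:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="<INSERT NAME OF EXISTING ENDPOINT>",
    ContentType="application/json",
    Body=json.dumps({"text": "I love reading the book MLE on AWS!"}),
)

print(response["Body"].read().decode("utf-8"))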

Summary

In this chapter, we focused on several deployment options and solutions using SageMaker. We deployed a pre-trained model to three different types of inference endpoints – (1) a real-time inference endpoint, (2) a serverless inference endpoint, and (3) an asynchronous inference endpoint. We also discussed the differences between these approaches, along with when each option is best used when deploying ML models. Toward the end of this chapter, we talked about some of the deployment strategies, along with the best practices when using SageMaker for model deployments.

In the next chapter, we will dive deeper into SageMaker Model Registry and SageMaker Model Monitor, which are capabilities of SageMaker that can help us manage and monitor our models in production.

Further reading

For more information on the topics covered in this chapter, feel free to check out the following resources:
