You're reading from  Machine Learning Engineering on AWS

Product typeBook
Published inOct 2022
PublisherPackt
ISBN-139781803247595
Edition1st Edition
Author (1)
Joshua Arvin Lat

Joshua Arvin Lat is the Chief Technology Officer (CTO) of NuWorks Interactive Labs, Inc. He previously served as the CTO of three Australian-owned companies and as director of software development and engineering for multiple e-commerce start-ups. Years ago, he and his team won first place in a global cybersecurity competition with their published research paper. He is also an AWS Machine Learning Hero and has shared his knowledge at several international conferences, discussing practical strategies on machine learning, engineering, security, and management.

Machine Learning Pipelines with SageMaker Pipelines

In Chapter 10, Machine Learning Pipelines with Kubeflow on Amazon EKS, we used Kubeflow, Kubernetes, and Amazon EKS to build and run an end-to-end machine learning (ML) pipeline. There, we automated several steps of the ML process inside a running Kubernetes cluster. If you are wondering whether we can also build ML pipelines using the different features and capabilities of SageMaker, the quick answer is YES!

In this chapter, we will use SageMaker Pipelines to build and run automated ML workflows. In addition to this, we will demonstrate how we can utilize AWS Lambda functions to deploy trained models to new (or existing) ML inference endpoints during pipeline execution.

That said, in this chapter, we will cover the following topics:

  • Diving deeper into SageMaker Pipelines
  • Preparing the essential prerequisites
  • Running our first pipeline with SageMaker Pipelines
  • Creating Lambda...

Technical requirements

Before we start, it is important that we have the following ready:

  • A web browser (preferably Chrome or Firefox)
  • Access to the AWS account and the SageMaker Studio domain used in the previous chapters of this book
  • A text editor (for example, VS Code) on your local machine, which we will use to store and copy string values for later use in this chapter

The Jupyter notebooks, source code, and other files used for each chapter are available in the repository at https://github.com/PacktPublishing/Machine-Learning-Engineering-on-AWS.

Important Note

It is recommended that you use an IAM user with limited permissions instead of the root account when running the examples in this book. If you are just starting out with AWS, you can proceed with using the root account in the meantime.

Diving deeper into SageMaker Pipelines

Often, data science teams start by performing ML experiments and deployments manually. Once they need to standardize the workflow and enable automated model retraining to refresh the deployed models regularly, these teams start considering the use of ML pipelines to automate a portion of their work. In Chapter 6, SageMaker Training and Debugging Solutions, we learned how to use the SageMaker Python SDK to train an ML model. Generally, training an ML model with the SageMaker Python SDK involves running a few lines of code similar to what we have in the following block of code:

estimator = Estimator(...) 
estimator.set_hyperparameters(...)
estimator.fit(...)

What if we wanted to prepare an automated ML pipeline and include this as one of the steps? You would be surprised that all we need to do is add a few lines of code to convert this into a step that can be included in a pipeline! To convert this into a step using SageMaker Pipelines...
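To give a feel for what those extra lines look like, here is a minimal sketch of wrapping an estimator in a pipeline step using the SageMaker Python SDK. The step name, pipeline name, image URI, and S3 paths below are placeholders, and the exact arguments may vary across SDK versions; this is illustrative wiring, not the chapter's verbatim code:

```python
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep

# Placeholder values; replace with the role, image, and bucket
# used in your own environment.
role = "arn:aws:iam::<account>:role/<execution-role>"

estimator = Estimator(
    image_uri="<training-image-uri>",
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    output_path="s3://<bucket>/output/",
)
# estimator.set_hyperparameters(...) as before

# Wrap the estimator in a TrainingStep so it can run inside a pipeline
step_train = TrainingStep(
    name="TrainModel",  # hypothetical step name
    estimator=estimator,
    inputs={"train": TrainingInput(s3_data="s3://<bucket>/train/")},
)

pipeline = Pipeline(name="my-first-pipeline", steps=[step_train])
pipeline.upsert(role_arn=role)  # create or update the pipeline definition
execution = pipeline.start()    # trigger a pipeline execution
```

Notice that the estimator itself is unchanged; the pipeline-specific code is only the step and pipeline definitions around it.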

Preparing the essential prerequisites

In this section, we will ensure that the following prerequisites are ready:

  • The SageMaker Studio Domain execution role with the AWSLambda_FullAccess AWS managed permission policy attached to it – This will allow the Lambda functions to run without issues in the Completing the end-to-end ML pipeline section of this chapter.
  • The IAM role (pipeline-lambda-role) – This will be used to run the Lambda functions in the Creating Lambda Functions for Deployment section of this chapter.
  • The processing.py file – This will be used by the SageMaker Processing job to process the input data and split it into training, validation, and test sets.
  • The bookings.all.csv file – This will be used as the input dataset for the ML pipeline.
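The actual processing.py script is available in the book's repository; as a rough sketch of the splitting logic such a script performs, a minimal version (with illustrative 70/20/10 ratios, using plain Python rather than whatever libraries the real script uses) might look like this:

```python
import random

def split_dataset(rows, train_frac=0.7, val_frac=0.2, seed=42):
    """Shuffle rows and split them into train/validation/test lists.

    The fractions and seed here are illustrative; the actual processing.py
    in the book's repository may use different ratios and tooling.
    """
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test
```

In the pipeline, this logic runs inside a SageMaker Processing job, which reads the input CSV from S3 and writes the three splits back to S3 for the training step to consume.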

Important Note

In this chapter, we will create and manage our resources in the Oregon (us-west-2) region. Make sure that you have set the correct region before proceeding with...

Running our first pipeline with SageMaker Pipelines

In Chapter 1, Introduction to ML Engineering on AWS, we installed and used AutoGluon to train multiple ML models (with AutoML) inside an AWS Cloud9 environment. In addition to this, we performed the different steps of the ML process manually using a variety of tools and libraries. In this chapter, we will convert these manually executed steps into an automated pipeline so that all we need to do is provide an input dataset and the ML pipeline will do the rest of the work for us (and store the trained model in a model registry).

Note

Instead of preparing a custom Docker container image to use AutoGluon for training ML models, we will use the built-in AutoGluon-Tabular algorithm. With a built-in algorithm available for use, all we need to worry about are the hyperparameter values and the additional configuration parameters we will use to configure the training job.
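As a rough illustration of what that configuration amounts to, a hyperparameter map for the training job might look like the following. The key names and values here are hypothetical; the actual keys accepted by the built-in AutoGluon-Tabular algorithm are listed in its documentation and in this chapter's notebooks:

```python
# Hypothetical hyperparameter names and values, for illustration only.
hyperparameters = {
    "eval_metric": "roc_auc",       # metric used to rank candidate models
    "presets": "medium_quality",    # speed/quality trade-off preset
}
```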

That said, this section is divided into two parts...

Creating Lambda functions for deployment

Our second (and more complete pipeline) will require a few additional resources to help us deploy our ML model. In this section, we will create the following Lambda functions:

  • check-if-endpoint-exists – This Lambda function accepts the name of the ML inference endpoint as input and returns True if the endpoint already exists
  • deploy-model-to-new-endpoint – This Lambda function accepts the model package ARN as input (along with the role and the endpoint name) and deploys the model to a new inference endpoint
  • deploy-model-to-existing-endpoint – This Lambda function accepts the model package ARN as input (along with the role and the endpoint name) and deploys the model to an existing inference endpoint (by updating the deployed model inside the ML instance)
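To make the first of these concrete, here is a sketch of what a check-if-endpoint-exists handler could look like. The event shape (an "endpoint_name" key) and the injectable `client` parameter are assumptions made for illustration and offline testing; the book's actual function may be structured differently:

```python
def lambda_handler(event, context, client=None):
    """Return whether a SageMaker inference endpoint already exists.

    Sketch only: the event shape and the `client` parameter are
    assumptions, not the book's verbatim implementation.
    """
    if client is None:
        import boto3  # imported lazily so the logic can be tested offline
        client = boto3.client("sagemaker")

    endpoint_name = event["endpoint_name"]
    response = client.list_endpoints(NameContains=endpoint_name)
    names = [e["EndpointName"] for e in response["Endpoints"]]
    return {"endpoint_exists": endpoint_name in names}
```

The pipeline can branch on the returned flag to decide between the deploy-model-to-new-endpoint and deploy-model-to-existing-endpoint functions.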

We will use these functions later in the Completing the end-to-end ML pipeline section to deploy the ML model we...

Testing our ML inference endpoint

Of course, we need to check whether the ML inference endpoint is working! In the next set of steps, we will download and run a Jupyter notebook (named Test Endpoint and then Delete.ipynb) that tests our ML inference endpoint using the test dataset:

  1. Let’s begin by opening the following link in another browser tab: https://bit.ly/3xyVAXz
  2. Right-click on any part of the page to open a context menu, and then choose Save as... from the list of available options. Save the file as Test Endpoint then Delete.ipynb, and then download it to the Downloads folder (or similar) on your local machine.
  3. Navigate back to your SageMaker Studio environment. In the File Tree (located on the left-hand side of the SageMaker Studio environment), make sure that you are in the CH11 folder similar to what we have in Figure 11.15:

Figure 11.15 – Uploading the Test Endpoint then Delete.ipynb file

  4. Click on the...
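The core of what such a test notebook does is send rows from the test dataset to the endpoint and read back the predictions. As a sketch (a hypothetical helper; the chapter's notebook may use the SageMaker Python SDK's Predictor class rather than boto3 directly), it boils down to something like:

```python
def predict(features, endpoint_name, runtime=None):
    """Send one row of features to an inference endpoint as CSV.

    Hypothetical helper for illustration; payload format and response
    parsing depend on how the deployed model was configured.
    """
    if runtime is None:
        import boto3  # imported lazily so the logic can be tested offline
        runtime = boto3.client("sagemaker-runtime")

    payload = ",".join(str(f) for f in features)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=payload,
    )
    return response["Body"].read().decode("utf-8")
```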

Completing the end-to-end ML pipeline

In this section, we will build on top of the (partial) pipeline we prepared in the Running our first pipeline with SageMaker Pipelines section of this chapter. In addition to the steps and resources used to build our partial pipeline, we will also utilize the Lambda functions we created (in the Creating Lambda functions for deployment section) to complete our ML pipeline.

Defining and preparing the complete ML pipeline

The second pipeline we will prepare is slightly longer than the first. To help us visualize how our second ML pipeline using SageMaker Pipelines will look, let’s quickly check Figure 11.16:

Figure 11.16 – Our second ML pipeline using SageMaker Pipelines

Here, we can see that our pipeline accepts two input parameters—the input dataset and the endpoint name. When the pipeline runs, the input dataset is first split into training, validation, and test sets. The...

Cleaning up

Now that we have completed the hands-on solutions of this chapter, it is time to clean up and turn off the resources we will no longer use. In the next set of steps, we will locate and turn off any remaining running instances in SageMaker Studio:

  1. Make sure to check and delete all running inference endpoints under SageMaker resources (if any). To check whether there are running inference endpoints, click on the SageMaker resources icon and then select Endpoints from the list of options in the drop-down menu.
  2. Open the File menu and select Shut down from the list of available options. This should turn off all running instances inside SageMaker Studio.

It is important to note that this cleanup operation needs to be performed after using SageMaker Studio. These resources are not turned off automatically by SageMaker even during periods of inactivity. Make sure to review whether all delete operations have succeeded before proceeding to the next section...
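If you prefer to double-check programmatically that no inference endpoints survived the cleanup, a small script along these lines can list and delete any that remain. This is a sketch written for this chapter's assumption that every endpoint in the region is disposable; review the list first before running anything like it in a shared account:

```python
def delete_all_endpoints(client=None):
    """Delete every SageMaker inference endpoint in the current region.

    Cleanup sketch: assumes all endpoints in the region belong to this
    chapter's exercises and are safe to delete.
    """
    if client is None:
        import boto3  # imported lazily so the logic can be tested offline
        client = boto3.client("sagemaker")

    deleted = []
    for endpoint in client.list_endpoints()["Endpoints"]:
        name = endpoint["EndpointName"]
        client.delete_endpoint(EndpointName=name)
        deleted.append(name)
    return deleted
```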

Summary

In this chapter, we used SageMaker Pipelines to build end-to-end automated ML pipelines. We started by preparing a relatively simple pipeline with three steps: data preparation, model training, and model registration. After preparing and defining the pipeline, we triggered a pipeline execution that registered a newly trained model to the SageMaker Model Registry.

Then, we prepared three AWS Lambda functions that would be used for the model deployment steps of the second ML pipeline. After preparing the Lambda functions, we proceeded with completing the end-to-end ML pipeline by adding a few additional steps to deploy the model to a new or existing ML inference endpoint. Finally, we discussed relevant best practices and strategies to secure, scale, and manage ML pipelines using the technology stack we used in this chapter.

You’ve finally reached the end of this...

Further reading

At this point, you might want to dive deeper into the relevant subtopics discussed by checking the references listed in the Further reading section of each of the previous chapters. In addition to these, you can check the following resources, too:

