You're reading from Machine Learning Engineering on AWS

Product type: Book
Published in: Oct 2022
Publisher: Packt
ISBN-13: 9781803247595
Edition: 1st Edition

Author: Joshua Arvin Lat

Joshua Arvin Lat is the Chief Technology Officer (CTO) of NuWorks Interactive Labs, Inc. He previously served as the CTO of three Australian-owned companies and as director of software development and engineering for multiple e-commerce start-ups. Years ago, he and his team won first place in a global cybersecurity competition with their published research paper. He is also an AWS Machine Learning Hero and has shared his knowledge at several international conferences, discussing practical strategies for machine learning, engineering, security, and management.

SageMaker Training and Debugging Solutions

In Chapter 2, Deep Learning AMIs, and Chapter 3, Deep Learning Containers, we performed our initial ML training experiments inside EC2 instances. We took note of the cost per hour of running these EC2 instances as there are some cases where we would need to use the more expensive instance types (such as the p2.8xlarge instance at approximately $7.20 per hour) to run our ML training jobs and workloads. To manage and reduce the overall cost of running ML workloads using these EC2 instances, we discussed a few cost optimization strategies, including manually turning off these instances after the training job has finished.
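To put these rates in perspective, a quick back-of-the-envelope calculation (using the approximate $7.20/hour figure quoted previously; actual pricing varies by region and changes over time) shows how quickly costs accumulate when an instance is left running:

```python
# Estimate the cost of keeping an on-demand EC2 instance running.
# The $7.20/hour figure for p2.8xlarge is the approximation used in this
# chapter; check the AWS pricing page for your region's current rate.

def training_cost(hourly_rate: float, hours: float) -> float:
    """Return the cost (in USD) of running an instance for `hours`."""
    return round(hourly_rate * hours, 2)

# A 6-hour training job on a p2.8xlarge:
print(training_cost(7.20, 6))   # 43.2
# Forgetting to stop the instance for a full day instead:
print(training_cost(7.20, 24))  # 172.8
```

The gap between those two numbers is exactly why automating instance shutdown matters.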

At this point, you might be wondering if it is possible to automate the following processes:

  • Launching the EC2 instances that will run the ML training jobs
  • Uploading the model artifacts of the trained ML model to a storage location (such as an S3 bucket) after model training
  • Deleting the EC2 instances once...

Technical requirements

Before we start, we must have the following ready:

  • A web browser (preferably Chrome or Firefox)
  • Access to the AWS account that was used in the first few chapters of this book

The Jupyter notebooks, source code, and other files used for each chapter are available in this book’s GitHub repository: https://github.com/PacktPublishing/Machine-Learning-Engineering-on-AWS.

Important Note

It is recommended to use an IAM user with limited permissions instead of the root account when running the examples in this book. We will discuss this, along with other security best practices, in detail in Chapter 9, Security, Governance, and Compliance Strategies. If you are just starting to use AWS, you may proceed with using the root account in the meantime.

Getting started with the SageMaker Python SDK

The SageMaker Python SDK is a library that allows ML practitioners to train and deploy ML models using the different features and capabilities of SageMaker. It provides several high-level abstractions such as Estimators, Models, Predictors, Sessions, Transformers, and Processors, all of which encapsulate and map to specific ML processes and entities. These abstractions allow data scientists and ML engineers to manage ML experiments and deployments with just a few lines of code. At the same time, infrastructure management is handled by SageMaker already, so all we need to do is configure these high-level abstractions with the correct set of parameters.

Note that it is also possible to use the different capabilities and features of SageMaker using the boto3 library. Compared to using the SageMaker Python SDK, we would be working with significantly more lines of code with boto3 since we would have to take care of the little details when...
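To make the comparison concrete, here is a minimal sketch of the SDK's Estimator abstraction. The role ARN, S3 paths, and hyperparameter values are placeholders for illustration, not the chapter's exact configuration:

```python
# Sketch: the SageMaker Python SDK's Estimator abstraction.
# Role ARN, S3 paths, and hyperparameter values are placeholders.

# Illustrative hyperparameters for the built-in Image Classification Algorithm.
HYPERPARAMETERS = {
    "num_classes": 10,
    "num_training_samples": 60000,  # e.g., the MNIST training set
    "epochs": 3,
}

def build_estimator(image_uri: str, role_arn: str, output_path: str):
    # Imported lazily so this module loads even without the SDK installed.
    from sagemaker.estimator import Estimator

    estimator = Estimator(
        image_uri=image_uri,          # container image of the training algorithm
        role=role_arn,                # IAM role assumed by the training job
        instance_count=1,
        instance_type="ml.p2.xlarge",
        output_path=output_path,      # where model.tar.gz is uploaded after training
    )
    estimator.set_hyperparameters(**HYPERPARAMETERS)
    return estimator

# Usage (placeholder values):
#   estimator = build_estimator(image_uri, "arn:aws:iam::...:role/...",
#                               "s3://<bucket>/output/")
#   estimator.fit({"train": "s3://<bucket>/train/"})
```

Behind that handful of lines, SageMaker provisions the instance, runs the container, uploads the model artifacts, and terminates the instance, which is exactly the automation we were wishing for earlier.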

Preparing the essential prerequisites

In this section, we will ensure that the following prerequisites are ready before proceeding with the hands-on solutions of this chapter:

  • We have a service limit increase to run SageMaker training jobs using the ml.p2.xlarge instance (SageMaker Training)
  • We have a service limit increase to run SageMaker training jobs using the ml.p2.xlarge instance (SageMaker Managed Spot Training)

If you are wondering why we are using ml.p2.xlarge instances in this chapter, that’s because we are required to use one of the supported instance types for the Image Classification Algorithm, as shown in the following screenshot:

Figure 6.2 – EC2 Instance Recommendation for the image classification algorithm

As we can see, we can use ml.p2.xlarge, ml.p2.8xlarge, ml.p2.16xlarge, ml.p3.2xlarge, ml.p3.8xlarge, and ml.p3.16xlarge (at the time of writing) when running training jobs using the Image Classification Algorithm...
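Since a job launched with an unsupported instance type fails only after the request has been submitted, it can help to validate the type up front. Here is a minimal sketch based on the list above:

```python
# GPU instance types listed as supported for the built-in Image Classification
# Algorithm at the time of writing (see Figure 6.2).
SUPPORTED_TRAINING_INSTANCES = (
    "ml.p2.xlarge", "ml.p2.8xlarge", "ml.p2.16xlarge",
    "ml.p3.2xlarge", "ml.p3.8xlarge", "ml.p3.16xlarge",
)

def check_instance_type(instance_type: str) -> str:
    """Fail fast before launching a training job with an unsupported type."""
    if instance_type not in SUPPORTED_TRAINING_INSTANCES:
        raise ValueError(
            f"{instance_type} is not supported by the Image Classification Algorithm"
        )
    return instance_type

print(check_instance_type("ml.p2.xlarge"))  # ml.p2.xlarge
```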

Training an image classification model with the SageMaker Python SDK

As mentioned in the Getting started with the SageMaker Python SDK section, we can use built-in algorithms or custom algorithms (using scripts and custom Docker container images) when performing training experiments in SageMaker.

Data scientists and ML practitioners can quickly get started with training and deploying models in SageMaker using one or more of the built-in algorithms prepared by the AWS team. There is a variety of built-in algorithms to choose from, each designed to help ML practitioners solve specific business and ML problems. Here are some of the available built-in algorithms, along with the use cases and problems they can solve:

  • DeepAR Forecasting: Time-series forecasting
  • Principal Component Analysis: Dimensionality reduction
  • IP Insights: IP anomaly detection
  • Latent Dirichlet Allocation (LDA): Topic modeling
  • Sequence-to-Sequence...
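Each built-in algorithm is consumed the same way: resolve its container image and hand the URI to an Estimator. The registry identifiers below are the ones I believe `sagemaker.image_uris.retrieve` accepts for these algorithms; verify them against the SageMaker Python SDK documentation:

```python
# Registry identifiers accepted (to the best of my knowledge) by
# sagemaker.image_uris.retrieve() for some built-in algorithms.
BUILT_IN_ALGORITHMS = {
    "DeepAR Forecasting": "forecasting-deepar",
    "Principal Component Analysis": "pca",
    "IP Insights": "ipinsights",
    "Latent Dirichlet Allocation": "lda",
    "Sequence-to-Sequence": "seq2seq",
    "Image Classification": "image-classification",
}

def algorithm_image(name: str, region: str) -> str:
    """Resolve the container image URI for a built-in algorithm by friendly name."""
    from sagemaker import image_uris  # lazy import; requires the SageMaker SDK
    return image_uris.retrieve(framework=BUILT_IN_ALGORITHMS[name], region=region)

# Usage (requires the SDK and a valid region):
#   uri = algorithm_image("Image Classification", "us-east-1")
```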

Using the Debugger Insights Dashboard

When working on ML requirements, ML practitioners may encounter a variety of issues before coming up with a high-performing ML model. Like software development and programming, building ML models requires a bit of trial and error. Developers generally make use of a variety of debugging tools to help them troubleshoot issues and implementation errors when writing software applications. Similarly, ML practitioners need a way to monitor and debug training jobs when building ML models. Luckily for us, Amazon SageMaker has a capability called SageMaker Debugger that allows us to troubleshoot different issues and bottlenecks when training ML models:

Figure 6.24 – SageMaker Debugger features

The preceding diagram shows the features that are available when we use SageMaker Debugger to monitor, debug, and troubleshoot a variety of issues that affect an ML model’s performance. This includes the data capture capability...
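As a sketch of how these features are switched on from the SDK: the rule and profiler settings below are examples of what Debugger offers, not the chapter's exact configuration:

```python
# Sketch: attaching a SageMaker Debugger rule and system profiling to a
# training job. Illustrative configuration, not the chapter's exact setup.

SYSTEM_MONITOR_INTERVAL_MS = 500  # sample CPU/GPU utilization every 500 ms

def debugger_settings():
    # Lazy imports so this module loads without the SageMaker SDK installed.
    from sagemaker.debugger import ProfilerConfig, Rule, rule_configs

    rules = [
        # Flags a training job whose loss stops decreasing.
        Rule.sagemaker(rule_configs.loss_not_decreasing()),
    ]
    profiler = ProfilerConfig(
        system_monitor_interval_millis=SYSTEM_MONITOR_INTERVAL_MS,
    )
    return rules, profiler

# These would be passed to an Estimator:
#   Estimator(..., rules=rules, profiler_config=profiler)
# The collected metrics then surface in Studio's Debugger Insights Dashboard.
```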

Utilizing Managed Spot Training and Checkpoints

Now that we have a better understanding of how to use the SageMaker Python SDK to train and deploy ML models, let’s proceed with using a few additional options that allow us to reduce costs significantly when running training jobs. In this section, we will utilize the following SageMaker features and capabilities when training a second Image Classification model:

  • Managed Spot Training
  • Checkpointing
  • Incremental Training

In Chapter 2, Deep Learning AMIs, we mentioned that spot instances can be used to reduce the cost of running training jobs. Using spot instances instead of on-demand instances can reduce the overall cost by roughly 70% to 90%. So, why are spot instances cheaper? The downside is that spot instances can be interrupted, which restarts the training job from the beginning. If we were to train our models outside of SageMaker, we would have to prepare our own set of custom...
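A sketch of how Managed Spot Training and checkpointing are enabled through the Estimator follows; the bucket name and timeout values are placeholders:

```python
# Sketch: enabling Managed Spot Training with checkpointing on an Estimator.
# Bucket name and timeout values are placeholders.

MAX_RUN_SECONDS = 3600    # hard limit on actual training time
MAX_WAIT_SECONDS = 7200   # total time, including waiting for spot capacity

def build_spot_estimator(image_uri: str, role_arn: str, bucket: str):
    from sagemaker.estimator import Estimator  # lazy import

    return Estimator(
        image_uri=image_uri,
        role=role_arn,
        instance_count=1,
        instance_type="ml.p2.xlarge",
        use_spot_instances=True,       # request cheaper spot capacity
        max_run=MAX_RUN_SECONDS,
        max_wait=MAX_WAIT_SECONDS,     # must be >= max_run
        # If the spot instance is interrupted, training resumes from the
        # latest checkpoint here instead of starting over.
        checkpoint_s3_uri=f"s3://{bucket}/checkpoints/",
    )

def spot_savings(on_demand_seconds: float, billable_seconds: float) -> float:
    """Rough savings percentage, in the spirit of SageMaker's job summary."""
    return round(100 * (1 - billable_seconds / on_demand_seconds), 1)

# e.g., 1000 seconds of training billed as only 300 spot seconds:
print(spot_savings(1000, 300))  # 70.0
```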

Cleaning up

Follow these steps to locate and turn off any remaining running instances in SageMaker Studio:

  1. Click the Running Instances and Kernels icon in the sidebar of Amazon SageMaker Studio, as highlighted in the following screenshot:

Figure 6.34 – Turning off any remaining running instances

Clicking the Running Instances and Kernels icon should open and show the running instances, apps, and terminals in SageMaker Studio.

  2. Turn off any remaining running instances under RUNNING INSTANCES by clicking the Shutdown button for each of the instances, as highlighted in the preceding screenshot. Clicking the Shutdown button will open a pop-up window verifying the instance shutdown operation. Click the Shut down all button to proceed.

Note that this cleanup operation needs to be performed after using SageMaker Studio. These resources are not turned off automatically by SageMaker, even during periods of inactivity. Turning off unused...
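If you prefer to double-check from code, the boto3 SageMaker client can list any apps still running in your Studio domain. The domain ID below is a placeholder, and actually calling AWS requires valid credentials:

```python
# Sketch: listing SageMaker Studio apps that are still running, using boto3.
# Useful as a double check after shutting instances down in the Studio UI.

def still_running(apps: list) -> list:
    """Keep only apps that are not already deleted (and so may incur charges)."""
    return [app for app in apps if app["Status"] != "Deleted"]

def list_running_apps(domain_id: str) -> list:
    import boto3  # lazy import; calling this requires AWS credentials

    client = boto3.client("sagemaker")
    return still_running(client.list_apps(DomainIdEquals=domain_id)["Apps"])

# Usage (placeholder domain ID):
#   running = list_running_apps("d-xxxxxxxxxxxx")
```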

Summary

In this chapter, we trained and deployed ML models using the SageMaker Python SDK. We started by using the MNIST dataset (training dataset) and SageMaker’s built-in Image Classification Algorithm to train an image classifier model. After that, we took a closer look at the resources used during the training step by using the Debugger Insights Dashboard available in SageMaker Studio. Finally, we performed a second training experiment that made use of several features and options available in SageMaker, such as managed spot training, checkpointing, and incremental training.

In the next chapter, we will dive deeper into the different deployment options and strategies when performing model deployments using SageMaker. We will be deploying a pre-trained model into a variety of inference endpoint types, including the real-time, serverless, and asynchronous inference endpoints.

Further reading

For more information on the topics that were covered in this chapter, feel free to check out the following resources:

