
You're reading from  Machine Learning Engineering on AWS

Product type: Book
Published in: Oct 2022
Publisher: Packt
ISBN-13: 9781803247595
Edition: 1st Edition
Author: Joshua Arvin Lat

Joshua Arvin Lat is the Chief Technology Officer (CTO) of NuWorks Interactive Labs, Inc. He previously served as the CTO for three Australian-owned companies and as director of software development and engineering for multiple e-commerce start-ups. Years ago, he and his team won first place in a global cybersecurity competition with their published research paper. He is also an AWS Machine Learning Hero and has shared his knowledge at several international conferences, discussing practical strategies on machine learning, engineering, security, and management.

Utilizing Managed Spot Training and Checkpoints

Now that we have a better understanding of how to use the SageMaker Python SDK to train and deploy ML models, let's look at a few additional options that can significantly reduce the cost of running training jobs. In this section, we will use the following SageMaker features and capabilities when training a second Image Classification model:

  • Managed Spot Training
  • Checkpointing
  • Incremental Training
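
As a rough sketch of how the first two features are typically switched on, the snippet below builds the keyword arguments that would be passed to an estimator in the SageMaker Python SDK. The argument names `use_spot_instances`, `max_run`, `max_wait`, and `checkpoint_s3_uri` are estimator parameters in the SDK; the helper `spot_training_kwargs`, the bucket name, and the timeout values are hypothetical and only for illustration. Incremental Training is not covered here.

```python
from typing import Any, Dict


def spot_training_kwargs(
    checkpoint_s3_uri: str,
    max_run: int = 3600,
    max_wait: int = 7200,
) -> Dict[str, Any]:
    """Build estimator arguments for Managed Spot Training with checkpoints.

    Hypothetical helper: it only assembles and validates the arguments,
    it does not call SageMaker.
    """
    if not checkpoint_s3_uri.startswith("s3://"):
        raise ValueError("checkpoint_s3_uri must be an S3 URI")
    # max_wait covers time spent waiting for spot capacity plus the training
    # time itself, so it must not be smaller than max_run.
    if max_wait < max_run:
        raise ValueError("max_wait must be greater than or equal to max_run")
    return {
        "use_spot_instances": True,              # request spot capacity
        "max_run": max_run,                      # training time limit (seconds)
        "max_wait": max_wait,                    # waiting + training limit (seconds)
        "checkpoint_s3_uri": checkpoint_s3_uri,  # where checkpoints are synced
    }


# Assumed bucket and prefix, for illustration only.
kwargs = spot_training_kwargs("s3://example-bucket/checkpoints/")
print(kwargs)
```

These values would be unpacked into the estimator constructor alongside the usual arguments (container image, role, instance type, and so on).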

In Chapter 2, Deep Learning AMIs, we mentioned that spot instances can be used to reduce the cost of running training jobs. Using spot instances instead of on-demand instances can reduce the overall cost by as much as 70% to 90%. So, why are spot instances cheaper? They run on spare capacity that can be reclaimed at any time, so the downside is that these instances can be interrupted, which restarts the training job from the beginning. If we were to train our models outside of SageMaker, we would have to prepare our own set of custom...
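
To make the cost reduction concrete, the sketch below computes the savings percentage from two durations that SageMaker reports for a training job: how long the job ran and how long it was billed for. The formula, (1 - billable seconds / training seconds) × 100, is the one given in the SageMaker API documentation for these fields. The function name `spot_savings_percent` and the sample numbers are hypothetical.

```python
def spot_savings_percent(training_seconds: int, billable_seconds: int) -> float:
    """Return the Managed Spot Training savings as a percentage.

    Equivalent to (1 - billable_seconds / training_seconds) * 100.
    """
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    if not 0 <= billable_seconds <= training_seconds:
        raise ValueError("billable_seconds must be between 0 and training_seconds")
    return 100 * (training_seconds - billable_seconds) / training_seconds


# Illustrative numbers only: a job that ran for 1,000 seconds
# but was billed for 300 seconds.
savings = spot_savings_percent(training_seconds=1000, billable_seconds=300)
print(f"Managed Spot Training savings: {savings:.1f}%")  # prints 70.0%
```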
