
Chapter 9: Amazon SageMaker Modeling

In the previous chapter, we learned several model optimization and evaluation techniques. We also learned various ways of storing data, processing data, and applying different statistical approaches to data. So, how can we now build a pipeline for all of this? Well, we can read data, process it, and build machine learning models on the processed data. But what if my first machine learning model does not perform well? Can I fine-tune it? The answer is yes: you can do nearly all of this with Amazon SageMaker. In this chapter, we will walk you through the following topics using Amazon SageMaker:

  • Understanding different instances of Amazon SageMaker
  • Cleaning and preparing data in Jupyter Notebook in Amazon SageMaker
  • Model training in Amazon SageMaker
  • Using SageMaker's built-in machine learning algorithms
  • Writing custom training and inference code in SageMaker

Technical requirements

Creating notebooks in Amazon SageMaker

If you're working with machine learning, then you need to perform tasks such as storing data, processing data, preparing data for model training, training the model, and deploying it for inference. None of these is trivial, and each of these stages requires a machine to perform the task. With Amazon SageMaker, life becomes much easier when carrying out these steps.

What is Amazon SageMaker?

SageMaker provides training instances to train a model on your data and endpoint instances to serve inferences from that model. It also provides notebook instances, running Jupyter Notebooks, to clean and understand the data. Once you're happy with your cleaning process, you should store the cleaned data in S3 as part of the staging for training. You can launch training instances to consume this training data and produce a machine learning model. The machine learning model can be stored in S3, and endpoint instances can consume the model to produce...
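
To make this end-to-end flow concrete, here is a minimal sketch using the SageMaker Python SDK. The role ARN, bucket, region, algorithm version, and hyperparameters are hypothetical placeholders, and the exact channel setup depends on the algorithm you choose:

    import sagemaker
    from sagemaker.estimator import Estimator

    # Hypothetical values: replace the role ARN, bucket, and region with your own.
    role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"
    bucket = "my-ml-bucket"
    region = "us-east-1"

    # Resolve the built-in XGBoost container image for the region (the version is an assumption).
    image_uri = sagemaker.image_uris.retrieve("xgboost", region=region, version="1.5-1")

    # Training instances consume the data staged in S3 and write the model artifact back to S3.
    estimator = Estimator(
        image_uri=image_uri,
        role=role,
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path=f"s3://{bucket}/models/",
    )
    estimator.set_hyperparameters(objective="reg:squarederror", num_round=100)
    estimator.fit({"train": f"s3://{bucket}/train/"})

    # Endpoint instances load the model artifact and serve real-time predictions.
    predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")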

Model tuning

In Chapter 8, Evaluating and Optimizing Models, you learned many important concepts about model tuning. Let's now explore this topic from a practical perspective.

In order to tune a model on SageMaker, we have to call create_hyper_parameter_tuning_job and pass the following main parameters (a minimal boto3 sketch follows the list):

  • HyperParameterTuningJobName: This is the name of the tuning job. It is useful to track the training jobs that have been started on behalf of your tuning job.
  • HyperParameterTuningJobConfig: Here, you can configure your tuning options. For example, which parameters you want to tune, the range of values for them, the type of optimization (such as random search or Bayesian search), the maximum number of training jobs you want to spin up, and more.
  • TrainingJobDefinition: Here, you can configure your training job. For example, the data channels, the output location, the resource configurations, the evaluation metrics, and the stop conditions.
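
As a rough illustration, assuming the built-in XGBoost algorithm and placeholder names, ARNs, S3 paths, and parameter ranges, the call could look like the following sketch (not a complete configuration):

    import boto3

    sm = boto3.client("sagemaker")

    # All names, ARNs, S3 paths, and ranges below are hypothetical placeholders.
    sm.create_hyper_parameter_tuning_job(
        HyperParameterTuningJobName="xgboost-tuning-demo",
        HyperParameterTuningJobConfig={
            "Strategy": "Bayesian",  # could also be "Random"
            "HyperParameterTuningJobObjective": {
                "Type": "Minimize",
                "MetricName": "train:rmse",
            },
            "ResourceLimits": {
                "MaxNumberOfTrainingJobs": 10,
                "MaxParallelTrainingJobs": 2,
            },
            "ParameterRanges": {
                "ContinuousParameterRanges": [
                    {"Name": "eta", "MinValue": "0.01", "MaxValue": "0.3"}
                ],
                "IntegerParameterRanges": [
                    {"Name": "max_depth", "MinValue": "3", "MaxValue": "10"}
                ],
            },
        },
        TrainingJobDefinition={
            "StaticHyperParameters": {"objective": "reg:squarederror"},
            "AlgorithmSpecification": {
                "TrainingImage": "<xgboost-container-image-uri>",  # placeholder
                "TrainingInputMode": "File",
            },
            "RoleArn": "arn:aws:iam::123456789012:role/SageMakerExecutionRole",
            "InputDataConfig": [
                {
                    "ChannelName": "train",
                    "DataSource": {
                        "S3DataSource": {
                            "S3DataType": "S3Prefix",
                            "S3Uri": "s3://my-ml-bucket/train/",
                            "S3DataDistributionType": "FullyReplicated",
                        }
                    },
                }
            ],
            "OutputDataConfig": {"S3OutputPath": "s3://my-ml-bucket/tuning-output/"},
            "ResourceConfig": {
                "InstanceType": "ml.m5.xlarge",
                "InstanceCount": 1,
                "VolumeSizeInGB": 30,
            },
            "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        },
    )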

In SageMaker, the...

Choosing instance types in Amazon SageMaker

SageMaker is a pay-for-usage model. There is no minimum fee for it.

When we think about instances on SageMaker, it all starts with an EC2 instance. This instance is responsible for all your processing. It's a managed EC2 instance: these instances won't show up in the EC2 console, and you cannot SSH into them either. The instance types start with ml.

SageMaker offers instances of the following families:

  • The t family: This is a burstable CPU family. With this family, you get a standard ratio of CPU to memory. This means that if you have a long-running training job, you will lose performance over time as you spend your CPU credits. If you have very small jobs, then they are cost-effective. For example, if you want a notebook instance just to launch training jobs, then this family is the most relevant and cost-effective choice (see the sketch after this list).
  • The m family: In the previous family, we saw that CPU credits are consumed faster due to their burstable nature...
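
Whichever family you pick, the instance type is just a parameter on the SageMaker resources you create. The sketch below (with a hypothetical notebook name and role ARN) requests a burstable ml.t3.medium for a notebook instance and a general-purpose ml.m5.xlarge for a training job's resources:

    import boto3

    sm = boto3.client("sagemaker")
    role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

    # A small burstable instance is usually enough for a notebook that mostly launches jobs.
    sm.create_notebook_instance(
        NotebookInstanceName="dev-notebook",  # hypothetical name
        InstanceType="ml.t3.medium",
        RoleArn=role,
    )

    # Long-running training jobs are better served by non-burstable families such as m;
    # this dictionary would be passed as ResourceConfig when creating a training job.
    resource_config = {
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 30,
    }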

Securing SageMaker notebooks

If you are reading this section of the chapter, then you have already learned how to use notebook instances, which type of training instances to choose, and how to configure and use endpoints. Now, let's learn about securing those instances. The following aspects will help to secure them (a short sketch follows the list):

  • Encryption: When we talk about securing via encryption, it is all about the data. But what does this mean? It means protecting data at rest using encryption, protecting data in transit using encryption, using KMS for better role separation, and keeping internet traffic private through TLS 1.2 encryption. SageMaker instances can be launched with encrypted volumes by using an AWS-managed KMS key. This helps you to secure the Jupyter Notebook server by default.
  • Root access: When a user opens a shell terminal from the Jupyter Web UI, they will be logged in as ec2-user, which is the default username in Amazon Linux. Now the user can run...
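
A minimal sketch of applying these controls at notebook-creation time is shown below; the KMS key ID and role ARN are hypothetical placeholders. KmsKeyId encrypts the notebook's storage volume, and RootAccess removes root privileges from the terminal session described above:

    import boto3

    sm = boto3.client("sagemaker")

    # Hypothetical ARNs and IDs; replace them with your own resources.
    sm.create_notebook_instance(
        NotebookInstanceName="secure-notebook",
        InstanceType="ml.t3.medium",
        RoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
        KmsKeyId="1234abcd-12ab-34cd-56ef-1234567890ab",  # encrypts the attached volume
        RootAccess="Disabled",  # the ec2-user shell gets no root privileges
    )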

Creating alternative pipelines with Lambda Functions

Indeed, SageMaker is an awesome platform that you can use to create training and inference pipelines. However, we can always work with different services to come up with similar solutions. One of these services that we will learn about next is known as Lambda Functions.

AWS Lambda is a serverless compute service where you can literally run a function as a service. In other words, you can concentrate your efforts on just writing your function. Then, you just need to tell AWS how to run it (that is, the environment and resource configurations), so all the necessary resources will be provisioned to run your code and then released once it has completed.
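
As a toy illustration of the "just write the function" idea, the handler below forwards an incoming payload to a SageMaker endpoint and returns the prediction. The endpoint name and the shape of the event are assumptions made for this sketch:

    import json
    import boto3

    # Hypothetical endpoint name; replace it with a real deployed endpoint.
    runtime = boto3.client("sagemaker-runtime")
    ENDPOINT_NAME = "my-model-endpoint"

    def lambda_handler(event, context):
        """Forward an incoming payload to a SageMaker endpoint and return its prediction."""
        payload = json.dumps(event["features"])  # assumes the event carries a 'features' field

        response = runtime.invoke_endpoint(
            EndpointName=ENDPOINT_NAME,
            ContentType="application/json",
            Body=payload,
        )
        prediction = response["Body"].read().decode("utf-8")

        return {"statusCode": 200, "body": prediction}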

Throughout Chapter 6, AWS Services for Data Processing, you explored how Lambda Functions integrate with many different services, such as Kinesis and AWS Batch. Indeed, AWS did a very good job of integrating Lambda with 140 services (and the list is constantly increasing). That...

Working with Step Functions

Step Functions is an AWS service that allows you to create workflows in order to orchestrate the execution of Lambda Functions, connecting them in a sequence of events known as steps. These steps are grouped in a state machine.

Step Functions incorporates retry functionality so that you can configure your pipeline to proceed only after a particular step has succeeded. The way you set these retry configurations is by creating a retry policy.
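
As an illustration, the sketch below builds a two-step state machine definition in Python and registers it with the Step Functions API; the Lambda ARNs, role ARN, and state names are hypothetical, and the Retry block is what the text refers to as a retry policy:

    import json
    import boto3

    sfn = boto3.client("stepfunctions")

    # Hypothetical Lambda and IAM role ARNs.
    PREPROCESS_ARN = "arn:aws:lambda:us-east-1:123456789012:function:preprocess-data"
    TRAIN_ARN = "arn:aws:lambda:us-east-1:123456789012:function:start-training"
    ROLE_ARN = "arn:aws:iam::123456789012:role/StepFunctionsExecutionRole"

    # Amazon States Language definition: two Lambda task states executed in sequence,
    # with a retry policy on the first step.
    definition = {
        "StartAt": "PreprocessData",
        "States": {
            "PreprocessData": {
                "Type": "Task",
                "Resource": PREPROCESS_ARN,
                "Retry": [
                    {
                        "ErrorEquals": ["States.TaskFailed"],
                        "IntervalSeconds": 10,
                        "MaxAttempts": 3,
                        "BackoffRate": 2.0,
                    }
                ],
                "Next": "StartTraining",
            },
            "StartTraining": {
                "Type": "Task",
                "Resource": TRAIN_ARN,
                "End": True,
            },
        },
    }

    sfn.create_state_machine(
        name="ml-pipeline-demo",  # hypothetical name
        definition=json.dumps(definition),
        roleArn=ROLE_ARN,
    )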

Important note

Just like the majority of AWS services, AWS Step Functions also integrates with other services, not only AWS Lambda.

Creating a state machine is relatively simple. All you have to do is navigate to the AWS Step Functions console, then create a new state machine. On the Create state machine page, you can specify whether you want to create your state machine from scratch, from a template, or whether you just want to run a sample project.

AWS will help you with this...

Summary

In this chapter, we learned about using SageMaker to create notebook instances and training instances. Along the way, we learned how to use SageMaker for hyperparameter tuning jobs. Since the security of our assets in AWS is essential, we also learned about the various ways to secure SageMaker instances. With hands-on practice, we created Step Functions and orchestrated our pipeline using AWS Lambda.

AWS products are evolving every day to help us solve our IT problems. It's not easy to remember all the product names, and the only way to learn is through practice. When you're solving a problem or building a product, focus on the different technological areas of your product. Each of those areas can map to an AWS service, for example, scheduling jobs, logging, tracing, monitoring metrics, autoscaling, and more.

Compute time, storage, and networking are the baselines. It is recommended that you practice some examples for each of these services. Referring to...
