
You're reading from AWS Certified Machine Learning - Specialty (MLS-C01) Certification Guide, Second Edition
Published in Feb 2024 by Packt. ISBN-13: 9781835082201.
Authors (2):

Somanath Nanda

Somanath has 10 years of experience in the IT industry, including product development, DevOps, and designing and architecting products end to end. He also worked at AWS as a Big Data Engineer for about two years.

Weslley Moura

Weslley Moura has been developing data products for the past decade. In his recent roles, he has influenced data strategy and led data teams in the urban logistics and blockchain industries.

Amazon SageMaker Modeling

In the previous chapter, you learned several model optimization and evaluation techniques. You also learned various ways of storing data, processing data, and applying different statistical approaches to data. So, how can you now build a pipeline for all of this? You can read data, process it, and build machine learning (ML) models on the processed data. But what if your first ML model does not perform well? Can you fine-tune it? The answer is yes; you can do nearly everything using Amazon SageMaker. This chapter will walk you through the following topics using Amazon SageMaker:

  • Understanding different instances of Amazon SageMaker
  • Cleaning and preparing data in Jupyter Notebook in Amazon SageMaker
  • Model training in Amazon SageMaker
  • Using SageMaker’s built-in ML algorithms
  • Writing custom training and inference code in SageMaker

Technical requirements

Creating notebooks in Amazon SageMaker

If you are working with ML, you need to perform actions such as storing data, processing it, preparing it for model training, training the model, and deploying the model for inference. These stages are complex, and each of them requires a machine to perform the task. Amazon SageMaker makes carrying out these tasks much easier.

What is Amazon SageMaker?

SageMaker provides training instances to train a model on your data and endpoint instances to serve inferences from that model. It also provides notebook instances, which run Jupyter Notebook, to clean and understand the data. Once you are happy with your cleaning process, you should store the cleaned data in S3 as a staging area for training. You can then launch training instances that consume this training data and produce an ML model. The model can be stored in S3, and endpoint instances can consume it to produce results for end users.
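This flow can be sketched as a training job request: the cleaned data comes in from an S3 channel, and the model artifact goes back out to S3. The following is a minimal sketch built as a plain request payload; every name in it (bucket, role ARN, container image URI, instance type) is a hypothetical placeholder, not a value from this book.

```python
# Hypothetical create_training_job payload: reads staged data from S3 and
# writes the trained model artifact back to S3.
training_job_request = {
    "TrainingJobName": "demo-training-job",
    "AlgorithmSpecification": {
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/demo-image:latest",
        "TrainingInputMode": "File",
    },
    "RoleArn": "arn:aws:iam::123456789012:role/DemoSageMakerRole",
    "InputDataConfig": [
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://demo-bucket/staging/train/",
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
            "ContentType": "text/csv",
        }
    ],
    "OutputDataConfig": {"S3OutputPath": "s3://demo-bucket/models/"},
    "ResourceConfig": {
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 30,
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}
# To actually submit it (requires AWS credentials):
# import boto3
# boto3.client("sagemaker").create_training_job(**training_job_request)
```

The endpoint side then points at the model artifact under `S3OutputPath` when the model is deployed for inference.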

If you draw...

Model tuning

In Chapter 7, Evaluating and Optimizing Models, you learned many important concepts about model tuning. Let’s now explore this topic from a practical perspective.

In order to tune a model on SageMaker, you have to call create_hyper_parameter_tuning_job and pass the following main parameters:

  • HyperParameterTuningJobName: This is the name of the tuning job. It is useful to track the training jobs that have been started on behalf of your tuning job.
  • HyperParameterTuningJobConfig: Here, you can configure your tuning options. For example, which parameters you want to tune, the range of values for them, the type of optimization (such as random search or Bayesian search), the maximum number of training jobs you want to spin up, and more.
  • TrainingJobDefinition: Here, you can configure your training job. For example, the data channels, the output location, the resource configurations, the evaluation metrics, and the stop conditions.
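The three parameters above can be sketched together in one request payload. This is a hedged illustration, not a complete job definition: the hyperparameter names (`eta`, `max_depth`), metric, image URI, bucket, and role are hypothetical placeholders.

```python
# Hypothetical create_hyper_parameter_tuning_job payload showing the three
# main parameters: the job name, the tuning configuration (strategy,
# objective, limits, parameter ranges), and the training job definition.
tuning_job_request = {
    "HyperParameterTuningJobName": "demo-tuning-job",
    "HyperParameterTuningJobConfig": {
        "Strategy": "Bayesian",  # or "Random" for random search
        "HyperParameterTuningJobObjective": {
            "Type": "Maximize",
            "MetricName": "validation:auc",
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": 20,  # total jobs spun up on your behalf
            "MaxParallelTrainingJobs": 2,
        },
        "ParameterRanges": {
            "ContinuousParameterRanges": [
                {"Name": "eta", "MinValue": "0.01", "MaxValue": "0.3"}
            ],
            "IntegerParameterRanges": [
                {"Name": "max_depth", "MinValue": "3", "MaxValue": "10"}
            ],
        },
    },
    "TrainingJobDefinition": {
        "AlgorithmSpecification": {
            "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/demo-image:latest",
            "TrainingInputMode": "File",
        },
        "RoleArn": "arn:aws:iam::123456789012:role/DemoSageMakerRole",
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": "s3://demo-bucket/staging/train/",
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": "s3://demo-bucket/tuning-output/"},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    },
}
# boto3.client("sagemaker").create_hyper_parameter_tuning_job(**tuning_job_request)
```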

In SageMaker...

Choosing instance types in Amazon SageMaker

SageMaker uses a pay-for-usage model. There is no minimum fee for it.

When you think about instances on SageMaker, it all starts with an EC2 instance, which is responsible for all your processing. It is a managed EC2 instance: these instances do not show up in the EC2 console, and you cannot SSH into them. The names of these instance types start with ml.

SageMaker offers instances of the following families:

  • The t family: This is the burstable CPU family. With this family, you get a balanced ratio of CPU and memory. This means that if you have a long-running training job, then you lose performance over time as you spend the CPU credits. If you have very small jobs, then they are cost-effective. For example, if you want a notebook instance to launch training jobs, then this family is the most appropriate and cost-effective.
  • The m family: In the previous family, you saw that CPU credits are consumed faster due...

Taking care of Scalability Configurations

To kickstart auto scaling for your model, you can take advantage of the SageMaker console, AWS Command Line Interface (AWS CLI), or an AWS SDK through the Application Auto Scaling API. For those inclined towards the CLI or API, the process involves registering the model as a scalable target, defining the scaling policy, and then applying it. If you opt for the SageMaker console, simply navigate to Endpoints under Inference in the navigation pane, locate your model’s endpoint name, and choose it along with the variant name to activate auto scaling.
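For the CLI/API path, the "register the model as a scalable target" step can be sketched as an Application Auto Scaling request. The endpoint name, variant name, and capacity bounds below are hypothetical placeholders.

```python
# Hypothetical register_scalable_target payload: makes a production variant
# of a SageMaker endpoint scalable between 1 and 4 instances.
endpoint_name = "demo-endpoint"
variant_name = "AllTraffic"

register_request = {
    "ServiceNamespace": "sagemaker",
    "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "MinCapacity": 1,
    "MaxCapacity": 4,
}
# To apply it (requires AWS credentials):
# import boto3
# boto3.client("application-autoscaling").register_scalable_target(**register_request)
```

After registering the target, you define and apply a scaling policy against the same `ResourceId`, which the next subsection covers.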

Let’s now dive into the intricacies of scaling policies.

Scaling Policy Overview

Auto scaling is driven by scaling policies, which determine how instances are added or removed in response to varying workloads. Two options are at your disposal: target tracking and step scaling policies.

Target Tracking Scaling Policies: Our recommendation is to leverage target tracking scaling...
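A target tracking policy can be sketched as a single payload: you pick a metric and a target value, and Application Auto Scaling adds or removes instances to keep the metric near the target. The endpoint/variant names, target value, and cooldowns below are hypothetical placeholders.

```python
# Hypothetical put_scaling_policy payload: target tracking on the predefined
# invocations-per-instance metric for a SageMaker endpoint variant.
scaling_policy_request = {
    "PolicyName": "demo-invocations-target-tracking",
    "ServiceNamespace": "sagemaker",
    "ResourceId": "endpoint/demo-endpoint/variant/AllTraffic",
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        # Aim for ~70 invocations per instance; scale out above, in below
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleOutCooldown": 60,   # seconds to wait after adding instances
        "ScaleInCooldown": 300,   # scale in more cautiously than out
    },
}
# boto3.client("application-autoscaling").put_scaling_policy(**scaling_policy_request)
```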

Securing SageMaker notebooks

If you are reading this section of the chapter, then you have already learned how to use notebook instances, which type of training instances should be chosen, and how to configure and use endpoints. Now, let’s learn about securing those instances. The following aspects will help to secure the instances:

  • Encryption: When you talk about securing something via encryption, you are talking about safeguarding data. But what does this mean? It means protecting data at rest with encryption, protecting data in transit with encryption, using KMS for better role separation, and keeping internet traffic private through TLS 1.2 encryption. SageMaker instances can be launched with encrypted volumes by using an AWS-managed KMS key, which secures the Jupyter Notebook server by default.
  • Root access: When a user opens a shell terminal from the Jupyter Web UI, they will be logged in as ec2-user, which is the default username in Amazon Linux. Now...
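Both points above can be sketched in a single notebook instance launch: a KMS key to encrypt the volume at rest, and root access switched off. The names and ARNs are hypothetical placeholders.

```python
# Hypothetical create_notebook_instance payload: encrypted EBS volume via a
# KMS key, and root access disabled for shell users on the instance.
notebook_request = {
    "NotebookInstanceName": "demo-secure-notebook",
    "InstanceType": "ml.t3.medium",
    "RoleArn": "arn:aws:iam::123456789012:role/DemoSageMakerRole",
    # Encrypts the notebook's storage volume at rest
    "KmsKeyId": "arn:aws:kms:us-east-1:123456789012:key/demo-key-id",
    # Users still log in as ec2-user, but without root privileges
    "RootAccess": "Disabled",
    "VolumeSizeInGB": 5,
}
# boto3.client("sagemaker").create_notebook_instance(**notebook_request)
```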

SageMaker Debugger

In this section, you will learn about Amazon SageMaker Debugger, unraveling the intricacies of monitoring, profiling, and debugging ML model training:

  • Monitoring and profiling: SageMaker Debugger captures model metrics and keeps a real-time eye on system resources during training, eliminating the need for additional code. It not only provides a window into the training process but empowers instant issue correction, expediting training and elevating model quality.
  • Automatic detection and analysis: A true time-saver, Debugger automatically spots and notifies you of common training errors, such as oversized or undersized gradient values. Say goodbye to days of troubleshooting; Debugger reduces it to mere hours.
  • Profiling capabilities: Venture into the realm of profiling with Debugger, which meticulously monitors system resource utilization metrics and allows you to profile training jobs. This involves collecting detailed metrics from your ML framework...
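The three capabilities above map onto three optional sections of a training job request. The following is a hedged sketch of those sections only; the S3 paths and the rule evaluator image URI are hypothetical placeholders (the real evaluator images are account- and region-specific).

```python
# Hypothetical Debugger sections of a create_training_job request: a hook
# that saves gradient tensors to S3, a built-in rule watching for vanishing
# gradients, and system profiling at a fixed interval.
debug_hook_config = {
    "S3OutputPath": "s3://demo-bucket/debug-output/",
    "CollectionConfigurations": [
        {"CollectionName": "gradients", "CollectionParameters": {"save_interval": "100"}}
    ],
}
debug_rule_configurations = [
    {
        "RuleConfigurationName": "VanishingGradientRule",
        "RuleEvaluatorImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/debugger-rules:latest",
        "RuleParameters": {"rule_to_invoke": "VanishingGradient"},
    }
]
profiler_config = {
    "S3OutputPath": "s3://demo-bucket/profiler-output/",
    "ProfilingIntervalInMilliseconds": 500,  # how often system metrics are sampled
}
# These would be passed as the DebugHookConfig, DebugRuleConfigurations, and
# ProfilerConfig parameters of boto3.client("sagemaker").create_training_job(...).
```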

SageMaker Autopilot

ML model development has historically been a daunting task, demanding considerable expertise and time. Amazon SageMaker Autopilot emerges as a game-changer, simplifying this intricate process and transforming it into a streamlined experience.

Amazon SageMaker Autopilot presents a rich array of features to facilitate the development of ML models:

  • Automatic model building: SageMaker Autopilot removes the complexities of constructing ML models by taking charge and automating the entire process with a simple mandate from the user: provide a tabular dataset and designate the target column for prediction.
  • Data processing and enhancement: Autopilot seamlessly handles data preprocessing tasks, filling in missing data, offering statistical insights into dataset columns, and extracting valuable information from non-numeric columns. This guarantees that input data is finely tuned for model training.
  • Problem type detection: Autopilot showcases intelligence...
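The "provide a tabular dataset and designate the target column" mandate can be sketched as a single request. The bucket, role, and column name below are hypothetical placeholders; note that the problem type can also be left out, in which case Autopilot infers it from the target column.

```python
# Hypothetical create_auto_ml_job payload: point Autopilot at a tabular
# dataset in S3 and name the column to predict.
auto_ml_request = {
    "AutoMLJobName": "demo-autopilot-job",
    "InputDataConfig": [
        {
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://demo-bucket/autopilot/train.csv",
                }
            },
            "TargetAttributeName": "churned",  # the column Autopilot should predict
        }
    ],
    "OutputDataConfig": {"S3OutputPath": "s3://demo-bucket/autopilot-output/"},
    "ProblemType": "BinaryClassification",  # optional; inferred if omitted
    "RoleArn": "arn:aws:iam::123456789012:role/DemoSageMakerRole",
}
# boto3.client("sagemaker").create_auto_ml_job(**auto_ml_request)
```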

SageMaker Model Monitor

In the ever-evolving realm of ML, ensuring the reliability and robustness of models in real-world production settings is paramount. In this section, you will delve into the profound significance, practical applications, and potent features of Amazon SageMaker Model Monitor—an instrumental component tailored to tackle the challenge of model drift in live production environments:

  • The essence of model monitoring: As ML models venture into real-world deployment, the ongoing degradation of their effectiveness—attributed to shifts in data distributions or alterations in user behavior—poses a substantial threat known as model drift. Continuous monitoring becomes the linchpin for proactively identifying and rectifying these deviations, safeguarding the accuracy and reliability of ML predictions and, consequently, business outcomes.
  • An automated guardian: Amazon SageMaker Model Monitor emerges as a guiding light in the ML landscape, delivering...
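Monitoring starts with capturing the live traffic that a deployed endpoint sees. The following is a hedged sketch of a data capture configuration, attached when creating an endpoint configuration; the S3 path and sampling rate are hypothetical placeholders.

```python
# Hypothetical DataCaptureConfig for an endpoint configuration: records the
# requests sent to the model and the predictions it returns into S3, where
# Model Monitor can compare them against a baseline on a schedule.
data_capture_config = {
    "EnableCapture": True,
    "InitialSamplingPercentage": 100,  # capture everything while validating; lower later
    "DestinationS3Uri": "s3://demo-bucket/data-capture/",
    "CaptureOptions": [
        {"CaptureMode": "Input"},   # incoming inference requests
        {"CaptureMode": "Output"},  # the model's responses
    ],
}
# Passed as the DataCaptureConfig parameter of
# boto3.client("sagemaker").create_endpoint_config(...).
```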

SageMaker Training Compiler

If you’ve reached this section, you are about to delve into the world of SageMaker Training Compiler (SMTC), a game-changing tool designed to supercharge the training of your ML models on SageMaker by optimizing intricate training scripts. Picture this: faster training, swifter model development, and an open door to experimentation. That’s the primary goal of SMTC—improving training speed to bring agility to your model development journey. The following are the major advantages of using SMTC:

  • Scaling challenges: Embarking on the journey of training large-scale models, especially those with billions of parameters, often feels like navigating uncharted engineering territory. SMTC, however, rises to the occasion by optimizing the entire training process, conquering the challenges that come with scaling.
  • Efficiency at its core: SMTC takes the reins of GPU memory usage, ushering in a realm where larger batch sizes become not just...

SageMaker Data Wrangler

In this section, you’ll unravel the significance and benefits of Data Wrangler, dissecting its role as an end-to-end solution for importing, preparing, transforming, featurizing, and analyzing data:

  • Importing data with ease: Data Wrangler simplifies the process of importing data from various sources, such as Amazon Simple Storage Service (S3), Amazon Athena, Amazon Redshift, Snowflake, and Databricks. Whether your data resides in the cloud or within specific databases, Data Wrangler seamlessly connects to the source and imports it, setting the stage for comprehensive data handling.
  • Constructing data flows: Picture a scenario where you can effortlessly design a data flow, mapping out a sequence of ML data preparation steps. This is where Data Wrangler shines. By combining datasets from diverse sources and specifying the transformations needed, you sculpt a data prep workflow ready to integrate into your ML pipeline.
  • Transforming data with...

SageMaker Feature Store

Imagine you are building a recommendation system. In the absence of Feature Store, you’d navigate a landscape of manual feature engineering, scattered feature storage, and constant vigilance for consistency.

Feature management in an ML pipeline is challenging due to the dispersed nature of feature engineering, involving various teams and tools. Collaboration issues arise when different teams handle different aspects of feature storage, leading to inconsistencies and versioning problems. The dynamic nature of features evolving over time complicates change tracking and ensuring reproducibility. SageMaker Feature Store addresses these challenges by providing a centralized repository for features, enabling seamless sharing, versioning, and consistent access across the ML pipeline, thus simplifying collaboration, enhancing reproducibility, and promoting data consistency.

Now, user data, including age, location, browsing history, and item data such as...
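For the recommendation-system scenario, registering user features in a centralized store can be sketched as a single request. The feature names, bucket, and role below are hypothetical placeholders.

```python
# Hypothetical create_feature_group payload: user features keyed by user_id,
# with an online store for low-latency inference lookups and an offline
# store in S3 for training and historical analysis.
feature_group_request = {
    "FeatureGroupName": "demo-user-features",
    "RecordIdentifierFeatureName": "user_id",
    "EventTimeFeatureName": "event_time",  # enables versioning/time travel
    "FeatureDefinitions": [
        {"FeatureName": "user_id", "FeatureType": "String"},
        {"FeatureName": "event_time", "FeatureType": "String"},
        {"FeatureName": "age", "FeatureType": "Integral"},
        {"FeatureName": "location", "FeatureType": "String"},
        {"FeatureName": "browsing_score", "FeatureType": "Fractional"},
    ],
    "OnlineStoreConfig": {"EnableOnlineStore": True},
    "OfflineStoreConfig": {
        "S3StorageConfig": {"S3Uri": "s3://demo-bucket/feature-store/"}
    },
    "RoleArn": "arn:aws:iam::123456789012:role/DemoSageMakerRole",
}
# boto3.client("sagemaker").create_feature_group(**feature_group_request)
```

Every team in the pipeline can then read the same versioned features instead of re-engineering them locally.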

SageMaker Edge Manager

SageMaker Edge Manager is designed to address the challenges faced by ML developers when operating models on fleets of edge devices. Some of the key functions that SageMaker Edge Manager can perform are highlighted as follows:

  • Model compilation: Utilizes Amazon SageMaker Neo to compile models for various target devices and operating environments, including Linux, Windows, Android, iOS, and macOS.
  • Model deployment: Signs each model with an AWS key, packages it with its runtime, and includes all necessary credentials for deployment on specific devices.
  • Model server concept: Introduces a model server concept to efficiently run multiple models on edge devices, optimizing hardware resource utilization.
  • Continuous monitoring: Provides tools for continuous monitoring of model health, allowing developers to collect metrics, sample input/output data, and send this data securely to the cloud.
  • Model drift detection: Allows the detection of model...

SageMaker Canvas

In this section, you will explore the core of SageMaker Canvas, its features, and the significance it holds for organizations keen on infusing ML into their decision-making processes.

Amazon SageMaker Canvas is a cloud-based service offered by AWS that streamlines the ML process through a visual interface for constructing, training, and deploying ML models—all without the need for coding. Nestled within the Amazon SageMaker suite, it caters to a diverse audience by democratizing ML:

  • Code-free model building: SageMaker Canvas obliterates the traditional barriers encountered when adopting ML, enabling users to forge models without the need for code. This feature proves pivotal for business professionals seeking to harness the potency of ML for predictive analytics, despite lacking coding expertise.

    Case study: A marketing professional without any ML knowledge can utilize SageMaker Canvas to predict customer churn. The intuitive interface guides...

Summary

In this chapter, you learned about using SageMaker to create notebook instances and training instances. You also learned how to run hyperparameter tuning jobs with SageMaker. As the security of your assets in AWS is an essential part of your work, you also learned the various ways to secure SageMaker instances.

AWS products are evolving every day to help you solve IT problems. It’s not easy to remember all the product names. The only way to learn is through practice. When you are solving a problem or building a product, focus on different technological areas of your product. Those areas can be scheduling jobs, logging, tracing, monitoring metrics, autoscaling, and more.

Compute time, storage, and networking are the baselines. It is recommended that you practice some examples for each of these services. Referring to the AWS documentation to resolve any doubts is also a useful option. It is always important to design your solutions in a cost...

Exam Readiness Drill – Chapter Review Questions

Apart from a solid understanding of key concepts, being able to think quickly under time pressure is a skill that will help you ace your certification exam. That is why working on these skills early on in your learning journey is key.

Chapter review questions are designed to improve your test-taking skills progressively with each chapter you learn and review your understanding of key concepts in the chapter at the same time. You’ll find these at the end of each chapter.

How To Access These Resources

To learn how to access these resources, head over to the chapter titled Chapter 11, Accessing the Online Practice Resources.

To open the Chapter Review Questions for this chapter, perform the following steps:

  1. Click the link – https://packt.link/MLSC01E2_CH09.

    Alternatively, you can scan the following QR code (Figure 9.14):

Figure 9.14 – QR code that opens Chapter Review Questions for logged-in users


Working On Timing

Target: Your aim is to keep the score the same while trying to answer these questions as quickly as possible. Here's an example of how your next attempts should look:

Attempt   | Score | Time Taken
Attempt 5 | 77%   | 21 mins 30 seconds
Attempt 6 | 78%   | 18 mins 34 seconds
Attempt 7 | 76%   | 14 mins 44 seconds

Table 9.3 – Sample timing practice drills on the online platform

Note

The time limits shown in the above table are just examples. Set your own time limits with each attempt based on the time limit of the quiz on the website.

With each new attempt, your score should stay above 75% while your “time taken...

