
You're reading from Machine Learning Engineering on AWS (1st Edition), published by Packt in October 2022 (ISBN-13: 9781803247595).
Author: Joshua Arvin Lat
Joshua Arvin Lat is the Chief Technology Officer (CTO) of NuWorks Interactive Labs, Inc. He previously served as the CTO for three Australian-owned companies and as director of software development and engineering for multiple e-commerce start-ups in the past. Years ago, he and his team won first place in a global cybersecurity competition with their published research paper. He is also an AWS Machine Learning Hero and has shared his knowledge at several international conferences, discussing practical strategies on machine learning, engineering, security, and management.

Model Monitoring and Management Solutions

In Chapter 6, SageMaker Training and Debugging Solutions, and Chapter 7, SageMaker Deployment Solutions, we focused on training and deploying machine learning (ML) models using SageMaker. If you were able to complete the hands-on solutions presented in those chapters, you should be able to perform similar experiments and deployments using other algorithms and datasets. Those two chapters are a good introduction to the managed service. At some point, however, you will have to use its other capabilities to manage, troubleshoot, and monitor different types of resources in production ML environments.

One of the clear advantages of using SageMaker is that a lot of the commonly performed tasks of data scientists and ML practitioners have already been automated as part of this fully managed service. This means that we generally do not need to build a custom solution, especially if SageMaker already has...

Technical prerequisites

Before we start, we must have the following ready:

  • A web browser (preferably Chrome or Firefox)
  • Access to the AWS account and SageMaker Studio domain that were used in the first chapter of this book

The Jupyter notebooks, source code, and other files used for each chapter are available in this book’s GitHub repository: https://github.com/PacktPublishing/Machine-Learning-Engineering-on-AWS.

Important Note

It is recommended to use an IAM user with limited permissions instead of the root account when running the examples in this book. We will discuss this, along with other security best practices, in detail in Chapter 9, Security, Governance, and Compliance Strategies. If you are just starting to use AWS, you may proceed with using the root account in the meantime.

Registering models to SageMaker Model Registry

In Chapter 6, SageMaker Training and Debugging Solutions, we used the deploy() method of the Estimator instance to deploy our ML model to an inference endpoint immediately after training it with the fit() method. When performing ML experiments and deployments in production, a model may have to be analyzed and evaluated before proceeding with the deployment step. The individual or team performing the analysis would review the input configuration parameters, the training data, and the algorithm used to train the model, along with any other relevant information available. Once the data science team works with multiple models, managing and organizing all of them becomes much easier with a model registry.

What’s a model registry? A model registry is simply a repository that focuses on helping data scientists and ML practitioners manage, organize, and catalog ML models. After the training step, the data science...
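As a rough illustration of the registration step, the sketch below builds the kind of request that boto3's SageMaker client accepts for create_model_package. The group name, image URI, and S3 path are placeholders, and the helper function itself is hypothetical; substitute the values from your own training job:

```python
# Build the request for boto3's SageMaker create_model_package call.
# The image URI, model artifact path, and group name below are placeholders;
# substitute the values from your own training job.
def build_model_package_request(group_name, image_uri, model_data_url):
    return {
        "ModelPackageGroupName": group_name,
        "ModelPackageDescription": "K-Nearest Neighbor model from Chapter 6",
        "InferenceSpecification": {
            "Containers": [
                {"Image": image_uri, "ModelDataUrl": model_data_url}
            ],
            "SupportedContentTypes": ["text/csv"],
            "SupportedResponseMIMETypes": ["application/json"],
        },
        # Registering with PendingManualApproval lets reviewers inspect and
        # approve the model before any deployment step runs.
        "ModelApprovalStatus": "PendingManualApproval",
    }

request = build_model_package_request(
    group_name="knn-models",
    image_uri="<knn-algorithm-image-uri>",
    model_data_url="s3://<bucket>/models/knn/model.tar.gz",
)
# A boto3 client would then register the model with:
# boto3.client("sagemaker").create_model_package(**request)
```

Keeping the approval status as PendingManualApproval is what gives the review workflow described above a concrete gate: deployment automation can be configured to act only on approved model packages.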

Deploying models from SageMaker Model Registry

There are many possible next steps after an ML model has been registered in a model registry. In this section, we will focus on manually deploying the first registered ML model (a pre-trained K-Nearest Neighbor model) to a new inference endpoint. After the first registered model has been deployed, we will deploy the second registered model (a pre-trained Linear Learner model) to the same endpoint where the first model is running, similar to what’s shown in the following diagram:

Figure 8.7 – Deploying models from the model registry

Here, we can see that we can directly replace the deployed ML model inside a running ML inference endpoint without creating a new separate inference endpoint. This means that we do not need to worry about changing the “target infrastructure server” in our setup since the model replacement operation is happening behind the...
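The in-place swap described above boils down to three SageMaker API calls. The sketch below (with illustrative names and a hypothetical helper) plans that sequence as data rather than invoking boto3, to show the order of operations; the key point is that the final call is update_endpoint on the existing endpoint, not create_endpoint:

```python
# Sketch of the three boto3 SageMaker calls used to swap the model behind a
# running endpoint. The endpoint, config, and role names are illustrative.
def plan_model_swap(endpoint_name, model_package_arn, new_config_name, role_arn):
    """Return the ordered API calls (name, params) for replacing the model."""
    model_name = new_config_name + "-model"
    return [
        # 1. Create a deployable model from the registered package.
        ("create_model", {
            "ModelName": model_name,
            "PrimaryContainer": {"ModelPackageName": model_package_arn},
            "ExecutionRoleArn": role_arn,
        }),
        # 2. Create a new endpoint configuration pointing at that model.
        ("create_endpoint_config", {
            "EndpointConfigName": new_config_name,
            "ProductionVariants": [{
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InitialInstanceCount": 1,
                "InstanceType": "ml.m5.large",
            }],
        }),
        # 3. Point the existing endpoint at the new configuration; SageMaker
        # replaces the model behind the scenes, with no new endpoint created.
        ("update_endpoint", {
            "EndpointName": endpoint_name,
            "EndpointConfigName": new_config_name,
        }),
    ]

calls = plan_model_swap(
    endpoint_name="ml-endpoint",
    model_package_arn="<linear-learner-model-package-arn>",
    new_config_name="ll-config-v2",
    role_arn="<sagemaker-execution-role-arn>",
)
```

In a real session, each tuple would map to the corresponding method on boto3.client("sagemaker").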

Enabling data capture and simulating predictions

After an ML model has been deployed to an inference endpoint, its quality needs to be monitored and checked so that we can easily perform corrective actions whenever quality issues or deviations are detected. This is similar to web application development, where even if the quality assurance team has spent days (or weeks) testing the final build of the application, other issues may only surface once the web application is already running:

Figure 8.8 – Capturing the request and response data of the ML inference endpoint

As shown in the preceding diagram, model monitoring starts by capturing the request and response data, which passes through a running ML inference endpoint. This collected data is processed and analyzed in a later step using a separate automated task or job that can generate reports and flag issues or anomalies. If we deployed our ML model in a custom...
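Enabling this capture comes down to one extra configuration block when the endpoint is created. The sketch below builds the DataCaptureConfig structure that create_endpoint_config accepts (the SageMaker Python SDK's DataCaptureConfig class produces an equivalent shape); the S3 destination is a placeholder and the helper is hypothetical:

```python
# Shape of the DataCaptureConfig block passed to create_endpoint_config.
# The S3 destination below is a placeholder for your own bucket.
def build_data_capture_config(destination_s3_uri, sampling_percentage=100):
    return {
        "EnableCapture": True,
        # Percentage of requests to record; 100 captures every request.
        "InitialSamplingPercentage": sampling_percentage,
        # Captured records are written as JSON Lines files under this prefix.
        "DestinationS3Uri": destination_s3_uri,
        # Record both the request payload and the model's response.
        "CaptureOptions": [
            {"CaptureMode": "Input"},
            {"CaptureMode": "Output"},
        ],
    }

capture_config = build_data_capture_config("s3://<bucket>/data-capture/")
```

Sampling at 100% is convenient while experimenting; for high-traffic production endpoints, a lower sampling percentage keeps storage costs in check.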

Scheduled monitoring with SageMaker Model Monitor

If you have been working in the data science and ML industry for quite some time, you probably know that an ML model’s performance after deployment is not guaranteed. Deployed models in production must be monitored in real time (or near-real time) so that we can potentially replace the deployed model and fix any issues once any drift or deviation from the expected set of values is detected:

Figure 8.9 – Analyzing captured data and detecting violations using Model Monitor

In the preceding diagram, we can see that we can process and analyze the captured data through a monitoring (processing) job. This job generates an automated report that can be used to analyze the deployed model and the data, and any detected violations are flagged and included in that report.
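To make the scheduling step concrete, the sketch below assembles the skeleton of a create_monitoring_schedule request that runs the monitoring job hourly. All names and S3 paths are placeholders, the helper is hypothetical, and several required fields (baseline statistics and constraints, the monitoring image, processing resources) are deliberately omitted to keep the sketch short:

```python
# Skeleton of a create_monitoring_schedule request: run the monitoring
# (processing) job every hour against the endpoint's captured data.
def build_monitoring_schedule_request(schedule_name, endpoint_name, reports_s3_uri):
    return {
        "MonitoringScheduleName": schedule_name,
        "MonitoringScheduleConfig": {
            # Hourly cron expression in the format SageMaker expects.
            "ScheduleConfig": {"ScheduleExpression": "cron(0 * ? * * *)"},
            "MonitoringJobDefinition": {
                # The endpoint whose captured requests/responses are analyzed.
                "MonitoringInputs": [{
                    "EndpointInput": {
                        "EndpointName": endpoint_name,
                        "LocalPath": "/opt/ml/processing/input",
                    },
                }],
                # Where the generated reports (and flagged violations) land.
                "MonitoringOutputConfig": {
                    "MonitoringOutputs": [{
                        "S3Output": {
                            "S3Uri": reports_s3_uri,
                            "LocalPath": "/opt/ml/processing/output",
                        },
                    }],
                },
                # Baseline statistics/constraints, the monitoring image, and
                # processing resources are also required in a real request;
                # omitted here to keep the sketch short.
            },
        },
    }

request = build_monitoring_schedule_request(
    "data-quality-schedule", "ml-endpoint", "s3://<bucket>/monitoring-reports/"
)
```

The SageMaker Python SDK's DefaultModelMonitor wraps this same API, so in practice you would rarely build the request by hand; the skeleton is only meant to show what the schedule consists of.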

Note

Let’s say that we have trained an ML model that predicts a professional’s salary...

Analyzing the captured data

Of course, there are other ways to process the data that’s been captured and stored inside the S3 bucket. Instead of using the built-in model monitoring capabilities and features discussed in the previous section, we can also download the collected ML inference endpoint data from the S3 bucket and analyze it directly in a notebook.

Note

It is still recommended to utilize the built-in model monitoring capabilities and features of SageMaker. However, knowing this approach would help us troubleshoot any issues we may encounter while using and running the automated solutions available in SageMaker.

Follow these steps to use a variety of Python libraries to process, clean, and analyze the collected ML inference data in S3:

  1. Create a new Notebook by clicking the File menu and choosing Notebook from the list of options under the New submenu.

Note

Note that we will be creating the new notebook inside the CH08 directory beside the...
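Once the capture files have been downloaded from S3, analyzing them by hand is mostly a matter of parsing JSON Lines. The sketch below uses an inline sample record shaped like the ones Model Monitor writes (an endpointInput/endpointOutput pair per line); a real notebook would read the downloaded .jsonl files instead:

```python
import json

# Each line of a data-capture file is one JSON record. The sample below
# mimics the shape Model Monitor stores (endpointInput / endpointOutput);
# real files would first be downloaded from the S3 capture prefix.
sample_jsonl = (
    '{"captureData": {"endpointInput": {"data": "5,10,1", "encoding": "CSV"}, '
    '"endpointOutput": {"data": "42000", "encoding": "CSV"}}, '
    '"eventMetadata": {"eventId": "abc-123"}}\n'
)

def parse_capture_lines(jsonl_text):
    """Flatten capture records into simple input/output rows for analysis."""
    rows = []
    for line in jsonl_text.splitlines():
        record = json.loads(line)
        capture = record["captureData"]
        rows.append({
            "input": capture["endpointInput"]["data"],
            "output": capture["endpointOutput"]["data"],
            "event_id": record["eventMetadata"]["eventId"],
        })
    return rows

rows = parse_capture_lines(sample_jsonl)
# rows[0] -> {"input": "5,10,1", "output": "42000", "event_id": "abc-123"}
```

From here, the flattened rows can be loaded into a pandas DataFrame for the cleaning and analysis steps described in this section.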

Deleting an endpoint with a monitoring schedule

Now that we are done using our ML inference endpoint, let’s delete it, along with the attached monitors and monitoring schedules.

Follow these steps to list all the attached monitors of our ML inference endpoint and delete any attached monitoring schedules, along with the endpoint:

  1. Create a new Notebook by clicking the File menu and choosing Notebook from the list of options under the New submenu.

Note

Note that we will be creating the new notebook inside the CH08 directory beside the other notebook files we created in the previous sections of this chapter.

  2. In the Set up notebook environment window, specify the following configuration values:
    • Image: Data Science (option found under SageMaker image)
    • Kernel: Python 3
    • Start-up script: No script

Click the Select button afterward.

  3. Right-click on the tab name of the new Notebook and select Rename Notebook… from the list of options in...
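The ordering matters here: monitoring schedules attached to an endpoint should be deleted before the endpoint itself. The sketch below plans that teardown as a list of boto3-style calls rather than executing them; the resource names are illustrative and the helper is hypothetical:

```python
# Ordered teardown: monitoring schedules attached to an endpoint are deleted
# first, then the endpoint and its configuration. Each tuple maps to a call
# on boto3.client("sagemaker"); names below are illustrative.
def plan_endpoint_teardown(endpoint_name, schedule_names, config_name):
    calls = []
    # 1. Delete every monitoring schedule attached to the endpoint
    #    (list_monitoring_schedules with EndpointName would return these).
    for name in schedule_names:
        calls.append(("delete_monitoring_schedule",
                      {"MonitoringScheduleName": name}))
    # 2. Only then delete the endpoint itself and its configuration.
    calls.append(("delete_endpoint", {"EndpointName": endpoint_name}))
    calls.append(("delete_endpoint_config", {"EndpointConfigName": config_name}))
    return calls

calls = plan_endpoint_teardown(
    "ml-endpoint", ["data-quality-schedule"], "ml-endpoint-config"
)
```

Deleting the endpoint configuration (and any models created from the registry) alongside the endpoint keeps the account free of orphaned resources.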

Cleaning up

Now that we have finished working on the hands-on solutions of this chapter, it is time we clean up and turn off any resources we will no longer use. Follow these steps to locate and turn off any remaining running instances in SageMaker Studio:

  1. Click the Running Instances and Kernels icon in the sidebar, as highlighted in the following screenshot:

Figure 8.17 – Turning off the running instance

Clicking the Running Instances and Kernels icon should open and show the running instances, apps, and terminals in SageMaker Studio.

  2. Turn off all running instances under RUNNING INSTANCES by clicking the Shutdown button for each of the instances, as highlighted in the preceding screenshot. Clicking the Shutdown button will open a popup window verifying the instance shutdown operation. Click the Shut down all button to proceed.

Important Note

Make sure that you close the open notebook tabs in the Editor pane. In some cases...

Summary

In this chapter, we utilized the model registry available in SageMaker to register, organize, and manage our ML models. After deploying ML models stored in the registry, we used SageMaker Model Monitor to capture data and run processing jobs that analyze the collected data and flag any detected issues or deviations.

In the next chapter, we will focus on securing ML environments and systems using a variety of strategies and solutions. If you are serious about designing and building secure ML systems and environments, then the next chapter is for you!

Further reading

For more information on the topics that were covered in this chapter, feel free to check out the following resources:
