You're reading from Machine Learning Engineering on AWS

Product type: Book
Published in: Oct 2022
Publisher: Packt
ISBN-13: 9781803247595
Edition: 1st Edition
Author: Joshua Arvin Lat

Joshua Arvin Lat is the Chief Technology Officer (CTO) of NuWorks Interactive Labs, Inc. He previously served as the CTO of three Australian-owned companies and as director of software development and engineering for multiple e-commerce start-ups. Years ago, he and his team won first place in a global cybersecurity competition with their published research paper. He is also an AWS Machine Learning Hero and has shared his knowledge at several international conferences, discussing practical strategies for machine learning, engineering, security, and management.

Deep Learning AMIs

In the Essential prerequisites section of Chapter 1, Introduction to ML Engineering on AWS, it probably took us about an hour or so to set up our Cloud9 environment. We had to spend a bit of time installing several packages, along with a few dependencies, before we were able to work on the actual machine learning (ML) requirements. On top of this, we had to make sure that we were using the right versions for certain packages to avoid running into a variety of issues. If you think this is error-prone and tedious, imagine being given the assignment of preparing 20 ML environments for a team of data scientists! Let me repeat that… TWENTY! It would have taken us around 15 to 20 hours of doing the same thing over and over again. After a week of using the ML environments you prepared, the data scientists then requested that you also install the deep learning frameworks TensorFlow, PyTorch, and MXNet inside these environments since they’ll be testing different...

Technical requirements

Before we start, we must have a web browser (preferably Chrome or Firefox) and an AWS account to use for the hands-on solutions in this chapter. Make sure that you have access to the AWS account you used in Chapter 1, Introduction to ML Engineering on AWS.

The Jupyter notebooks, source code, and other files used for each chapter are available in this book’s GitHub repository: https://github.com/PacktPublishing/Machine-Learning-Engineering-on-AWS.

Getting started with Deep Learning AMIs

Before we talk about Deep Learning AMIs (DLAMIs), we must have a good idea of what an Amazon Machine Image (AMI) is. We can think of an AMI as the "DNA" of an organism. Using this analogy, the organism corresponds to one or more EC2 instances:

Figure 2.1 – Launching EC2 instances using Deep Learning AMIs

If we were to launch two EC2 instances using the same AMI (similar to what is shown in Figure 2.1), both instances would have the same set of installed packages, frameworks, tools, and operating systems upon instance launch. Of course, not everything needs to be the same, since these instances may have different instance types, different security groups, and other configurable properties.

AMIs allow engineers to easily launch EC2 instances in consistent environments without having to spend hours installing different packages and tools. In addition to the installation steps, these EC2 instances need to be configured and optimized...

Launching an EC2 instance using a Deep Learning AMI

Launching an EC2 instance from a DLAMI is straightforward. Once we have an idea of which DLAMI to use, the rest of the steps are focused on configuring and launching the EC2 instance. The cool thing here is that we are not limited to launching a single instance from an existing image: during the configuration stage, before an instance is launched from an AMI, we can specify the desired number of instances to launch (for example, 20). This means that instead of launching a single instance, we would launch 20 instances all at the same time.
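A batch launch like this can also be scripted. The following is a minimal sketch built around boto3's run_instances call, where MinCount and MaxCount control how many instances are launched in a single request. The function name is our own, and the AMI ID in the usage comment is a placeholder, not a real DLAMI ID:

```python
def launch_instances_from_ami(ec2_client, ami_id, instance_type, count):
    """Launch `count` EC2 instances from the same AMI in one API call.

    Setting MinCount and MaxCount to the same value requests the whole
    batch at once instead of looping over individual launches.
    """
    response = ec2_client.run_instances(
        ImageId=ami_id,
        InstanceType=instance_type,
        MinCount=count,
        MaxCount=count,
    )
    return [instance["InstanceId"] for instance in response["Instances"]]


# Usage (requires AWS credentials and a real DLAMI ID from the AMI Catalog):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-2")
#   ids = launch_instances_from_ami(ec2, "ami-0123456789abcdef0", "p2.xlarge", 20)
```

Passing the client in as a parameter keeps the helper easy to test and reuse across regions.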

Figure 2.2 – Steps to launch an EC2 instance using a DLAMI

We will divide this section into four parts. As shown in the preceding diagram, we’ll start by locating the framework-specific Deep Learning AMI in the AMI Catalog – a repository that contains a variety of AMIs that can be...

Downloading the sample dataset

In the succeeding sections of this chapter, we will work with a very simple synthetic dataset that contains only two columns – x and y. Here, x may represent an object’s relative position on the X-axis, while y may represent the same object’s position on the Y-axis. The following screenshot shows an example of what the data looks like:
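If you want to experiment before downloading anything, a dataset of this shape can be generated locally. The slope, intercept, and noise level below are arbitrary choices for illustration, not the values used to produce the book's actual dataset:

```python
import numpy as np
import pandas as pd

# Two-column synthetic dataset where y depends (roughly) linearly on x.
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=200)
y = 2 * x + 1 + rng.normal(scale=0.1, size=200)  # arbitrary slope/intercept

df = pd.DataFrame({"x": x, "y": y})
print(df.head())
```

A small amount of noise is added on purpose: without it, the "pattern" would be trivially exact and there would be nothing interesting for a model to learn.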

Figure 2.20 – Sample dataset

ML is about finding patterns. With this dataset, we will build a model that tries to predict the value of y given the value of x later in this chapter. Once we’re able to build models with a simple example like this, it will be much easier to deal with more realistic datasets that contain more than two columns, similar to what we worked with in Chapter 1, Introduction to ML Engineering on AWS.

Note

In this book, we won’t limit ourselves to just tabular data and simple datasets. In Chapter 6, SageMaker Training and Debugging...

Training an ML model

In Chapter 1, Introduction to ML Engineering on AWS, we trained a binary classifier model that aims to predict whether a hotel booking will be canceled or not using the available information. In this chapter, we will use the (intentionally simplified) dataset from the Downloading the sample dataset section and train a regression model that predicts the value of y (a continuous variable) given the value of x. Instead of relying on ready-made AutoML tools and services, we will work with a custom script:

Figure 2.23 – Model life cycle

When writing a custom training script, we usually follow a sequence similar to what is shown in the preceding diagram. We start by defining and compiling a model. After that, we load the data and use it to train and evaluate the model. Finally, we serialize and save the model into a file.
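The chapter's training script uses a deep learning framework, but the sequence itself is framework-agnostic. The following stand-in walks through the same life cycle (define, compile, load data, train, evaluate, save) using plain NumPy and a one-parameter linear model instead of a neural network:

```python
import pickle
import numpy as np

rng = np.random.default_rng(0)

# 1. Define the model: y_hat = w * x + b, with randomly initialized parameters.
w, b = rng.normal(), rng.normal()

# 2. "Compile": choose a loss (mean squared error) and a learning rate.
learning_rate = 0.1

# 3. Load the data (generated inline here instead of read from a file).
x = rng.uniform(-1, 1, size=200)
y = 2 * x + 1 + rng.normal(scale=0.1, size=200)

# 4. Train: plain gradient descent on the MSE loss.
for _ in range(500):
    error = (w * x + b) - y
    w -= learning_rate * 2 * np.mean(error * x)
    b -= learning_rate * 2 * np.mean(error)

# 5. Evaluate: report the final mean squared error.
mse = float(np.mean(((w * x + b) - y) ** 2))
print(f"w={w:.3f} b={b:.3f} mse={mse:.4f}")

# 6. Serialize and save the model to a file.
with open("model.pkl", "wb") as f:
    pickle.dump({"w": w, "b": b}, f)
```

In a real deep learning framework, steps 1 and 2 become model definition and compilation calls, step 4 becomes a fit/training loop, and step 6 saves the framework's own model format rather than a pickled dictionary.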

Note

What happens after the model has been saved? The model file can be used and loaded in an inference endpoint...

Loading and evaluating the model

In the previous section, we trained our deep learning model using the terminal. When performing ML experiments, it is generally more convenient to use a web-based interactive environment such as Jupyter Notebook. We could technically run all the succeeding code blocks in the terminal, but we will use Jupyter Notebook for convenience.

In the next set of steps, we will launch the Jupyter Notebook from the command line. Then, we will run a couple of blocks of code to load and evaluate the ML model we trained in the previous section. Let’s get started:

  1. Continuing where we left off in the Training an ML model section, let’s run the following command in the EC2 Instance Connect terminal:
    jupyter notebook --allow-root --port 8888 --ip 0.0.0.0

This should start the Jupyter Notebook and make it accessible through port 8888:

Figure 2.31 – Jupyter Notebook token

Make sure that you copy...
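The notebook cells themselves are not shown here, but loading and evaluating a serialized model might look something like the following sketch. It first writes a dummy model file so the load step is runnable on its own, and it uses a pickled dictionary as a stand-in for the framework-specific model format the chapter actually produces:

```python
import pickle
import numpy as np

# Stand-in for the file saved during training (so this sketch runs on its own).
with open("model.pkl", "wb") as f:
    pickle.dump({"w": 2.0, "b": 1.0}, f)

# Load the serialized model back from disk.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

# Evaluate on a few test inputs: predictions should follow y = w * x + b.
x_test = np.array([-1.0, 0.0, 1.0])
predictions = model["w"] * x_test + model["b"]
print(predictions)  # [-1.  1.  3.]
```

With a real framework model, the load step would instead use the framework's own loading function, but the shape of the workflow (load from disk, then predict on held-out inputs) is the same.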

Cleaning up

Now that we have completed an end-to-end ML experiment, it’s about time we perform the cleanup steps to help us manage costs:

  1. Close the browser tab that contains the EC2 Instance Connect terminal session.
  2. Navigate to the EC2 instance page of the instance we launched using the Deep Learning AMI. Click Instance state to open the list of dropdown options and then click Terminate instance:

Figure 2.37 – Terminating the instance

As we can see, there are other options available, such as Stop instance and Reboot instance. If you do not want to delete the instance yet, you can stop it instead and start it again later. Note that a stopped instance still incurs costs, since the attached EBS volume is not deleted when an EC2 instance is stopped. For this reason, it is preferable to terminate the instance and delete the attached EBS volume if it does not contain any critical files.

  1. In...
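These console steps can also be performed programmatically. Below is an illustrative sketch of a boto3-based cleanup helper; the helper's name is our own, and the client is passed in as a parameter so the logic can be exercised without real AWS credentials:

```python
def clean_up_instance(ec2_client, instance_id, terminate=True):
    """Terminate (or just stop) an EC2 instance to avoid ongoing charges.

    A stopped instance keeps its EBS volume, which still costs money;
    terminating deletes volumes whose DeleteOnTermination flag is set.
    """
    if terminate:
        return ec2_client.terminate_instances(InstanceIds=[instance_id])
    return ec2_client.stop_instances(InstanceIds=[instance_id])


# Usage (requires AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-2")
#   clean_up_instance(ec2, "i-0123456789abcdef0")            # terminate
#   clean_up_instance(ec2, "i-0123456789abcdef0", False)     # just stop
```

Scripting the cleanup is especially handy when you launched a whole batch of instances from the same AMI, as in the earlier sections.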

Understanding how AWS pricing works for EC2 instances

Before we end this chapter, we must have a good idea of how AWS pricing works when dealing with EC2 instances. We also need to understand how the architecture and setup affect the overall cost of running ML workloads in the cloud.

Let's say that we initially have a single p2.xlarge instance running 24/7 for an entire month in the Oregon region. Inside this instance, the data science team regularly runs a script that trains a deep learning model using their preferred ML framework. This training script generally runs for about 3 hours, twice every week. Since the arrival of new data is unpredictable, it's hard to know when the training script will be run to produce a new model. The resulting ML model then gets deployed immediately to a web API server, which serves as the inference endpoint within the same instance. Given this information, how much would this setup cost?
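Working this out requires the instance's hourly rate. Assuming the on-demand price of a p2.xlarge in Oregon is roughly $0.90 per hour (check the current EC2 pricing page, as rates change), the back-of-the-envelope arithmetic looks like this; the second scenario, where the instance runs only during training, is included purely to show how much of the always-on cost comes from idle time:

```python
HOURLY_RATE = 0.90     # assumed on-demand p2.xlarge rate in Oregon (USD)
HOURS_PER_MONTH = 730  # average hours in a month (365 * 24 / 12)

# Scenario 1: one instance running 24/7 for both training and inference.
always_on_cost = HOURLY_RATE * HOURS_PER_MONTH

# Scenario 2 (illustrative): the instance runs only during training,
# i.e., 3 hours, twice a week, with ~4.33 weeks per month on average.
training_hours_per_month = 3 * 2 * (52 / 12)
training_only_cost = HOURLY_RATE * training_hours_per_month

print(f"Always-on:     ${always_on_cost:.2f}/month")
print(f"Training-only: ${training_only_cost:.2f}/month")
```

Under these assumptions, the always-on setup costs about $657 per month, while the roughly 26 hours of actual training work accounts for only about $23 of it. The gap is the cost of keeping the inference endpoint on the same GPU instance around the clock.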

Figure 2...

Summary

In this chapter, we were able to launch an EC2 instance using a Deep Learning AMI. This allowed us to immediately have an environment where we can perform our ML experiments without worrying about the installation and setup steps. We then proceeded with using TensorFlow to train and evaluate our deep learning model to solve a regression problem. We wrapped up this chapter by having a short discussion on how AWS pricing works for EC2 instances.

In the next chapter, we will focus on how AWS Deep Learning Containers help significantly speed up the ML experimentation and deployment process.

Further reading

We are only scratching the surface of what we can do with Deep Learning AMIs. In addition to the convenience of having preinstalled frameworks, DLAMIs make it easy for ML engineers to utilize other optimization solutions such as AWS Inferentia, AWS Neuron, distributed training, and Elastic Fabric Adapter. For more information, feel free to check out the following resources:
