You're reading from  Machine Learning Engineering on AWS

Product type: Book
Published in: Oct 2022
Publisher: Packt
ISBN-13: 9781803247595
Edition: 1st Edition
Author: Joshua Arvin Lat

Joshua Arvin Lat is the Chief Technology Officer (CTO) of NuWorks Interactive Labs, Inc. He previously served as the CTO for three Australian-owned companies and as director of software development and engineering for multiple e-commerce start-ups in the past. Years ago, he and his team won first place in a global cybersecurity competition with their published research paper. He is also an AWS Machine Learning Hero and has shared his knowledge at several international conferences, discussing practical strategies on machine learning, engineering, security, and management.

Machine Learning Pipelines with Kubeflow on Amazon EKS

In Chapter 9, Security, Governance, and Compliance Strategies, we discussed a lot of concepts and solutions that focus on the other challenges and issues we need to worry about when dealing with machine learning (ML) requirements. You have probably realized by now that ML practitioners have a lot of responsibilities and work to do outside model training and deployment! Once a model gets deployed into production, we would have to monitor the model and ensure that we are able to detect and manage a variety of issues. In addition to this, ML engineers might need to build ML pipelines to automate the different steps in the ML life cycle. To ensure that we reliably deploy ML models in production, as well as streamline the ML life cycle, it is best that we learn and apply the different principles of machine learning operations (MLOps). With MLOps, we will make use of the tried-and-tested tools and practices from software engineering...

Technical requirements

Before we start, it is important that we have the following ready:

  • A web browser (preferably Chrome or Firefox)
  • Access to the Cloud9 environment that was prepared in the Creating your Cloud9 environment and Increasing the Cloud9 storage sections of Chapter 1, Introduction to ML Engineering on AWS

The Jupyter notebooks, source code, and other files used for each chapter are available at this repository: https://github.com/PacktPublishing/Machine-Learning-Engineering-on-AWS.

Important Note

It is recommended that you use an IAM user with limited permissions instead of the root account when running the examples in this book. If you are just starting out with using AWS, you can proceed with using the root account in the meantime.

Diving deeper into Kubeflow, Kubernetes, and EKS

In Chapter 3, Deep Learning Containers, we learned that containers help guarantee the consistency of the environments where applications run. In the hands-on solutions of that chapter, we worked with two containers: one for training our deep learning model and another for deploying it. In larger applications, we will most likely encounter multiple containers running a variety of applications, databases, and automated scripts. Managing these containers is not easy, and writing custom scripts to manage the uptime and scaling of running containers is an overhead we wish to avoid. It is therefore recommended that you use a tool that lets you focus on what you need to accomplish. One of the available tools that can help us deploy, scale, and manage containerized applications is Kubernetes. This is an open source container orchestration system that provides a framework for running resilient...

Preparing the essential prerequisites

In this section, we will work on the following:

  • Preparing the IAM role for the EC2 instance of the Cloud9 environment
  • Attaching the IAM role to the EC2 instance of the Cloud9 environment
  • Updating the Cloud9 environment with the essential prerequisites

Let’s work on and prepare the essential prerequisites one by one.

Preparing the IAM role for the EC2 instance of the Cloud9 environment

To securely create and manage Amazon EKS and AWS CloudFormation resources from inside the EC2 instance of the Cloud9 environment, we need to attach an IAM role to the EC2 instance. In this section, we will prepare this IAM role and configure it with the permissions required to create and manage the other resources in this chapter.

Note

We will discuss Amazon EKS and AWS CloudFormation in more detail in the Setting up Kubeflow on Amazon EKS section of this chapter.

In the next set of steps, we will...

Setting up Kubeflow on Amazon EKS

With all of the prerequisites ready, we can now proceed with creating our EKS cluster and then installing Kubeflow on top of it. During the installation and setup process, we will use the following tools:

  • eksctl – The CLI tool for creating and managing Amazon EKS clusters
  • kubectl – The CLI tool for creating, configuring, and deleting Kubernetes resources
  • AWS CLI – The CLI tool for creating, configuring, and deleting AWS resources
  • kustomize – The CLI tool for managing the configuration of Kubernetes objects

The hands-on portion of this section involves following a high-level set of steps:

  1. Preparing the eks.yaml file containing the EKS configuration (such as the number of nodes, desired capacity, and instance type)
  2. Running the eksctl create cluster command using the eks.yaml file to create the Amazon EKS cluster
  3. Using kustomize and kubectl to install Kubeflow inside our cluster
...
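
To make the first step more concrete, here is a minimal Python sketch that assembles the kind of settings an eksctl configuration file holds and prints them as JSON. It is not the eks.yaml file used in this chapter. The cluster name kubeflow-demo, the node group name, and the build_cluster_config helper are hypothetical, and the use of a managed node group with equal minimum and maximum sizes is an assumption. The instance type, node count, and region are taken from the Cleaning up section of this chapter.

```python
import json


def build_cluster_config(name, region, instance_type, desired_capacity):
    """Return an eksctl-style ClusterConfig as a plain dictionary (illustrative only)."""
    return {
        "apiVersion": "eksctl.io/v1alpha5",
        "kind": "ClusterConfig",
        "metadata": {"name": name, "region": region},
        "managedNodeGroups": [
            {
                "name": f"{name}-nodes",  # hypothetical node group name
                "instanceType": instance_type,
                "desiredCapacity": desired_capacity,
                # Assumption: a fixed-size group, so min and max match the desired count
                "minSize": desired_capacity,
                "maxSize": desired_capacity,
            }
        ],
    }


# "kubeflow-demo" is a made-up cluster name for this sketch
config = build_cluster_config("kubeflow-demo", "us-west-2", "m5.xlarge", 5)
print(json.dumps(config, indent=2))
```

The printed structure mirrors the fields mentioned above (number of nodes, desired capacity, and instance type). Writing the same keys in YAML form would give a file of the kind passed to eksctl create cluster with the -f flag.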

Running our first Kubeflow pipeline

In this section, we will run a custom pipeline that will download a sample tabular dataset and use it as training data to build our linear regression model. The steps and instructions to be executed by the pipeline have been defined inside a YAML file. Once this YAML file has been uploaded, we would then be able to run a Kubeflow pipeline that will run the following steps:

  1. Download dataset: Here, we will be downloading and working with a dataset that only has 20 records (along with the row containing the header). In addition to this, we will start with a clean version without any missing or invalid values:

Figure 10.16 – A sample tabular dataset

In Figure 10.16, we can see that our dataset has three columns:

  • last_name – This is the last name of the manager.
  • management_experience_months – This is the total number of months a manager has been managing team members.
  • monthly_salary...
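
To illustrate what building a linear regression model on such a dataset involves, the following minimal sketch fits a straight line with ordinary least squares in plain Python. The four records are hypothetical and are not the chapter's dataset; they only reuse the column names described above. The fit_line helper is also an assumption made for illustration and is not the code that the pipeline runs.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical records that follow monthly_salary = 1000 + 50 * months exactly
records = [
    {"last_name": "Alpha", "management_experience_months": 10, "monthly_salary": 1500},
    {"last_name": "Bravo", "management_experience_months": 20, "monthly_salary": 2000},
    {"last_name": "Charlie", "management_experience_months": 30, "monthly_salary": 2500},
    {"last_name": "Delta", "management_experience_months": 40, "monthly_salary": 3000},
]

months = [r["management_experience_months"] for r in records]
salaries = [r["monthly_salary"] for r in records]

slope, intercept = fit_line(months, salaries)
predicted = slope * 25 + intercept  # estimate for 25 months of experience
print(slope, intercept, predicted)
```

The last_name column is ignored here because it carries no numeric signal. Only the experience column is used as the feature and the salary column as the target.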

Using the Kubeflow Pipelines SDK to build ML workflows

In this section, we will build ML workflows using the Kubeflow Pipelines SDK. The Kubeflow Pipelines SDK contains what we need to build the pipeline components containing the custom code we want to run. Using the Kubeflow Pipelines SDK, we can define the Python functions that would map to the pipeline components of a pipeline.

Here are some guidelines that we need to follow when building Python function-based components using the Kubeflow Pipelines SDK:

  • The defined Python functions should be standalone and should not use any code or variables declared outside of the function definition. This means that import statements (for example, import pandas) should be placed inside the function, too. Here’s a quick example of how imports should be written:
    def process_data(...):
        import pandas as pd    
        df_all_data = pd.read_csv(df_all_data_path...
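
As a complete, runnable illustration of the standalone-function guideline, here is a minimal sketch. It is not the chapter's process_data implementation: the parameter names, the test_rows argument, and the split logic are assumptions, and the standard library's csv module is used in place of pandas so that the sketch runs without extra dependencies. Wrapping the function as a pipeline component with the Kubeflow Pipelines SDK is left out.

```python
import os        # used only by the demo below, not by the function
import tempfile  # used only by the demo below, not by the function


def process_data(all_data_path: str, train_path: str, test_path: str, test_rows: int = 1):
    """Split a CSV file into train and test files; returns (train_count, test_count)."""
    # The import sits inside the function body so the function is standalone.
    import csv

    with open(all_data_path, newline="") as source:
        reader = csv.DictReader(source)
        fieldnames = reader.fieldnames
        rows = list(reader)

    # Hold out the last `test_rows` records (never a negative split point).
    split_at = max(len(rows) - test_rows, 0)
    subsets = {train_path: rows[:split_at], test_path: rows[split_at:]}
    for path, subset in subsets.items():
        with open(path, "w", newline="") as target:
            writer = csv.DictWriter(target, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(subset)

    return len(subsets[train_path]), len(subsets[test_path])


# Demo with hypothetical rows written to a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    all_path = os.path.join(workdir, "all_data.csv")
    with open(all_path, "w", newline="") as handle:
        handle.write(
            "management_experience_months,monthly_salary\n"
            "10,1500\n20,2000\n30,2500\n40,3000\n"
        )
    train_count, test_count = process_data(
        all_path,
        os.path.join(workdir, "train.csv"),
        os.path.join(workdir, "test.csv"),
    )

print(train_count, test_count)
```

Everything the function needs arrives through its parameters or is imported in its body, so it keeps working when its source is copied out and executed in isolation.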

Cleaning up

Now that we have completed the hands-on solutions of this chapter, it is time to clean up and turn off the resources we will no longer use. At this point, we have an EKS cluster running with 5 x m5.xlarge instances. We need to terminate these resources to manage the cost.

Note

If we do not turn these off (for a month), how much would it cost? At a minimum (per month), it would cost around USD 700.80 for the running EC2 instances (5 instances x 0.192 USD x 730 hours in a month) plus USD 73 for the EKS cluster (1 Cluster x 0.10 USD per hour x 730 hours per month) assuming that we are running the EKS cluster in the Oregon region (us-west-2). Note that there will be other additional costs associated with the EBS volumes attached to these instances along with the other resources used in this chapter.
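
The arithmetic in this note can be reproduced with a short sketch. The hourly rates and the 730-hour month are the figures quoted above. The monthly_cost helper is a hypothetical name, and the total deliberately leaves out the EBS volumes and other resources mentioned in the note.

```python
HOURS_PER_MONTH = 730  # figure used in the note above


def monthly_cost(instance_count, instance_hourly_usd, cluster_hourly_usd):
    """Return (EC2 cost, EKS cost, combined cost) in USD for one month."""
    ec2 = instance_count * instance_hourly_usd * HOURS_PER_MONTH
    eks = cluster_hourly_usd * HOURS_PER_MONTH
    return round(ec2, 2), round(eks, 2), round(ec2 + eks, 2)


# 5 x m5.xlarge at 0.192 USD per hour, plus one EKS cluster at 0.10 USD per hour
ec2_cost, eks_cost, total_cost = monthly_cost(5, 0.192, 0.10)
print(f"EC2: {ec2_cost:.2f} USD, EKS: {eks_cost:.2f} USD, total: {total_cost:.2f} USD")
```

Changing the instance count or hourly rate shows how quickly an idle cluster adds up, which is why the cleanup steps matter.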

In the next set of steps, we will uninstall and delete the resources in the Cloud9 environment’s Terminal:

  1. Let’s navigate...

Summary

In this chapter, we set up and configured our containerized ML environment using Kubeflow, Kubernetes, and Amazon EKS. After setting up the environment, we then prepared and ran a custom ML pipeline using the Kubeflow Pipelines SDK. After completing all the hands-on work needed, we proceeded with cleaning up the resources we created. Before ending the chapter, we discussed relevant best practices and strategies to secure, scale, and manage ML pipelines using the technology stack we used in the hands-on portion of this chapter.

In the next chapter, we will build and set up an ML pipeline using SageMaker Pipelines, Amazon SageMaker’s purpose-built solution for automating ML workflows using relevant MLOps practices.

Further reading

For more information on the topics covered in this chapter, feel free to check out the following resources:
