You're reading from AWS Certified Machine Learning Specialty: MLS-C01 Certification Guide
1st Edition, published by Packt in March 2021. ISBN-13: 9781800569003.

Authors (2): Somanath Nanda and Weslley Moura

Somanath Nanda has 10 years of experience in the IT industry, spanning product development, DevOps, and designing and architecting products from end to end. He has also worked at AWS as a big data engineer for about two years.

Weslley Moura has been developing data products for the past decade. In his recent roles, he has been influencing data strategy and leading data teams in the urban logistics and blockchain industries.
Chapter 7: Applying Machine Learning Algorithms

In the previous chapter, we studied AWS services for data processing, including Glue, Athena, and Kinesis. It is now time to move on to the modeling phase and study machine learning algorithms. I am sure that, over the earlier chapters, you have realized that building machine learning models requires a lot of knowledge about AWS services, data engineering, exploratory data analysis, data architecture, and much more. This time, we will go deeper into the algorithms we have been talking about so far, as well as many others.

Having a good sense of the different types of algorithms and machine learning approaches will put you in a very good position to make decisions during your projects. Of course, this type of knowledge is also crucial for the AWS Machine Learning Specialty exam.

Bear in mind that there are thousands of algorithms out there, and you can even propose your own algorithm for a particular problem. Furthermore, we will...

Introducing this chapter

In this chapter, we will talk about several algorithms, modeling concepts, and learning strategies. We think all of these topics will benefit you both in the exam and in your career as a data scientist.

We have structured this chapter so that it not only covers the topics necessary for the exam but also gives you a good sense of the most important learning strategies out there. For example, the exam will check your knowledge of the basic concepts of K-means; however, we will cover it at a much deeper level, since this is an important topic for your career as a data scientist.

We will follow this approach, looking deeper into the logic of the algorithm, for the types of models that we feel every data scientist should master. So, keep that in mind: sometimes, we might go deeper than the exam requires, but that will be extremely valuable for you.

Many times, during this chapter, we will use the term built-in algorithms. We will use...

Storing the training data

First of all, you can use multiple AWS services to prepare data for machine learning, such as EMR, Redshift, Glue, and so on. After preprocessing the training data, you should store it in S3, in a format expected by the algorithm you are using. The following table shows the list of acceptable data formats per algorithm:

Figure 7.1 – Data formats that are acceptable per AWS algorithm

As we can see, many algorithms accept the text/csv format. Keep in mind that you should follow these rules if you want to use that format:

  • Your CSV file can't have a header record.
  • For supervised learning, the target variable must be in the first column.
  • While configuring the training pipeline, set the content_type of the input data channel to text/csv.
  • For unsupervised learning, specify the absence of labels by setting the content_type to 'text/csv;label_size=0', as shown in the sketch after this list.
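Here is a minimal sketch of how these content types might be declared with the SageMaker Python SDK; the bucket names and prefixes are hypothetical placeholders:

```python
# A minimal sketch, assuming the SageMaker Python SDK (sagemaker >= 2.x).
# Bucket names and prefixes are hypothetical placeholders.
from sagemaker.inputs import TrainingInput

# Supervised learning: no header row, target variable in the first column
train_input = TrainingInput(
    s3_data="s3://my-bucket/prepared/train/",  # hypothetical S3 location
    content_type="text/csv",
)

# Unsupervised learning (for example, K-means): declare that there is no label column
unsupervised_input = TrainingInput(
    s3_data="s3://my-bucket/prepared/train/",
    content_type="text/csv;label_size=0",
)

# These inputs are later passed to an estimator, for example:
# estimator.fit({"train": train_input})
```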

Although the text/csv format is fine for many use...

A word about ensemble models

Before we start diving into the algorithms, there is an important modeling concept that you should be aware of, known as ensemble learning. The term ensemble is used to describe methods that combine multiple algorithms to create a model.

For example, instead of creating just one model to predict fraudulent transactions, you could create multiple models that do the same thing and, using a voting system, select the predicted outcome. The following table illustrates this simple example:

Figure 7.2 – An example of a voting system on ensemble methods

The same approach works for regression problems, where, instead of voting, we could average the results of each model and use that as the final outcome.
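To make the voting idea concrete, here is a minimal sketch using scikit-learn (which is not part of the AWS built-in algorithms; it is used here purely for illustration):

```python
# A minimal sketch of a voting ensemble, using scikit-learn purely for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for a fraud dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Three different models "vote" on each prediction; the majority wins
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=5)),
        ("rf", RandomForestClassifier(n_estimators=100)),
    ],
    voting="hard",  # "soft" would average predicted probabilities instead
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```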

Voting and averaging are just two examples of ensemble approaches. Other powerful techniques include blending and stacking, where you can create multiple models and use the outcome of each model as features for a main model. Looking...

Supervised learning

AWS provides supervised learning algorithms for general purposes (regression and classification tasks) and for more specific purposes (forecasting and vectorization). The list of built-in algorithms that can be found in these sub-categories is as follows:

  • Linear learner algorithm
  • Factorization machines algorithm
  • XGBoost algorithm
  • K-Nearest Neighbors (KNN) algorithm
  • Object2Vec algorithm
  • DeepAR Forecasting algorithm

Let's start with regression models and the linear learner algorithm.

Working with regression models

Okay, I know that real problems usually aren't linear or simple. However, looking into linear regression models is a nice way to figure out what's going on inside regression models in general (yes, regression models can be linear or non-linear). This is mandatory knowledge for every data scientist and can help you solve real challenges as well. We'll take a closer look at this in the following subsections...
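As a tiny illustration of what a linear model actually does, here is a sketch that fits a straight line with ordinary least squares using NumPy; the synthetic data and coefficients are made up for the example:

```python
# A minimal sketch of fitting a simple linear regression (ordinary least squares)
# with NumPy; the data is synthetic and purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3 * x + 2 + rng.normal(0, 1, size=100)  # true relationship: y = 3x + 2 plus noise

# Solve least squares for the slope and intercept
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")  # close to 3 and 2
```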

Unsupervised learning

AWS provides several unsupervised learning algorithms for the following tasks:

  • Clustering:
    • K-means algorithm
  • Dimensionality reduction:
    • Principal Component Analysis (PCA)
  • Pattern recognition:
    • IP Insights
  • Anomaly detection:
    • Random Cut Forest (RCF) algorithm

Let's start by talking about clustering and how the most popular clustering algorithm works: K-means.

Clustering

Clustering algorithms are very popular in data science. Basically, they aim to identify groups in a given dataset. Technically, we call these groups clusters. Clustering algorithms belong to the field of unsupervised learning, which means that they don't need a label or response variable to be trained.

This is just fantastic because labeled data is often scarce. However, it comes with some limitations. The main one is that clustering algorithms provide clusters for you, but not the meaning of each cluster. Thus, someone, as a...
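To make this concrete, here is a minimal K-means sketch using scikit-learn (illustrative only; the exam focuses on the SageMaker built-in K-means algorithm). Notice that the algorithm returns cluster assignments and centroids, not their meaning:

```python
# A minimal K-means sketch using scikit-learn, purely for illustration.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic dataset with three natural groups and no labels
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# We get cluster assignments and centroids, but interpreting
# what each cluster represents is still up to us.
print(labels[:10])
print(kmeans.cluster_centers_)
```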

Textual analysis

Modern applications use Natural Language Processing (NLP) for several purposes, such as text translation, document classification, web search, named entity recognition (NER), and many others.

AWS offers a suite of algorithms for most NLP use cases. In the next few subsections, we will have a look at these built-in algorithms for textual analysis.

Blazing Text algorithm

Blazing Text performs two different types of tasks: text classification, which is a supervised learning approach that extends the fastText text classifier, and word2vec, which is an unsupervised learning algorithm.

Blazing Text's implementations of these two algorithms are optimized to run on large datasets. For example, you can train a model on billions of words in a few minutes.

This scalability aspect of Blazing Text is possible due to the following:

  • Its ability to use multi-core CPUs and a single GPU to accelerate text classification
  • Its ability to use multi...
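As a rough sketch of how a Blazing Text training job might be configured through the SageMaker Python SDK, the example below switches between the supervised (text classification) and word2vec modes via the mode hyperparameter; the role, S3 paths, and instance type are hypothetical placeholders:

```python
# A rough sketch, assuming the SageMaker Python SDK; the role, S3 paths,
# and hyperparameter values are hypothetical placeholders.
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical role

# Retrieve the Blazing Text container image for the current region
container = image_uris.retrieve("blazingtext", session.boto_region_name)

estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.c5.2xlarge",
    output_path="s3://my-bucket/blazingtext/output/",  # hypothetical
    sagemaker_session=session,
)

# mode="supervised" selects the fastText-style text classifier;
# mode="skipgram" or mode="cbow" trains word2vec embeddings instead
estimator.set_hyperparameters(mode="supervised", epochs=10)

estimator.fit({"train": "s3://my-bucket/blazingtext/train/"})  # hypothetical path
```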

Image processing

Image processing is a very popular topic in machine learning. The idea is pretty self-explanatory: creating models that can analyze images and make inferences from them. Here, inference means things such as detecting objects in an image, classifying images, and so on.

AWS offers a set of built-in algorithms we can use to train image processing models. In the next few sections, we will have a look at those algorithms.

Image classification algorithm

As the name suggests, the image classification algorithm is used to classify images using supervised learning. In other words, it needs a label for each image. It supports multi-label classification.

The way it operates is simple: during training, it receives an image and its associated labels. During inference, it receives an image and returns all the predicted labels. The image classification algorithm uses a CNN (ResNet) for training. It can either train the model from scratch or take advantage...
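A minimal sketch of configuring the built-in image classification algorithm through the SageMaker Python SDK might look like the following; the hyperparameter values, role, and S3 paths are hypothetical placeholders:

```python
# A minimal sketch, assuming the SageMaker Python SDK; values are hypothetical.
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical role

# Retrieve the built-in image classification container image
container = image_uris.retrieve("image-classification", session.boto_region_name)

estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    output_path="s3://my-bucket/image-classification/output/",  # hypothetical
    sagemaker_session=session,
)

# use_pretrained_model=1 starts from a pre-trained ResNet rather than training
# from scratch; the remaining values are illustrative only
estimator.set_hyperparameters(
    num_classes=2,
    num_training_samples=1000,
    use_pretrained_model=1,
    epochs=10,
)
```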

Summary

That was such a journey! Let's take a moment to highlight what we have just learned. We broke this chapter into four main sections: supervised learning, unsupervised learning, textual analysis, and image processing. Everything that we have learned fits into those subfields of machine learning.

The list of supervised learning algorithms that we have studied includes the following:

  • Linear learner algorithm
  • Factorization machines algorithm
  • XGBoost algorithm
  • K-Nearest Neighbors algorithm
  • Object2Vec algorithm
  • DeepAR forecasting algorithm

Remember that you can use linear learner, factorization machines, XGBoost, and KNN for multiple purposes, including solving regression and classification problems. Linear learner is probably the simplest algorithm of the four; factorization machines extend linear learner and are good for sparse datasets; XGBoost uses an ensemble method based on decision trees; and KNN is an index-based algorithm.

The...

Questions

  1. You are working as a lead data scientist for a retail company. Your team is building a regression model and using the linear learner built-in algorithm to predict the optimal price of a particular product. The model is clearly overfitting to the training data and you suspect that this is due to the excessive number of variables being used. Which of the following approaches would best suit a solution that addresses your suspicion?

    a) Implementing a cross-validation process to reduce overfitting during the training process.

    b) Applying L1 regularization and changing the wd hyperparameter of the linear learner algorithm.

    c) Applying L2 regularization and changing the wd hyperparameter of the linear learner algorithm.

    d) Applying L1 and L2 regularization.

    Answers

    c) This question points to the problem of overfitting due to an excessive number of features being used. L2 regularization, which is available in linear learner through the wd hyperparameter, will work as a feature...
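For reference, here is a minimal sketch of where these regularization hyperparameters appear on the SageMaker Python SDK's LinearLearner estimator; the values, role, and instance type are hypothetical placeholders:

```python
# A minimal sketch, assuming the SageMaker Python SDK's LinearLearner estimator.
# The role, instance type, and hyperparameter values are hypothetical placeholders.
from sagemaker import LinearLearner

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical role

linear = LinearLearner(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    predictor_type="regressor",  # regression model, as in the question
    wd=0.01,  # L2 regularization (weight decay)
    l1=0.0,   # L1 regularization; raising it pushes some weights to exactly zero
)
```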
