You're reading from AWS Certified Machine Learning - Specialty (MLS-C01) Certification Guide - Second Edition

Product type: Book

Published in: Feb 2024

Publisher: Packt

ISBN-13: 9781835082201

Edition: 2nd Edition

Concepts

Machine Learning

Authors (2):

Somanath Nanda

Weslley Moura

Applying Machine Learning Algorithms

In the previous chapter, you learned about understanding data and visualization. It is now time to move on to the modeling phase and study machine learning algorithms! In the earlier chapters, you learned that building machine learning models requires a lot of knowledge about AWS services, data engineering, data exploration, data architecture, and much more. This time, you will delve deeper into the algorithms that have been introduced and more.

Having a good sense of the different types of algorithms and machine learning approaches will put you in a very good position to make decisions during your projects. Of course, this type of knowledge is also crucial to the AWS Certified Machine Learning Specialty exam.

Bear in mind that there are thousands of algorithms out there. You can even propose your own algorithm for a particular problem. In this chapter, you will learn about the most relevant ones and, hopefully, the ones that you will probably...

Introducing this chapter

During this chapter, you will read about several algorithms, modeling concepts, and learning strategies. All these topics are beneficial for you to know for the exam and throughout your career as a data scientist.

This chapter has been structured in such a way that it not only covers the necessary topics of the exam but also gives you a good sense of the most important learning strategies out there. For example, the exam will check your knowledge regarding the basic concepts of K-Means. However, this chapter will cover it on a much deeper level, since this is an important topic for your career as a data scientist.

The chapter will follow this approach of looking deeper into the algorithms’ logic for some types of models that every data scientist should master. Furthermore, keep this in mind: sometimes you may go deeper than what is expected of you in the exam, but that will be extremely important for you in your career.

Many times during this...

Storing the training data

First of all, you can use multiple AWS services to prepare data for machine learning, such as Elastic MapReduce (EMR), Redshift, Glue, and so on. After preprocessing the training data, you should store it in S3, in a format expected by the algorithm you are using. Table 6.1 shows the list of acceptable data formats per algorithm.
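As a rough illustration of the "preprocess, then store in S3" step, the sketch below serializes a tiny dataset to CSV and shows how it could be uploaded. It rests on several assumptions that are not part of the text above: that the chosen algorithm accepts CSV input with the label in the first column and no header row (a convention AWS documents for built-in algorithms that take `text/csv`), and that the bucket and key names are hypothetical placeholders. The upload helper is defined but not called, since it needs boto3 and AWS credentials.

```python
import csv
import io


def to_training_csv(features, labels):
    """Serialize rows as CSV with the label first and no header row (assumed layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for label, row in zip(labels, features):
        writer.writerow([label, *row])
    return buffer.getvalue()


def upload_to_s3(body, bucket, key):
    """Upload the CSV text to S3. Bucket and key are hypothetical placeholders."""
    import boto3  # imported here so the rest of the sketch runs without AWS access

    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))


# Two hypothetical observations with two features each.
csv_body = to_training_csv([[5.1, 3.5], [6.2, 2.9]], [0, 1])
print(csv_body)

# Example call (not executed here):
# upload_to_s3(csv_body, "my-example-bucket", "training/train.csv")
```

Check the data format table for the algorithm you actually use before settling on a layout; several algorithms expect RecordIO-protobuf or JSON Lines instead.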

...

A word about ensemble models

Before you start diving into the algorithms, there is an important modeling concept that you should be aware of – ensemble. The term ensemble is used to describe methods that use multiple algorithms to create a model.

A regular algorithm that does not implement ensemble methods will rely on a single model to train and predict the target variable. That is what happens when you create a decision tree or regression model. On the other hand, algorithms that do implement ensemble methods will rely on multiple models to predict the target variable. In that case, since each of these models might come up with a different prediction for the target variable, ensemble algorithms implement either a voting (for classification models) or averaging (for regression models) system to output the final results. Table 6.2 illustrates a very simple voting system for an ensemble algorithm composed of three models.
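The voting and averaging ideas described above can be sketched in a few lines. This is a minimal illustration, not the internals of any particular AWS algorithm; the model outputs and class names are made up for the example.

```python
from collections import Counter
from statistics import mean


def majority_vote(predictions):
    """Return the class label predicted by the most models."""
    # most_common(1) returns [(label, count)]; ties go to the label seen first.
    return Counter(predictions).most_common(1)[0][0]


def average_prediction(predictions):
    """Return the mean of the numeric predictions from each model."""
    return mean(predictions)


# Hypothetical outputs of three models for a single observation.
class_votes = ["fraud", "not fraud", "fraud"]
regression_outputs = [10.0, 12.0, 14.0]

print(majority_vote(class_votes))              # fraud
print(average_prediction(regression_outputs))  # 12.0
```

Two of the three classifiers agree on "fraud", so that label wins the vote; for the regression case, the ensemble simply reports the mean of the three estimates.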

Data format	Algorithm
`application/x-image`	Object detection algorithm, semantic segmentation
`application/x-recordio`	Object detection algorithm
`application/x-recordio-protobuf`	Factorization machines, K-Means, KNN, latent Dirichlet allocation, linear learner, NTM, PCA, RCF, sequence-to-sequence
`application/jsonlines`	BlazingText, DeepAR

...

Supervised learning

AWS provides supervised learning algorithms for general purposes (regression and classification tasks) and more specific purposes (forecasting and vectorization). The list of built-in algorithms that can be found in these sub-categories is as follows:

Linear learner algorithm
Factorization machines algorithm
XGBoost algorithm
KNN algorithm
Object2Vec algorithm
DeepAR forecasting algorithm

You will start by learning about regression models and the linear learner algorithm.

Working with regression models

Looking at linear regression models is a nice way to understand what is going on inside regression models in general (linear and non-linear regression models). This is mandatory knowledge for every data scientist and can help you solve real challenges as well. You will now take a closer look at this in the following subsections.

Introducing regression algorithms

Linear regression models aim to predict a numeric value...
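To make the idea of predicting a numeric value concrete, here is a minimal sketch of simple linear regression with one feature, fitted by ordinary least squares. It is a textbook closed-form fit on made-up data, not the training procedure of the SageMaker linear learner; the function and variable names are illustrative.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Covariance-like and variance-like sums (the shared 1/n factor cancels out).
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical data that lies exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_simple_linear_regression(xs, ys)
prediction = slope * 5.0 + intercept
print(slope, intercept, prediction)  # 2.0 1.0 11.0
```

Once the slope and intercept are estimated, predicting a new numeric value is just a matter of plugging the new input into the fitted line.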

Unsupervised learning

AWS provides several unsupervised learning algorithms for the following tasks:

Clustering: K-Means algorithm
Dimension reduction: Principal Component Analysis (PCA)
Pattern recognition: IP Insights
Anomaly detection: The Random Cut Forest (RCF) algorithm

Let us start by talking about clustering and how the most popular clustering algorithm works: K-Means.

Clustering

Clustering algorithms are very popular in data science. They aim to identify groups of similar observations, known as clusters, in a given dataset. Clustering algorithms belong to the field of unsupervised learning, which means that they do not need a label or response variable to be trained.

This is just fantastic since labeled data is very scarce! However, it comes with some limitations. The main one is that clustering algorithms provide clusters for you, but not the meaning of each cluster. Thus, someone, as a subject matter expert, has to analyze the properties...
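A minimal sketch of the textbook K-Means iteration shows how clusters emerge without any labels: assign each point to its nearest centroid, recompute each centroid as the mean of its points, and repeat until the assignments stop changing. This is an illustration on made-up data, not the SageMaker K-Means implementation, and seeding the centroids with the first `k` points is a simplification chosen to keep the example deterministic.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def kmeans(points, k, max_iterations=100):
    """Cluster points into k groups; returns (centroids, labels)."""
    centroids = list(points[:k])  # naive initialization: the first k points
    labels = assign(points, centroids)
    for _ in range(max_iterations):
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # New centroid is the per-dimension mean of the cluster members.
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        new_labels = assign(points, new_centroids)
        centroids = new_centroids
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
    return centroids, labels


# Hypothetical 2-D data with two visually obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Note that the output is only a cluster index per point. As the text says, deciding what cluster 0 and cluster 1 actually mean is left to whoever analyzes the result.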

Textual analysis

Modern applications use Natural Language Processing (NLP) for several purposes, such as text translation, document classification, web search, Named Entity Recognition (NER), and many others.

AWS offers a suite of algorithms for most NLP use cases. In the next few subsections, you will have a look at these built-in algorithms for textual analysis.

BlazingText algorithm

BlazingText performs two different types of tasks: text classification, which is a supervised learning approach that extends the fastText text classifier, and Word2Vec, which is an unsupervised learning algorithm.

BlazingText’s implementations of these two algorithms are optimized to run on large datasets. For example, you can train a model on top of billions of words in a few minutes.

This scalability aspect of BlazingText is possible due to the following:

Its ability to use multi-core CPUs and a single GPU to accelerate text classification
Its ability to use multi-core...

Image processing

Image processing is a very popular topic in machine learning. The idea is pretty self-explanatory: creating models that can analyze images and make inferences on top of them. Inference here means tasks such as detecting objects in an image, classifying images, and so on.

AWS offers a set of built-in algorithms you can use to train image processing models. In the next few sections, you will have a look at those algorithms.

Image classification algorithm

As the name suggests, the image classification algorithm is used to classify images using supervised learning. In other words, it needs a label for each image. It supports multi-label classification.

The way it operates is simple: during training, it receives an image and its associated labels. During inference, it receives an image and returns all the predicted labels. The image classification algorithm uses a CNN (ResNet) for training. It can either train the model from scratch or take advantage...

Summary

That was such a journey! Take a moment to recap what you have just learned. This chapter had four main topics: supervised learning, unsupervised learning, textual analysis, and image processing. Everything that you have learned fits into those subfields of machine learning.

The list of supervised learning algorithms that you have studied includes the following:

Linear learner
Factorization machines
XGBoost
KNN
Object2Vec
DeepAR forecasting

Remember that you can use linear learner, factorization machines, XGBoost, and KNN for multiple purposes, including solving regression and classification problems. Linear learner is probably the simplest of these four algorithms; factorization machines extends linear learner and is good for sparse datasets; XGBoost uses an ensemble method based on decision trees; and KNN is an index-based algorithm.

The other two algorithms, Object2Vec and DeepAR, are used for specific purposes. Object2Vec is used...

Exam Readiness Drill – Chapter Review Questions

Apart from a solid understanding of key concepts, being able to think quickly under time pressure is a skill that will help you ace your certification exam. That is why working on these skills early on in your learning journey is key.

Chapter review questions are designed to improve your test-taking skills progressively with each chapter you learn and review your understanding of key concepts in the chapter at the same time. You’ll find these at the end of each chapter.

How To Access These Resources

To learn how to access these resources, head over to Chapter 11, Accessing the Online Practice Resources.

To open the Chapter Review Questions for this chapter, perform the following steps:

Click the link – https://packt.link/MLSC01E2_CH06.
Alternatively, you can scan the following QR code (Figure 6.19):

Figure 6.19 – QR code that opens Chapter...

Working On Timing

Target: Your aim is to keep the score the same while trying to answer these questions as quickly as possible. Here’s an example of what your next attempts should look like:

Attempt	Score	Time Taken
Attempt 5	77%	21 mins 30 seconds
Attempt 6	78%	18 mins 34 seconds
Attempt 7	76%	14 mins 44 seconds

Table 6.11 – Sample timing practice drills on the online platform

Note

The time limits shown in the above table are just examples. Set your own time limits with each attempt based on the time limit of the quiz on the website.

With each new attempt, your score should stay above 75% while your “time taken...

Authors (2)

Somanath Nanda

Somanath has 10 years of experience in the IT industry, including product development, DevOps, and designing and architecting products end to end. He has also worked at AWS as a big data engineer for about two years.

Weslley Moura

Weslley Moura has been developing data products for the past decade. In his recent roles, he has influenced data strategy and led data teams in the urban logistics and blockchain industries.
