Chapter 8: Evaluating and Optimizing Models

It is now time to learn how to evaluate and optimize machine learning models. During the process of modeling, or even after model completion, you might want to understand how your model is performing. Each type of model has its own set of metrics that can be used to evaluate performance, and that is what we are going to study in this chapter.

Apart from model evaluation, as a data scientist, you might also need to improve your model's performance by tuning the hyperparameters of your algorithm. We will take a look at some nuances of this modeling task.

In this chapter, we will cover the following topics:

  • Introducing model evaluation
  • Evaluating classification models
  • Evaluating regression models
  • Model optimization

Alright, let's do it!

Introducing model evaluation

There are several different scenarios in which we might want to evaluate model performance. Some of them are as follows:

  • You are creating a model and testing different approaches and/or algorithms. Therefore, you need to compare these models to select the best one.
  • You have just completed your model and you need to document your work, which includes reporting the performance metrics that the model achieved during the modeling phase.
  • Your model is running in a production environment and you need to track its performance. If you encounter model drift, then you might want to retrain the model.

    Important note

    The term model drift is used to refer to the problem of model deterioration. When you are building a machine learning model, you must use data to train the algorithm. This set of data is known as training data, and it reflects the business rules at a particular point in time. If these business rules change over time, your...

Evaluating classification models

Classification problems are among the most traditional classes of problems that you might face, either during the exam or during your journey as a data scientist. A very important artifact that you might want to generate during classification model evaluation is known as a confusion matrix.

A confusion matrix compares your model predictions against the real values of each class under evaluation. Figure 8.1 shows what a confusion matrix looks like in a binary classification problem:

Figure 8.1 – A confusion matrix

We find the following components in a confusion matrix:

  • TP: This is the number of True Positive cases. Here, we have to count the number of cases that have been predicted as true and are, indeed, true. For example, in a fraud detection system, this would be the number of fraudulent transactions that were correctly predicted as fraud.
  • TN: This is the number of True Negative cases. Here, we have...
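
To make these components concrete, here is a minimal sketch (assuming scikit-learn is installed; the label vectors are hypothetical) that builds a confusion matrix for a binary fraud-detection example and extracts each component from it:

    from sklearn.metrics import confusion_matrix

    # Hypothetical ground-truth labels and model predictions (1 = fraud, 0 = legitimate)
    y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]

    # For binary labels [0, 1], scikit-learn lays the matrix out as
    # [[TN, FP],
    #  [FN, TP]]
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=3, TN=5, FP=1, FN=1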

Evaluating regression models

Regression models are quite different from classification models since the outcome of the model is a continuous number. Therefore, the metrics around regression models aim to monitor the difference between real and predicted values.

The simplest way to check the difference between a predicted value (yhat) and its actual value (y) is by performing a simple subtraction operation, where the error of a single prediction is equal to the absolute value of yhat – y.

Since we usually have to evaluate the error of each prediction, i, we take the mean value of these absolute errors. The resulting metric is known as the Mean Absolute Error (MAE). The following formula shows how it can be formally defined, where n is the number of predictions, y_i is the actual value of prediction i, and ŷ_i is the predicted value:
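
    \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|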

Sometimes, you might want to penalize bigger errors over smaller errors. To achieve this, you can use another metric, which is known as the Mean Squared Error (MSE). MSE will square each error and return the mean value.
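
Using the same notation, MSE can be formally defined as follows:

    \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2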

By squaring errors,...
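
As a quick illustration of how squaring magnifies large errors, here is a minimal sketch computing both metrics with NumPy (the value arrays are hypothetical):

    import numpy as np

    # Hypothetical actual values and model predictions
    y = np.array([10.0, 12.0, 15.0, 20.0])
    y_hat = np.array([11.0, 12.0, 14.0, 28.0])  # one large error (28 vs 20)

    errors = y_hat - y
    mae = np.mean(np.abs(errors))  # every error contributes proportionally
    mse = np.mean(errors ** 2)     # the large error dominates after squaring

    print(f"MAE = {mae:.2f}")  # (1 + 0 + 1 + 8) / 4 = 2.50
    print(f"MSE = {mse:.2f}")  # (1 + 0 + 1 + 64) / 4 = 16.50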

Model optimization

As you know, understanding evaluation metrics is very important in order to measure your model's performance and document your work. In the same way, when we want to optimize our current models, evaluation metrics also play a very important role in defining the baseline performance that we want to challenge.

The process of model optimization consists of finding the best configuration (also known as hyperparameters) of the machine learning algorithm for a particular data distribution. We don't want to find hyperparameters that overfit the training data in the same way that we don't want to find hyperparameters that underfit the training data.

You learned about overfitting and underfitting in Chapter 1, Machine Learning Fundamentals. In the same chapter, you also learned how to avoid these two types of modeling issues.

In this section, we will learn about some techniques that you can use to find the best configuration for a particular algorithm...
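
For instance, one widely used technique is grid search, which exhaustively evaluates every combination of candidate hyperparameter values using cross-validation. Here is a minimal sketch, assuming scikit-learn and a random forest classifier (the dataset and parameter grid are hypothetical placeholders):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    # Hypothetical dataset standing in for your training data
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

    # Hypothetical search space; in practice, base it on the algorithm's documentation
    param_grid = {
        "n_estimators": [100, 300],
        "max_depth": [5, 10, None],
    }

    # Cross-validation helps avoid picking hyperparameters that merely
    # overfit a single train/validation split
    search = GridSearchCV(
        RandomForestClassifier(random_state=42),
        param_grid,
        scoring="f1",
        cv=5,
    )
    search.fit(X, y)

    print(search.best_params_)
    print(f"Best cross-validated F1: {search.best_score_:.3f}")

Note that the evaluation metric passed to the search (here, the F1 score) is exactly the baseline metric discussed above, which ties model optimization back to model evaluation.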

Summary

In this chapter, you learned about the main metrics for model evaluation. We first started with the metrics for classification problems and then we moved on to the metrics for regression problems.

In terms of classification metrics, you have been introduced to the well-known confusion matrix, which is probably the most important artifact for evaluating classification models.

Aside from knowing what true positive, true negative, false positive, and false negative are, we have learned how to combine these components to extract other metrics, such as accuracy, precision, recall, the F1 score, and AUC.
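
As a quick reference, these metrics combine the confusion matrix components as follows:

    \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

    \mathrm{Precision} = \frac{TP}{TP + FP}

    \mathrm{Recall} = \frac{TP}{TP + FN}

    \mathrm{F1} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}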

We went even deeper and learned about ROC curves, as well as precision-recall curves. We learned that we can use ROC curves to evaluate fairly balanced datasets and precision-recall curves for moderately to heavily imbalanced datasets.

By the way, when you are dealing with imbalanced datasets, remember that using accuracy might not be a good idea: on a dataset where 99% of the cases are negative, a model that always predicts the negative class reaches 99% accuracy without identifying a single positive case.

In terms...

Questions

  1. You are working as a data scientist for a pharmaceutical company. You are collaborating with other teammates to create a machine learning model to classify certain types of diseases in image exams. The company wants to prioritize being correct when a positive case is predicted, even at the cost of wrongly returning some false negatives. Which metric would you use to optimize the underlying model?

    a. Recall

    b. Precision

    c. R-squared

    d. RMSE

    Answer

    The correct answer is b, Precision. In this scenario, the company prefers a higher probability of being right on positive predictions, at the cost of wrongly classifying some positive cases as negative. Technically, they prefer to increase precision at the cost of reducing recall.

  2. You are working as a data scientist for a pharmaceutical company. You are collaborating with other teammates to create a machine learning model to classify certain types of diseases on image exams. The company wants to prioritize the capture of positive cases, even if they have to wrongly return...