Exploring Model Evaluation Methods

A trained deep learning model without any form of validation cannot be deployed to production. Production, in the context of the machine learning software domain, refers to the deployment and operation of a machine learning model in a live environment for actual consumption of its predictions. More broadly, model evaluation serves as a critical component in any deep learning project. Typically, a deep learning project will result in many models being built, and a final model will be chosen to serve in a production environment. A good model evaluation process for any project leads to the following:

  • A better-performing final model through model comparisons and metrics
  • Fewer production prediction mishaps by understanding common model pitfalls
  • More closely aligned practitioner and final model behaviors through model insights
  • A higher probability of project success through success metric evaluation
  • A final model that is less biased...

Technical requirements

This chapter includes a practical implementation in the Python programming language. To complete it, you only need to install the matplotlib library.

The code files are available on GitHub at https://github.com/PacktPublishing/The-Deep-Learning-Architect-Handbook/tree/main/CHAPTER_10.

Exploring the different model evaluation methods

Most practitioners are familiar with accuracy-related metrics, which form the most basic evaluation method. Typically, for supervised problems, a practitioner will treat an accuracy-related metric as the golden source of truth. In the context of model evaluation, the term “accuracy metrics” is often used to refer collectively to various performance metrics such as accuracy, F1 score, recall, precision, and mean squared error. When coupled with a suitable cross-validation partitioning strategy, using metrics as a standalone evaluation strategy can go a long way in most projects. In deep learning, accuracy-related metrics are typically used to monitor the progress of the model at each epoch. The monitoring process can subsequently be extended to perform early stopping, halting training once the model stops improving, and to determine when to reduce the learning rate. Additionally, the best model weights can be...
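
To make the monitoring workflow concrete, here is a minimal sketch of metric-driven training in PyTorch, covering early stopping, learning-rate reduction on a plateau, and keeping the best model weights. The `train_step` and `evaluate` callables and the hyperparameter values are hypothetical placeholders, not code from the book.

```python
# A minimal sketch of metric monitoring with early stopping, LR reduction,
# and best-weight checkpointing. train_step/evaluate are hypothetical helpers.
import copy
import torch

def train_with_monitoring(model, optimizer, train_step, evaluate,
                          max_epochs=100, patience=5):
    # Reduce the learning rate when the monitored metric stops improving
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="max", factor=0.5, patience=2)
    best_score, best_weights, epochs_without_improvement = float("-inf"), None, 0
    for epoch in range(max_epochs):
        train_step()                # one pass over the training data
        score = evaluate()          # e.g., validation accuracy or F1
        scheduler.step(score)       # lower LR if the metric plateaus
        if score > best_score:
            best_score = score
            best_weights = copy.deepcopy(model.state_dict())  # keep best weights
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:        # early stopping
                break
    model.load_state_dict(best_weights)   # restore the best checkpoint
    return best_score
```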

Engineering the base model evaluation metric

Engineering a metric for your use case is an often overlooked skill. This is most likely because most projects work on publicly available datasets, which almost always come with a proposed metric; this includes Kaggle competitions and the many public datasets used for benchmarking. In real-world projects, however, a metric is rarely handed to you. Let's explore this topic further and build this skillset.

The model evaluation metric is the first essential evaluation method in supervised projects (it does not apply to unsupervised-based projects). A few baseline metrics serve as the de facto choices depending on the problem and target type, and more customized versions of these baseline metrics cater to special objectives. For example, generative-based tasks can be evaluated through a special human-based opinion score called the mean...
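
As a quick illustration, not taken from the book, the snippet below computes a few common baseline metrics for the two usual target types: accuracy, precision, recall, and F1 for binary classification, and RMSE for regression, using NumPy only.

```python
# Illustrative baseline metrics for binary classification and regression.
import numpy as np

def classification_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = np.mean(y_pred == y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

print(classification_metrics([1, 0, 1, 1], [1, 0, 0, 1]))
print(rmse([2.0, 3.5], [2.5, 3.0]))
```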

Exploring custom metrics and their applications

Base metrics are generally sufficient to meet the requirements of most use cases. However, custom metrics build upon base metrics and incorporate additional goals that are specific to a given scenario. It’s helpful to think of base metrics as a bachelor’s degree and custom metrics as a master’s or PhD degree. It’s perfectly fine to use only base metrics if they meet your needs and you don’t have any additional requirements.

Custom metric requirements often arise naturally early in a project and are highly dependent on the specific use case. Most real use cases don't expose their chosen metrics to the public, even when the model's predictions are meant to be consumed publicly, as with OpenAI's ChatGPT. In machine learning competitions, however, companies with real use cases and accompanying data publish their chosen metric publicly to find the best model that can be built. In such a setting for...
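
To ground the idea of a custom metric building on a base metric, here is a hypothetical example, not from the book: a cost-weighted error that reuses confusion-matrix counts but penalizes false negatives more heavily than false positives, as a fraud- or defect-detection scenario might demand. The cost values are made-up assumptions.

```python
# A hypothetical cost-weighted custom metric built on confusion-matrix counts.
import numpy as np

def cost_weighted_error(y_true, y_pred, fn_cost=5.0, fp_cost=1.0):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    fn = np.sum((y_pred == 0) & (y_true == 1))   # missed positives
    fp = np.sum((y_pred == 1) & (y_true == 0))   # false alarms
    # Normalize so scores are comparable across datasets of different sizes
    return (fn_cost * fn + fp_cost * fp) / len(y_true)

print(cost_weighted_error([1, 1, 0, 0], [0, 1, 1, 0]))  # one FN, one FP
```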

Exploring statistical tests for comparing model metrics

In machine learning, metric-based model evaluation often involves using averages of aggregated metrics from different folds or partitions, such as holdout and validation sets, to compare the performance of various models. However, relying solely on these average performance metrics may not provide a comprehensive assessment of a model’s performance and generalizability. A more robust approach to model evaluation is the incorporation of statistical hypothesis tests, which assess whether observed differences in performance are statistically significant or due to random chance.

Statistical hypothesis tests are procedures used to determine whether observed data provides sufficient evidence to reject a null hypothesis in favor of an alternative hypothesis, helping to quantify the likelihood that the observed differences are due to random chance or a genuine effect. In statistical tests, the null hypothesis (H0) is a default...
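
As a minimal sketch of this idea, the snippet below compares two models' per-fold F1 scores with a paired t-test. It assumes SciPy is available (it is not listed in this chapter's technical requirements), and the fold scores are made-up illustrative numbers.

```python
# Paired t-test on per-fold metric scores of two models (illustrative values).
from scipy import stats

model_a_fold_f1 = [0.81, 0.79, 0.83, 0.80, 0.82]
model_b_fold_f1 = [0.78, 0.77, 0.80, 0.79, 0.78]

# Null hypothesis (H0): the mean per-fold difference between the models is zero.
t_stat, p_value = stats.ttest_rel(model_a_fold_f1, model_b_fold_f1)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: the difference in fold scores is statistically significant.")
else:
    print("Fail to reject H0: the difference may be due to random chance.")
```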

Relating the evaluation metric to success

Defining success in a machine learning project is crucial and should be done at the early stages of the project, as introduced in the Defining success section of Chapter 1, Deep Learning Life Cycle. Success can be defined as achieving higher-level objectives, such as improving the efficiency of processes or increasing their accuracy in comparison to manual labor. In some rare cases, machine learning can enable processes that were previously impossible due to human limitations. The ultimate aim of achieving these objectives is to save costs or earn more revenue for an organization.

A metric performance score of 0.80 F1 or 0.00123 RMSE doesn't mean anything at face value and has to be translated into something tangible for the use case. For instance, one should answer questions such as: what estimated model score would allow the project to achieve the targeted cost savings or revenue improvements? Quantifying...
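
The following toy calculation, with made-up numbers that are not from the book, shows one way such a translation might look: converting a recall score into estimated annual cost savings for a hypothetical defect-detection use case.

```python
# Toy translation of a recall score into estimated cost savings (made-up numbers).
units_per_year = 100_000
defect_rate = 0.02                 # 2% of units are defective
cost_per_missed_defect = 50.0      # downstream cost of each missed defect
manual_recall = 0.70               # recall of the current manual process
model_recall = 0.85                # recall of the candidate model

defects = units_per_year * defect_rate
missed_manual = defects * (1 - manual_recall)
missed_model = defects * (1 - model_recall)
annual_savings = (missed_manual - missed_model) * cost_per_missed_defect
print(f"Estimated annual savings: ${annual_savings:,.0f}")
```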

Directly optimizing the metric

The loss and the metric used to train a deep learning model are two separate components. One trick you can use to improve a model's performance against the chosen metric is to optimize against it directly, instead of only monitoring it to choose the best-performing model weights and to trigger early stopping. In other words, use the metric as the loss directly!

By directly optimizing for the metric of interest, the model has a chance to improve in a way that is relevant to the end goal, rather than optimizing a proxy loss function that may not be directly related to its ultimate performance. In short, using the metric as the loss can yield a noticeably better-performing model.

However, not all metrics can be used as a loss, because not all metrics are differentiable. Remember that backpropagation requires all functions used to be differentiable so that gradients...
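
A common way around this, shown here as a minimal PyTorch sketch rather than the book's own implementation, is to replace hard 0/1 predictions with predicted probabilities so that a metric such as F1 becomes a differentiable "soft" surrogate that can be minimized directly.

```python
# A differentiable "soft F1" loss: probabilities stand in for hard predictions.
import torch

def soft_f1_loss(probs, targets, eps=1e-7):
    # probs: predicted probabilities in [0, 1]; targets: 0/1 labels
    tp = (probs * targets).sum()
    fp = (probs * (1 - targets)).sum()
    fn = ((1 - probs) * targets).sum()
    soft_f1 = 2 * tp / (2 * tp + fp + fn + eps)
    return 1 - soft_f1              # minimize 1 - F1 to maximize F1

probs = torch.tensor([0.9, 0.2, 0.7, 0.4], requires_grad=True)
targets = torch.tensor([1.0, 0.0, 1.0, 0.0])
loss = soft_f1_loss(probs, targets)
loss.backward()                     # gradients flow because every step is differentiable
print(loss.item(), probs.grad)
```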

Summary

In this chapter, we briefly explored different model evaluation methods and how they can be used to measure the performance of a deep learning model. Among the introduced methods, we started with metric engineering: we introduced common base model evaluation metrics, discussed their limitations, and introduced the concept of engineering a model evaluation metric tailored to the specific problem at hand. We also explored the idea of optimizing directly against the evaluation metric by using it as a loss function. While this approach can be beneficial, it is important to consider its potential pitfalls and limitations, as well as the specific use cases for which it may be appropriate.

The evaluation of deep learning models requires careful consideration of appropriate evaluation methods, metrics, and statistical tests. Hopefully, after reading through this chapter, I have helped...
