Model Calibration

So far, we have explored various ways to handle data imbalance. In this chapter, we will see why the prediction scores we get from trained models often need some post-processing. This can be helpful both during real-time prediction and during offline evaluation at training time. We will also look at ways of measuring how well calibrated a model is and at how imbalanced datasets make model calibration all the more necessary.

The following topics will be covered in this chapter:

  • Introduction to model calibration
  • The influence of data balancing techniques on model calibration
  • Plotting calibration curves for a model trained on a real-world dataset
  • Model calibration techniques
  • The impact of calibration on a model’s performance

By the end of this chapter, you will have a clear understanding of what model calibration means, how to measure it, and when and how to apply it.

Technical requirements

As in prior chapters, we will continue to use common libraries such as matplotlib, numpy, scikit-learn, xgboost, and imbalanced-learn. The code and notebooks for this chapter are available on GitHub at https://github.com/PacktPublishing/Machine-Learning-for-Imbalanced-Data/tree/master/chapter10. You can open the GitHub notebook in Google Colab by clicking the Open in Colab icon at the top of the chapter’s notebook, or by launching it from https://colab.research.google.com using the GitHub URL of the notebook.

Introduction to model calibration

What is the difference between stating “The model predicted the transaction as fraudulent” and “The model estimated a 60% probability of the transaction being fraudulent”? When would one statement be more useful than the other?

The difference is that the second statement expresses a likelihood. This likelihood helps us gauge the model’s confidence, which is needed in many applications, such as medical diagnosis. For example, a prediction that a patient is 80% likely to have cancer is more useful to a doctor than a bare prediction of whether or not the patient has cancer.

A model is considered calibrated if its predicted probabilities match the observed frequency of the positive class. Let’s try to understand this further. Say we have 10 observations, and for each of them, the model predicts a probability of 0.7 of belonging to the positive class...
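
As a quick illustration of this definition (this numeric check is not taken from the book’s worked example), if a model assigns a probability of 0.7 to each of 10 observations, we would expect about 7 of them to actually be positive. The following minimal sketch uses scikit-learn’s calibration_curve to make that check explicit, with made-up labels and scores:

    import numpy as np
    from sklearn.calibration import calibration_curve

    # Hypothetical labels and scores: 10 observations, each given a predicted
    # probability of 0.7, of which exactly 7 are actually positive
    y_true = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0])
    y_prob = np.full(10, 0.7)

    # With a single bin, the observed fraction of positives should match the
    # mean predicted probability if the model is calibrated
    prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=1)
    print(prob_true, prob_pred)  # [0.7] [0.7]

With more realistic scores, calibration_curve bins the predictions and compares the mean predicted probability in each bin against the observed fraction of positives, which is exactly what a calibration (reliability) curve plots.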

The influence of data balancing techniques on model calibration

The usual impact of applying data-level techniques, such as oversampling and undersampling, is that they change the distribution of the training data seen by the model. The model then sees an almost equal number of examples from each class, which doesn’t reflect the actual data distribution. Because of this, the model becomes less calibrated with respect to the true, imbalanced distribution of the data. Similarly, algorithm-level cost-sensitive techniques that use class_weight to account for the data imbalance degrade the model’s calibration against the true data distribution in much the same way. Figure 10.7 (log scale), from a recent study [7], shows the worsening calibration of a CNN-based model on a pneumonia detection task as class_weight increases from 0.5 to 0.9 to 0.99. The model becomes over-confident, and hence less calibrated, as class_weight increases.

Figure 10.7...
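
To see this effect on a much smaller scale, here is a hedged sketch, using a plain logistic regression on a synthetic imbalanced dataset rather than the study’s CNN and pneumonia data, that compares the Brier score of a model trained with and without class weighting (a lower Brier score indicates better calibration):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import brier_score_loss
    from sklearn.model_selection import train_test_split

    # Synthetic dataset with roughly 5% positives
    X, y = make_classification(n_samples=20_000, weights=[0.95, 0.05], random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

    for weight in (None, "balanced"):
        model = LogisticRegression(class_weight=weight, max_iter=1000)
        model.fit(X_train, y_train)
        probs = model.predict_proba(X_test)[:, 1]
        # The weighted model typically over-predicts the minority class and
        # therefore tends to have a worse (higher) Brier score
        print(weight, brier_score_loss(y_test, probs))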

Plotting calibration curves for a model trained on a real-world dataset

Model calibration should ideally be done on a dataset that is separate from both the training and test sets. Why? To avoid overfitting: otherwise, the calibration can become too tailored to the unique characteristics of the training or test set.

We can have a hold-out dataset that has been specifically set aside for model calibration. In some cases, we may have too little data to justify splitting it further into a separate hold-out dataset for calibration. In such cases, a practical compromise might be to use the test set for calibration, assuming that the test set has the same distribution as the dataset on which the model will be used to make final predictions. However, we should keep in mind that after calibrating on the test set, we no longer have an unbiased estimate of the final performance of the model, and we need to be cautious about interpreting the model’s performance metrics.
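
The following is a minimal sketch of this splitting strategy, using a synthetic dataset from make_classification as a stand-in for the chapter’s real-world data. It sets aside a dedicated calibration split and then plots the uncalibrated model’s calibration (reliability) curve on the test set with scikit-learn’s CalibrationDisplay (available in scikit-learn 1.0 and later):

    import matplotlib.pyplot as plt
    from sklearn.calibration import CalibrationDisplay
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=30_000, weights=[0.9, 0.1], random_state=0)

    # Carve out a test set first, then split the rest into training and
    # calibration sets; the calibration split is reserved for the next section
    X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
    X_train, X_calib, y_train, y_calib = train_test_split(X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # Reliability curve of the uncalibrated model, evaluated on the test set
    CalibrationDisplay.from_estimator(model, X_test, y_test, n_bins=10)
    plt.show()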

We use the HR...

Model calibration techniques

There are several ways to calibrate a model. Calibration techniques fall into two broad categories, based on the nature of the method used to adjust the predicted probabilities so that they better align with the true probabilities: parametric and non-parametric (a minimal sketch contrasting the two follows this list):

  • Parametric methods: These methods assume a specific functional form for the relationship between the predicted probabilities and the true probabilities. They have a set number of parameters that need to be estimated from the data. Once these parameters are estimated, the calibration function is fully specified. Examples include Platt scaling, which assumes a logistic function, and beta calibration, which assumes a beta distribution. We will also discuss temperature scaling and label smoothing.
  • Non-parametric methods: These methods do not assume a specific functional form for the calibration function. They are more flexible and can adapt to more complex relationships between the predicted...
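
As a self-contained sketch of the two families (not the chapter’s exact code), scikit-learn’s CalibratedClassifierCV can apply Platt scaling (method="sigmoid", parametric) or isotonic regression (method="isotonic", non-parametric) to an already fitted model by passing cv="prefit" together with a hold-out calibration split:

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import brier_score_loss
    from sklearn.model_selection import train_test_split

    # Synthetic imbalanced data split into train / calibration / test sets
    X, y = make_classification(n_samples=30_000, weights=[0.9, 0.1], random_state=0)
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, stratify=y, random_state=0)
    X_calib, X_test, y_calib, y_test = train_test_split(X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    for method in ("sigmoid", "isotonic"):
        # cv="prefit" calibrates the already fitted model on the hold-out split
        calibrated = CalibratedClassifierCV(model, method=method, cv="prefit")
        calibrated.fit(X_calib, y_calib)
        probs = calibrated.predict_proba(X_test)[:, 1]
        print(method, brier_score_loss(y_test, probs))

In general, isotonic regression is more flexible but can overfit when the calibration set is small, which is one reason the parametric Platt scaling is often preferred when data is limited.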

The impact of calibration on a model’s performance

Accuracy, log-loss, and the Brier score usually improve as a result of calibration. However, since model calibration still involves approximately fitting a model to the calibration curve on the held-out calibration dataset, it may sometimes worsen accuracy or other performance metrics by small amounts. Nevertheless, the benefit of calibrated probabilities, namely interpretable probability values that represent actual likelihoods, far outweighs the slight performance impact.
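
As a rough illustration of this trade-off, using synthetic data and Platt scaling via CalibratedClassifierCV rather than the chapter’s exact setup, the sketch below compares the Brier score, log-loss, and ROC-AUC of a model before and after calibration:

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import brier_score_loss, log_loss, roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=30_000, weights=[0.9, 0.1], random_state=1)
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, stratify=y, random_state=1)
    X_calib, X_test, y_calib, y_test = train_test_split(X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=1)

    model = RandomForestClassifier(random_state=1).fit(X_train, y_train)
    calibrated = CalibratedClassifierCV(model, method="sigmoid", cv="prefit").fit(X_calib, y_calib)

    for name, clf in (("uncalibrated", model), ("calibrated", calibrated)):
        probs = clf.predict_proba(X_test)[:, 1]
        # Brier score and log-loss usually improve after calibration, while
        # ROC-AUC, being rank-based, stays essentially unchanged
        print(name,
              round(brier_score_loss(y_test, probs), 4),
              round(log_loss(y_test, probs), 4),
              round(roc_auc_score(y_test, probs), 4))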

As discussed in Chapter 1, Introduction to Data Imbalance in Machine Learning, ROC-AUC is a rank-based metric, meaning it evaluates the model’s ability to distinguish between classes based on the ranking of the predicted scores rather than their absolute values. ROC-AUC makes no claim about the accuracy of the probability estimates themselves. Strictly monotonic calibration functions, which continuously increase or decrease without...

Summary

In this chapter, we went through the basic concepts of model calibration: why we should care about it, how to measure whether a model is calibrated, how data imbalance affects model calibration, and, finally, how to calibrate an uncalibrated model. Some of the calibration techniques we covered include Platt’s scaling, isotonic regression, temperature scaling, and label smoothing.

With this, we come to the end of this book. Thank you for dedicating your time to reading the book. We trust that it has broadened your knowledge of handling imbalanced datasets and their practical applications in machine learning. As we draw this book to a close, we’d like to offer some concluding advice on how to effectively utilize the techniques discussed.

Like other machine learning techniques, the methods discussed in this book can be highly useful under the right conditions, but they also come with their own set of challenges. Recognizing when and where to apply...

Questions

  1. Can a well-calibrated model have low accuracy? What about the reverse: can a model with high accuracy be poorly calibrated?
  2. Take a limited classification dataset with, say, only 100 data points. Train a decision tree model using this dataset and then assess its calibration.
    a. Calibrate the model using Platt’s scaling. Measure the Brier score after calibration.
    b. Calibrate the model using isotonic regression. Measure the Brier score after calibration.
    c. How do the Brier scores differ in (a) and (b)?
    d. Measure the AUC, accuracy, precision, recall, and F1 score of the model before and after calibration.
  3. Take a balanced dataset, say with 10,000 points. Train a decision tree model using it. Then check how calibrated it is.
    a. Calibrate the model using Platt’s scaling. Measure the Brier score after calibration.
    b. Calibrate the model using isotonic regression. Measure the Brier score after calibration.
    c. How do the Brier scores differ in (a) and (b)?
    d. Measure the AUC, accuracy...

References

  1. C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On Calibration of Modern Neural Networks.” arXiv, Aug. 03, 2017. Accessed: Nov. 21, 2022, http://arxiv.org/abs/1706.04599
  2. A. Niculescu-Mizil and R. Caruana, “Predicting good probabilities with supervised learning,” in Proceedings of the 22nd International Conference on Machine Learning - ICML ’05, Bonn, Germany, 2005, pp. 625–632. doi: 10.1145/1102351.1102430.
  3. J. Mukhoti, V. Kulharia, A. Sanyal, S. Golodetz, P. H. S. Torr, and P. K. Dokania, “Calibrating Deep Neural Networks using Focal Loss”. Feb 2020, https://doi.org/10.48550/arXiv.2002.09437
  4. B. C. Wallace and I. J. Dahabreh, “Class Probability Estimates are Unreliable for Imbalanced Data (and How to Fix Them),” in 2012 IEEE 12th International Conference on Data Mining, Brussels, Belgium, Dec. 2012, pp. 695–704. doi: 10.1109/ICDM.2012.115.
  5. M. Pakdaman Naeini, G. Cooper, and...