Model Calibration

So far, we have explored various ways to handle data imbalance. In this chapter, we will see why the prediction scores we get from trained models often need some post-processing. This can be helpful both during real-time prediction and during offline evaluation at training time. We will also look at ways of measuring how well calibrated a model is and at how imbalanced datasets make model calibration all the more necessary.

The following topics will be covered in this chapter:

  • Introduction to model calibration
  • The influence of data balancing techniques on model calibration
  • Plotting calibration curves for a model trained on a real-world dataset
  • Model calibration techniques
  • The impact of calibration on a model’s performance

By the end of this chapter, you will have a clear understanding of what model calibration means, how to measure it, and when and how to apply it.

Technical requirements

As in prior chapters, we will continue to use common libraries such as matplotlib, numpy, scikit-learn, xgboost, and imbalanced-learn. The code and notebooks for this chapter are available on GitHub at https://github.com/PacktPublishing/Machine-Learning-for-Imbalanced-Data/tree/master/chapter10. You can open the GitHub notebook in Google Colab by clicking the Open in Colab icon at the top of the chapter’s notebook, or by launching it from https://colab.research.google.com using the GitHub URL of the notebook.

Introduction to model calibration

What is the difference between stating “The model predicted the transaction as fraudulent” and “The model estimated a 60% probability of the transaction being fraudulent”? When would one statement be more useful than the other?

The difference is that the second statement expresses a likelihood. This likelihood helps us gauge the model’s confidence, which is needed in many applications, such as medical diagnosis. For example, a prediction that a patient is 80% likely to have cancer is more useful to a doctor than a bare prediction of whether or not the patient has cancer.

A model is considered calibrated if its predicted probabilities match the observed frequency of the positive class. Let’s try to understand this further. Say we have 10 observations, and for each of them, the model predicts a probability of 0.7 of belonging to the positive class...
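
As a quick illustration of this definition (this numeric check is not taken from the book’s worked example), if a model assigns a probability of 0.7 to each of 10 observations, we would expect about 7 of them to actually be positive. The following minimal sketch uses scikit-learn’s calibration_curve to make that check explicit, with made-up labels and scores:

    import numpy as np
    from sklearn.calibration import calibration_curve

    # Hypothetical labels and scores: 10 observations, each given a predicted
    # probability of 0.7, of which exactly 7 are actually positive
    y_true = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0])
    y_prob = np.full(10, 0.7)

    # With a single bin, the observed fraction of positives should match the
    # mean predicted probability if the model is calibrated
    prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=1)
    print(prob_true, prob_pred)  # [0.7] [0.7]

With more realistic scores, calibration_curve bins the predictions and compares the mean predicted probability in each bin against the observed fraction of positives, which is exactly what a calibration (reliability) curve plots.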

The influence of data balancing techniques on model calibration

The usual impact of applying data-level techniques, such as oversampling and undersampling, is that they change the distribution of the training data seen by the model. The model then sees an almost equal number of examples from each class, which doesn’t reflect the actual data distribution. Because of this, the model becomes less calibrated with respect to the true, imbalanced distribution of the data. Similarly, algorithm-level cost-sensitive techniques that use class_weight to account for the data imbalance degrade the model’s calibration against the true data distribution in much the same way. Figure 10.7 (log scale), from a recent study [7], shows the worsening calibration of a CNN-based model on a pneumonia detection task as class_weight increases from 0.5 to 0.9 to 0.99. The model becomes over-confident, and hence less calibrated, as class_weight increases.

Figure 10.7...
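
To see this effect on a much smaller scale, here is a hedged sketch, using a plain logistic regression on a synthetic imbalanced dataset rather than the study’s CNN and pneumonia data, that compares the Brier score of a model trained with and without class weighting (a lower Brier score indicates better calibration):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import brier_score_loss
    from sklearn.model_selection import train_test_split

    # Synthetic dataset with roughly 5% positives
    X, y = make_classification(n_samples=20_000, weights=[0.95, 0.05], random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

    for weight in (None, "balanced"):
        model = LogisticRegression(class_weight=weight, max_iter=1000)
        model.fit(X_train, y_train)
        probs = model.predict_proba(X_test)[:, 1]
        # The weighted model typically over-predicts the minority class and
        # therefore tends to have a worse (higher) Brier score
        print(weight, brier_score_loss(y_test, probs))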

Plotting calibration curves for a model trained on a real-world dataset

Model calibration should ideally be done on a dataset that is separate from both the training and test sets. Why? To avoid overfitting: otherwise, the calibration can become too tailored to the unique characteristics of the training or test set.

We can have a hold-out dataset that has been specifically set aside for model calibration. In some cases, we may have too little data to justify splitting it further into a separate hold-out dataset for calibration. In such cases, a practical compromise might be to use the test set for calibration, assuming that the test set has the same distribution as the dataset on which the model will be used to make final predictions. However, we should keep in mind that after calibrating on the test set, we no longer have an unbiased estimate of the final performance of the model, and we need to be cautious about interpreting the model’s performance metrics.
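
The following is a minimal sketch of this splitting strategy, using a synthetic dataset from make_classification as a stand-in for the chapter’s real-world data. It sets aside a dedicated calibration split and then plots the uncalibrated model’s calibration (reliability) curve on the test set with scikit-learn’s CalibrationDisplay (available in scikit-learn 1.0 and later):

    import matplotlib.pyplot as plt
    from sklearn.calibration import CalibrationDisplay
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=30_000, weights=[0.9, 0.1], random_state=0)

    # Carve out a test set first, then split the rest into training and
    # calibration sets; the calibration split is reserved for the next section
    X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
    X_train, X_calib, y_train, y_calib = train_test_split(X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # Reliability curve of the uncalibrated model, evaluated on the test set
    CalibrationDisplay.from_estimator(model, X_test, y_test, n_bins=10)
    plt.show()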

We use the HR...

Model calibration techniques

There are several ways to calibrate a model. Calibration techniques fall into two broad categories, based on the nature of the method used to adjust the predicted probabilities so that they better align with the true probabilities: parametric and non-parametric (a minimal sketch contrasting the two follows this list):

  • Parametric methods: These methods assume a specific functional form for the relationship between the predicted probabilities and the true probabilities. They have a set number of parameters that need to be estimated from the data. Once these parameters are estimated, the calibration function is fully specified. Examples include Platt scaling, which assumes a logistic function, and beta calibration, which assumes a beta distribution. We will also discuss temperature scaling and label smoothing.
  • Non-parametric methods: These methods do not assume a specific functional form for the calibration function. They are more flexible and can adapt to more complex relationships between the predicted...
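
As a self-contained sketch of the two families (not the chapter’s exact code), scikit-learn’s CalibratedClassifierCV can apply Platt scaling (method="sigmoid", parametric) or isotonic regression (method="isotonic", non-parametric) to an already fitted model by passing cv="prefit" together with a hold-out calibration split:

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import brier_score_loss
    from sklearn.model_selection import train_test_split

    # Synthetic imbalanced data split into train / calibration / test sets
    X, y = make_classification(n_samples=30_000, weights=[0.9, 0.1], random_state=0)
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, stratify=y, random_state=0)
    X_calib, X_test, y_calib, y_test = train_test_split(X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    for method in ("sigmoid", "isotonic"):
        # cv="prefit" calibrates the already fitted model on the hold-out split
        calibrated = CalibratedClassifierCV(model, method=method, cv="prefit")
        calibrated.fit(X_calib, y_calib)
        probs = calibrated.predict_proba(X_test)[:, 1]
        print(method, brier_score_loss(y_test, probs))

In general, isotonic regression is more flexible but can overfit when the calibration set is small, which is one reason the parametric Platt scaling is often preferred when data is limited.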

The impact of calibration on a model’s performance

Accuracy, log-loss, and the Brier score usually improve as a result of calibration. However, since model calibration still involves approximately fitting a model to the calibration curve on the held-out calibration dataset, it may sometimes worsen accuracy or other performance metrics by small amounts. Nevertheless, the benefit of calibrated probabilities, namely interpretable probability values that represent actual likelihoods, far outweighs the slight performance impact.
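
As a rough illustration of this trade-off, using synthetic data and Platt scaling via CalibratedClassifierCV rather than the chapter’s exact setup, the sketch below compares the Brier score, log-loss, and ROC-AUC of a model before and after calibration:

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import brier_score_loss, log_loss, roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=30_000, weights=[0.9, 0.1], random_state=1)
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, stratify=y, random_state=1)
    X_calib, X_test, y_calib, y_test = train_test_split(X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=1)

    model = RandomForestClassifier(random_state=1).fit(X_train, y_train)
    calibrated = CalibratedClassifierCV(model, method="sigmoid", cv="prefit").fit(X_calib, y_calib)

    for name, clf in (("uncalibrated", model), ("calibrated", calibrated)):
        probs = clf.predict_proba(X_test)[:, 1]
        # Brier score and log-loss usually improve after calibration, while
        # ROC-AUC, being rank-based, stays essentially unchanged
        print(name,
              round(brier_score_loss(y_test, probs), 4),
              round(log_loss(y_test, probs), 4),
              round(roc_auc_score(y_test, probs), 4))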

As discussed in Chapter 1, Introduction to Data Imbalance in Machine Learning, ROC-AUC is a rank-based metric, meaning it evaluates the model’s ability to distinguish between classes based on the ranking of the predicted scores rather than their absolute values. ROC-AUC makes no claim about the accuracy of the probability estimates themselves. Strictly monotonic calibration functions, which continuously increase or decrease without...

Summary

In this chapter, we went through the basic concepts of model calibration: why we should care about it, how to measure whether a model is calibrated, how data imbalance affects model calibration, and, finally, how to calibrate an uncalibrated model. Some of the calibration techniques we covered include Platt’s scaling, isotonic regression, temperature scaling, and label smoothing.

With this, we come to the end of this book. Thank you for dedicating your time to reading the book. We trust that it has broadened your knowledge of handling imbalanced datasets and their practical applications in machine learning. As we draw this book to a close, we’d like to offer some concluding advice on how to effectively utilize the techniques discussed.

Like other machine learning techniques, the methods discussed in this book can be highly useful under the right conditions, but they also come with their own set of challenges. Recognizing when and where to apply...

Questions

  1. Can a well-calibrated model have low accuracy? What about the reverse: can a model with high accuracy be poorly calibrated?
  2. Take a limited classification dataset with, say, only 100 data points. Train a decision tree model using this dataset and then assess its calibration.
    a. Calibrate the model using Platt’s scaling. Measure the Brier score after calibration.
    b. Calibrate the model using isotonic regression. Measure the Brier score after calibration.
    c. How do the Brier scores differ in (a) and (b)?
    d. Measure the AUC, accuracy, precision, recall, and F1 score of the model before and after calibration.
  3. Take a balanced dataset, say with 10,000 points. Train a decision tree model using it. Then check how calibrated it is.
    a. Calibrate the model using Platt’s scaling. Measure the Brier score after calibration.
    b. Calibrate the model using isotonic regression. Measure the Brier score after calibration.
    c. How do the Brier scores differ in (a) and (b)?
    d. Measure the AUC, accuracy...

References

  1. C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On Calibration of Modern Neural Networks.” arXiv, Aug. 03, 2017. Accessed: Nov. 21, 2022, http://arxiv.org/abs/1706.04599
  2. A. Niculescu-Mizil and R. Caruana, “Predicting good probabilities with supervised learning,” in Proceedings of the 22nd International Conference on Machine Learning - ICML ’05, Bonn, Germany, 2005, pp. 625–632. doi: 10.1145/1102351.1102430.
  3. J. Mukhoti, V. Kulharia, A. Sanyal, S. Golodetz, P. H. S. Torr, and P. K. Dokania, “Calibrating Deep Neural Networks using Focal Loss”. Feb 2020, https://doi.org/10.48550/arXiv.2002.09437
  4. B. C. Wallace and I. J. Dahabreh, “Class Probability Estimates are Unreliable for Imbalanced Data (and How to Fix Them),” in 2012 IEEE 12th International Conference on Data Mining, Brussels, Belgium, Dec. 2012, pp. 695–704. doi: 10.1109/ICDM.2012.115.
  5. M. Pakdaman Naeini, G. Cooper, and...