
The Regularization Cookbook

By Vincent Vandenbussche
About this book
Regularization is a reliable way to improve a model's results on unseen data. Applying it is challenging, however, because it comes in many forms and the appropriate technique must be chosen for each model. The Regularization Cookbook provides you with tools and methods to handle a wide range of cases, with ready-to-use working code as well as theoretical explanations. After an introduction to regularization and methods to diagnose when to use it, you’ll start implementing regularization techniques on linear models, such as linear and logistic regression, and tree-based models, such as random forest and gradient boosting. You’ll then be introduced to specific regularization methods based on data, high cardinality features, and imbalanced datasets. In the last five chapters, you’ll discover regularization for deep learning models. After reviewing general methods that apply to any type of neural network, you’ll dive into more NLP-specific methods for RNNs and transformers, as well as using BERT or GPT-3. Finally, you’ll explore regularization for computer vision, covering CNN specifics, along with the use of generative models such as Stable Diffusion and DALL-E. By the end of this book, you’ll be armed with different regularization techniques to apply to your ML and DL models.
Publication date:
July 2023
Publisher
Packt
Pages
424
ISBN
9781837634088

 

An Overview of Regularization

Let’s embark on a journey into the world of regularization in machine learning. I hope you will learn a lot and find as much joy in reading this book as I did in writing it.

Regularization is important for anyone who wants to deploy robust machine learning (ML) models.

This chapter will introduce some context and key concepts about regularization before diving deeper into it in the next chapters. At this point, you may have many questions about this book and about regularization in general. What is regularization? Why do we need regularization for production-grade ML models? How do we diagnose the need for regularization? What are the limits of regularization? What are the approaches to regularization?

All the foundational knowledge about regularization will be provided in this chapter in the hope of answering all these questions. Not only will this give you a high-level understanding of what regularization is but it will also allow...

 

Technical requirements

In this chapter, you will have the opportunity to generate a toy dataset, display it, and train a basic linear regression model on that data. Therefore, the following Python libraries will be required:

  • NumPy
  • Matplotlib
  • scikit-learn
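As a minimal sketch of how these three libraries fit together, the snippet below generates a toy dataset, plots it, and fits a linear regression. The data here (a noisy line y = 2x + 1, with a fixed random seed) is a hypothetical choice for illustration and is not necessarily the dataset used in the chapter.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show() interactively
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: a noisy line y = 2x + 1 (illustrative values only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 1.0, size=100)

# Train a basic linear regression and measure its fit on the training data
model = LinearRegression().fit(X, y)
r2 = model.score(X, y)
print("slope:", model.coef_[0], "intercept:", model.intercept_, "R^2:", r2)

# Display the data points together with the fitted line
x_line = np.linspace(0, 10, 50).reshape(-1, 1)
fig, ax = plt.subplots()
ax.scatter(X[:, 0], y, s=10, label="toy data")
ax.plot(x_line[:, 0], model.predict(x_line), color="red", label="linear fit")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("toy_regression.png")  # hypothetical output file name
```

The recovered slope and intercept should land close to the values used to generate the data, which makes this a convenient sanity check that the environment is set up correctly.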
 

Introducing regularization

“Regularization in ML is a technique used to improve the generalization performance of a model by adding additional constraints to the model’s parameters. This forces the model to use simpler representations and helps reduce the risk of overfitting.

Regularization can also help improve the performance of a model on unseen data by encouraging the model to learn more relevant, generalizable features.”

This definition of regularization, arguably good enough, was actually generated by the famous GPT-3 model when given the following prompt: Detailed definition of regularization in machine learning. Even more astonishing, this definition passed several plagiarism tests, meaning it’s actually fully original text. Do not worry if you do not yet understand all the words in this definition from GPT-3; it is not meant for beginners. But you will fully understand it by the end of this chapter.
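To make the quoted definition more concrete, here is a small sketch (not taken from the book) of what "adding additional constraints to the model's parameters" can look like in practice. It uses ridge regression as one example of such a constraint; the dataset, the polynomial degree, and `alpha=1.0` are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical small, noisy dataset that a flexible model can easily overfit
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(20, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(0, 0.2, size=20)

def fit(estimator):
    # Same degree-10 polynomial features for both models; only the final estimator differs
    pipe = make_pipeline(
        PolynomialFeatures(degree=10, include_bias=False),
        StandardScaler(),
        estimator,
    )
    return pipe.fit(X, y)

plain = fit(LinearRegression())   # no constraint on the coefficients
ridge = fit(Ridge(alpha=1.0))     # L2 penalty pulls the coefficients toward zero

# The last pipeline step is the fitted linear model
plain_norm = np.linalg.norm(plain[-1].coef_)
ridge_norm = np.linalg.norm(ridge[-1].coef_)
print("coefficient norm without penalty:", plain_norm)
print("coefficient norm with ridge penalty:", ridge_norm)
```

The penalized model ends up with smaller coefficients, which is the "simpler representation" the definition refers to. Whether this also improves performance on unseen data depends on the dataset and on the penalty strength.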

Note

GPT-3, short for Generative Pre...

 

Key concepts of regularization

Having gained some intuition regarding what constitutes a suitable fit, as well as understanding examples of underfitting and overfitting, let us now delve into a more precise definition and explore key concepts that enable us to better comprehend regularization.

Bias and variance

Bias and variance are two key concepts when talking about regularization. We can define two main kinds of errors a model can have:

  • Bias is how bad a model is at capturing the general behavior of the data
  • Variance is how bad a model is at being robust to small input data fluctuations
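These two definitions can be estimated numerically. The sketch below is one possible way to do it, under assumptions that are not from the book: a known target function, a fixed set of inputs whose targets are perturbed with fresh noise at each round, and polynomial regressions of degree 1 and degree 9 as the "too simple" and "too flexible" models. The function names `true_fn` and `bias_variance` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def true_fn(x):
    # Assumed ground-truth behavior of the data
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_rounds=200, n_train=30, noise=0.3, seed=0):
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0, 1, n_train).reshape(-1, 1)
    x_test = np.linspace(0.05, 0.95, 50).reshape(-1, 1)
    preds = np.empty((n_rounds, x_test.shape[0]))
    for i in range(n_rounds):
        # Same inputs each round, but a freshly perturbed copy of the targets
        y_train = true_fn(x_train[:, 0]) + rng.normal(0, noise, n_train)
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        preds[i] = model.fit(x_train, y_train).predict(x_test)
    # Squared bias: how far the average prediction is from the true behavior
    bias_sq = np.mean((preds.mean(axis=0) - true_fn(x_test[:, 0])) ** 2)
    # Variance: how much predictions move when the training data fluctuates
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

simple_bias, simple_var = bias_variance(degree=1)
flexible_bias, flexible_var = bias_variance(degree=9)
print("degree 1: bias^2 =", simple_bias, "variance =", simple_var)
print("degree 9: bias^2 =", flexible_bias, "variance =", flexible_var)
```

In this setup, the straight line cannot capture the general behavior of the data (higher bias), while the degree-9 polynomial reacts more strongly to the noise in each training set (higher variance).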

Those two concepts, in general, are not mutually exclusive. If we take a step back from ML, there is a very common figure to visualize bias and variance, assuming the model’s goal is to hit the center of a target:

Figure 1.7 – Visualization of bias and variance


Let’s describe those four cases:

  • High bias and low variance...
 

Regularization – a multi-dimensional problem

Having the right diagnosis for a model is crucial, as it allows us to choose the strategy more carefully to improve the model. But from any diagnosis, many paths are possible to improve the model. Those paths can be separated into three main categories, as proposed in the following figure:

Figure 1.17 – A proposed categorization of regularization types: data, model architecture, and model training


At the data level, we may have the following tools for regularization:

  • Adding more data, either synthetic or real
  • Adding more features
  • Feature engineering
  • Data preprocessing

Indeed, the data is of extreme importance in ML in general, and regularization is no exception. We will see many examples throughout the book of regularizing data.
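As a rough illustration of the first item in that list, the sketch below trains the same flexible model on a small and on a larger synthetic training set and compares the training and validation errors. Everything in it is an assumption made for the example: the sine-shaped target, the noise level, the degree-9 polynomial model, and the two training set sizes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)

def sample(n):
    # Hypothetical data-generating process: a noisy sine wave on [0, 1]
    x = rng.uniform(0, 1, size=(n, 1))
    y = np.sin(2 * np.pi * x[:, 0]) + rng.normal(0, 0.3, size=n)
    return x, y

# A fixed validation set shared by both experiments
X_val, y_val = sample(1000)

def train_val_mse(n_train):
    X_train, y_train = sample(n_train)
    # The model is identical in both runs; only the amount of training data changes
    model = make_pipeline(PolynomialFeatures(degree=9), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    return train_mse, val_mse

small_train, small_val = train_val_mse(15)
large_train, large_val = train_val_mse(500)
print("15 samples:  train MSE =", small_train, "validation MSE =", small_val)
print("500 samples: train MSE =", large_train, "validation MSE =", large_val)
```

With only 15 samples, the model fits its training points closely but generalizes poorly; with 500 samples, the gap between training and validation error narrows without any change to the model itself.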

At the model level, the following methods may be used for regularization:

  • Choosing a more or less simple architecture
  • In deep...
 

Summary

We started this chapter by demonstrating, with several real-world examples, that regularization is the key to success in ML in a production environment. Along with several other methods and best practices, a robustly regularized model is necessary for production. In production, unseen data and edge cases will appear on a regular basis, thus any deployed model must have an acceptable response to such cases.

We then walked through some key concepts of regularization. Overfitting and underfitting are two common problems in ML, and they are closely related to bias and variance: an overfitting model has high variance, while an underfitting model has high bias. Thus, to perform well, a model needs both low bias and low variance. We explained how, no matter how good a model can get, unavoidable bias limits its performance. Those key concepts allowed us to propose a method to diagnose bias and variance using the performance of both the training and validation sets, as well...

About the Author
  • Vincent Vandenbussche

    After a Ph.D. in Physics, Vincent Vandenbussche worked for a decade in industry, deploying ML solutions at scale. He has worked at numerous companies and organizations, such as Renault, L’Oréal, General Electric, Jellysmack, Chanel, and CERN. He also has a passion for teaching: he co-founded a data science bootcamp, was an ML lecturer at Mines Paris engineering school and EDHEC business school, and trained numerous professionals in companies like ArcelorMittal and Orange.
