You're reading from Hands-On Mathematics for Deep Learning

Product type: Book

Published in: Jun 2020

Reading level: Intermediate

Publisher: Packt

ISBN-13: 9781838647292

Edition: 1st Edition

Languages: Python

Tools: Pandas, TensorFlow

Concepts: Mathematical Programming

Author: Jay Dawani

Regularization

In the previous chapter, we learned about (deep) feedforward neural networks and how they are structured. We saw how these architectures use their hidden layers and non-linear activations to perform well on very challenging tasks that linear models cannot handle. We also saw that neural networks tend to overfit the training data by learning noise in the dataset, which leads to errors on the test data. Since our goal is to create models that generalize well, we want to close the gap between training and test performance so that our models perform just as well on both datasets. This is the goal of regularization: to reduce test error, sometimes at the expense of greater training error.

In this chapter, we will cover a variety of methods used in regularization, how they work, and why certain techniques are preferred over others. This includes...

The need for regularization

In previous chapters, we learned that a feedforward neural network is essentially a complex function that maps an input to a corresponding target/label by learning the underlying distribution from the training data. Recall that during training, after the error has been calculated in the forward pass, backpropagation computes the gradients that are used to update the parameters in order to reduce the loss and better approximate the data distribution. We also learned about the capacity of neural networks, the bias-variance trade-off, and how neural networks can underfit or overfit the training data, which prevents them from performing well on unseen or test data (that is, a generalization error occurs).

Before we get into what exactly regularization is, let's revisit overfitting and underfitting. Neural networks, as we know, are universal function approximators...

Norm penalties

Adding a parameter norm penalty to the objective function is the most classic of the regularization methods. It limits the capacity of the model. The method has been around for several decades and predates the advent of deep learning. In its standard form, we can write it as follows:

$$\tilde{J}(\theta) = J(\theta) + \alpha\,\Omega(\theta)$$

Here, $\alpha \in [0, \infty)$, $J$ is the original cost function, $\Omega$ is the norm penalty on the parameters $\theta$, and $\tilde{J}$ is the regularized cost function. The α value is a hyperparameter that determines how strongly the regularizer affects the regularized cost function: the greater the value of α, the more regularization is applied, and the smaller it is, the less effect regularization has on the cost function.

In the case of neural networks, we only apply the parameter norm penalties to the weights since they control the interaction or relationship between two nodes in successive layers, and we leave the biases as they are since they need less data in comparison...
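To make the idea of penalizing only the weights concrete, here is a minimal sketch in plain Python. The choice of an L2 (squared) penalty, the factor of one half, and all of the names used (`l2_penalty`, `regularized_loss`, `alpha`, the toy `weights` and `biases`) are assumptions made for illustration; they are not taken from the text.

```python
# Minimal sketch: an L2 penalty added to a data loss, applied to weights only.
# The L2 norm and every name below are illustrative assumptions.

def l2_penalty(weights):
    """Half the sum of squared weights over all layers."""
    return 0.5 * sum(w * w for layer in weights for row in layer for w in row)


def regularized_loss(data_loss, weights, alpha):
    """Data loss plus alpha times the weight penalty; biases are not penalized."""
    return data_loss + alpha * l2_penalty(weights)


# Two tiny layers: each layer is a list of rows of weights.
weights = [
    [[1.0, -2.0], [0.5, 0.0]],  # layer 1
    [[3.0], [-1.0]],            # layer 2
]
biases = [[0.1, 0.2], [0.3]]    # deliberately left out of the penalty

data_loss = 0.8
for alpha in (0.0, 0.01, 0.1):
    # A larger alpha makes the penalty a larger share of the total loss.
    print(alpha, regularized_loss(data_loss, weights, alpha))
```

With `alpha` set to zero, the regularized loss equals the data loss; as `alpha` grows, large weights become increasingly expensive, which is what limits the capacity of the model.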

Early stopping

During training, we know that our neural networks (which have sufficient capacity to learn the training data) have a tendency to overfit the training data over many iterations, after which they are unable to generalize what they have learned to perform well on the test set. One way of overcoming this problem is to plot the error on the training and test sets at each iteration and look for the iteration at which the test error stops decreasing and begins to rise, even though the training error continues to fall. Then, we choose the parameters from that iteration for our model.

Another advantage of this method is that it does not alter the objective function in the way that parameter norm penalties do. This makes it easy to use and means that it doesn't interfere with the network's learning dynamics, which is shown in the following diagram:

However, this approach isn't perfect—it does have a downside...
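Rather than reading the stopping point off a plot, the selection can be automated. The following is a minimal sketch, assuming we record an error on a held-out validation set after each iteration and stop once it has not improved for a fixed number of iterations. The `patience` rule, the use of a validation set, and the function name `early_stopping` are assumptions for illustration, not the text's own procedure.

```python
def early_stopping(val_errors, patience=3):
    """Return (best_iteration, best_error) for a sequence of held-out errors.

    Scanning stops once the error has failed to improve for `patience`
    consecutive iterations.
    """
    best_iter, best_err = 0, float("inf")
    waited = 0
    for i, err in enumerate(val_errors):
        if err < best_err:
            # New best: in real training we would also save the parameters here.
            best_iter, best_err = i, err
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_iter, best_err


# Hypothetical held-out errors: they fall, bottom out, then rise as overfitting sets in.
val_errors = [0.90, 0.70, 0.55, 0.48, 0.47, 0.49, 0.52, 0.56]
best_iter, best_err = early_stopping(val_errors, patience=3)
print(best_iter, best_err)
```

In a real training loop, the parameters would be checkpointed whenever a new best error is found, and the checkpoint from `best_iter` would be the model we keep.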

Dataset augmentation

Deep feedforward networks, as we have learned, are very data-hungry: they use all this data to learn the underlying data distribution so that they can make predictions on unseen data. The more data they see, the more likely it is that what they encounter in the test set will be an interpolation of the distribution they have already learned. But obtaining a large enough dataset with good-quality labels is by no means a simple task (especially for problems where gathering data is very costly). One way to circumvent this issue is data augmentation; that is, generating synthetic data and using it to train our deep neural network.

The way synthetic data generation works is that we use a generative model (more on this in Chapter 12, Generative Models) to learn the underlying distribution...
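As a toy illustration of this idea, the sketch below uses a deliberately simple stand-in for a generative model: one Gaussian per class, fitted to a scalar feature. It then samples new labeled points from the fitted distributions and appends them to the real data. The per-class Gaussian, the toy dataset, and the names `fit_class_gaussians` and `sample_synthetic` are all assumptions for illustration; a practical generative model would be far more expressive.

```python
import random
import statistics


def fit_class_gaussians(samples):
    """Estimate the mean and standard deviation of the feature for each label.

    `samples` is a list of (feature, label) pairs with a scalar feature.
    """
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    return {y: (statistics.mean(xs), statistics.stdev(xs))
            for y, xs in by_label.items()}


def sample_synthetic(params, n_per_class, rng):
    """Draw n_per_class synthetic (feature, label) pairs from each fitted Gaussian."""
    return [(rng.gauss(mu, sigma), y)
            for y, (mu, sigma) in params.items()
            for _ in range(n_per_class)]


# Hypothetical real data: a scalar feature with two classes.
real = [(1.0, "a"), (1.2, "a"), (0.8, "a"), (3.0, "b"), (3.3, "b"), (2.7, "b")]

rng = random.Random(0)  # seeded so the run is repeatable
params = fit_class_gaussians(real)
augmented = real + sample_synthetic(params, n_per_class=5, rng=rng)
print(len(real), len(augmented))
```

The synthetic points inherit their labels from the class they were sampled from, so the augmented set can be fed to the same training procedure as the original data.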

Dropout

In the preceding sections, we learned about applying penalties to the norm of the weights to regularize them, as well as other approaches, such as dataset augmentation and early stopping. However, there is another effective approach that is widely used in practice, known as dropout.

So far, when training neural networks, all the weights have been learned together. However, dropout alters this idea by having the network only learn a fraction of the weights during each iteration. The reason for this is to avoid co-adaptation. This occurs when we train the entire network over all the training data and some connections end up stronger than others, thereby contributing more toward the network's predictive capabilities because the stronger connections overpower the weaker connections, effectively ignoring them. As we train the network with more iterations, some of the...
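A minimal sketch of the mechanism, assuming dropout is applied to the activations of a layer: each unit is kept with probability `keep_prob` and zeroed otherwise, so only the weights attached to the surviving units contribute in that iteration. Rescaling the kept activations by `1 / keep_prob` (often called inverted dropout) is an assumption made here so that no rescaling is needed at test time; the function name and values are illustrative.

```python
import random


def dropout(activations, keep_prob, rng, training=True):
    """Apply inverted dropout to a list of activations.

    During training, each activation is kept with probability keep_prob and
    scaled by 1 / keep_prob; otherwise it is set to zero. At test time the
    activations pass through unchanged.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    if not training:
        return list(activations)
    out = []
    for a in activations:
        keep = rng.random() < keep_prob
        out.append(a / keep_prob if keep else 0.0)
    return out


rng = random.Random(42)  # seeded so the run is repeatable
hidden = [0.5, 1.0, -0.3, 2.0, 0.7, -1.2]

dropped = dropout(hidden, keep_prob=0.5, rng=rng)
print(dropped)                                        # a different mask each call
print(dropout(hidden, 0.5, rng, training=False))      # unchanged at test time
```

Because a fresh mask is drawn on every call, no single connection can be relied on in every iteration, which is the intuition behind discouraging co-adaptation.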

Adversarial training

Nowadays, neural networks have started to reach human-level accuracy on a number of tasks, and in some, they can be seen to have even surpassed humans. But have they really surpassed humans or does it just seem this way? In production environments, we often have to deal with noisy data, which can cause our model to make incorrect predictions. So, we will now learn about another very important method of regularization—adversarial training.

Before we get into the what and the how of adversarial training, let's take a look at the following diagram:

What we have done in the preceding diagram is add negligible Gaussian noise to the pixels of the original image. To us, the image looks exactly the same, but to a convolutional neural network, it looks entirely different. This is a problem, and it occurs even when our models are perfectly trained...
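To see how a tiny change to the input can flip a prediction, consider the following sketch with a small linear classifier. Note that it does not use Gaussian noise: it uses a worst-case perturbation that moves each input by at most `epsilon` in the direction given by the sign of the corresponding weight, which is a different (gradient-sign style) technique swapped in because it makes the effect easy to reproduce. The classifier, its weights, the input, and the names `predict` and `perturb` are all hypothetical.

```python
def predict(weights, bias, x):
    """Score of a linear classifier; a positive score means class +1."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def perturb(weights, x, epsilon, label):
    """Shift each input by at most epsilon so as to lower the score for `label`.

    For a linear score, the gradient with respect to the input is the weight
    vector, so the most damaging bounded step is -label * epsilon * sign(w).
    """
    return [xi - label * epsilon * sign(w) for w, xi in zip(weights, x)]


# Hypothetical classifier and input.
weights = [0.4, -0.3, 0.2, -0.5, 0.1]
bias = 0.0
x = [0.5, 0.4, 0.3, 0.1, 0.2]

x_adv = perturb(weights, x, epsilon=0.1, label=1)

print(predict(weights, bias, x))      # positive: classified as +1
print(predict(weights, bias, x_adv))  # negative: the small shift flips the class
```

Adversarial training, in broad terms, would add perturbed inputs such as `x_adv`, with their original labels, to the training data so that the model learns to classify them correctly.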

Summary

In this chapter, we covered a variety of methods that are used to regularize the parameters of a neural network. These methods are very important when it comes to training our models because, by preventing overfitting, they help ensure that the models generalize to unseen data and therefore perform well on the tasks we want to use them for. In the following chapters, we will learn about different types of neural networks and how each one is best suited for certain types of problems. Each neural network has a form of regularization that it can use to help improve performance.

In the next chapter, we will learn about convolutional neural networks, which are used for computer vision.


Author (1)

Jay Dawani

Jay Dawani is a former professional swimmer turned mathematician and computer scientist. He is also a Forbes 30 Under 30 Fellow. At present, he is the Director of Artificial Intelligence at Geometric Energy Corporation (NATO CAGE) and the CEO of Lemurian Labs - a startup he founded that is developing the next generation of autonomy, intelligent process automation, and driver intelligence. Previously he has also been the technology and R&D advisor to Spacebit Capital. He has spent the last three years researching at the frontiers of AI with a focus on reinforcement learning, open-ended learning, deep learning, quantum machine learning, human-machine interaction, multi-agent and complex systems, and artificial general intelligence.

