Understanding the impact of the learning rate

In order to understand how the learning rate impacts the training of a model, let's consider a very simple case, where we try to fit the following equation (note that it is different from the toy dataset we have been working on so far):

y = 3x

Here, y is the output and x is the input. Given a set of input and expected output values, we will fit the equation with varying learning rates to understand the impact of the learning rate.

The following code is available as Learning_rate.ipynb in the Chapter01 folder of this book's GitHub repository - https://tinyurl.com/mcvp-packt
  1. We specify the input and output dataset as follows:
x = [[1],[2],[3],[4]]
y = [[3],[6],[9],[12]]

  2. Define the feed_forward function. In this instance, we modify the network so that there is no hidden layer; the input is connected directly to the output through a weight (w) and a bias (b).

Note that, in the following function, we are estimating the parameters w and b:

from copy import deepcopy
import numpy as np

def feed_forward(inputs, outputs, weights):
    pred_out = np.dot(inputs, weights[0]) + weights[1]
    mean_squared_error = np.mean(np.square(pred_out - outputs))
    return mean_squared_error
  3. Define the update_weights function just like we defined it in the Gradient descent in code section:
def update_weights(inputs, outputs, weights, lr):
    original_weights = deepcopy(weights)
    org_loss = feed_forward(inputs, outputs, original_weights)
    updated_weights = deepcopy(weights)
    for i, layer in enumerate(original_weights):
        for index, weight in np.ndenumerate(layer):
            temp_weights = deepcopy(weights)
            temp_weights[i][index] += 0.0001
            _loss_plus = feed_forward(inputs, outputs, temp_weights)
            grad = (_loss_plus - org_loss)/(0.0001)
            updated_weights[i][index] -= grad*lr
    return updated_weights
  4. Initialize the weight and bias values to zero:
W = [np.array([[0]], dtype=np.float32), 
np.array([[0]], dtype=np.float32)]

Note that the weight and bias values are both initialized to 0. Further, the shape of the weight is 1 x 1, as each input data point has a shape of 1 x 1, and the shape of the bias is also 1 x 1 (as there is only one node in the output layer and each output has a single value).
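As a quick sanity check, the following sketch (which assumes the x list from step 1 and the W list we just defined) verifies these shapes directly:
print(np.array(x).shape)        # (4, 1): four data points, each of shape 1 x 1
print(W[0].shape, W[1].shape)   # (1, 1) and (1, 1): one weight and one bias
print(np.dot(x, W[0]).shape)    # (4, 1): one 1 x 1 prediction per data point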

  5. Let's leverage the update_weights function with a learning rate of 0.01, loop through 1,000 iterations, and check how the weight value (W) varies over increasing epochs:
weight_value = []
for epx in range(1000):
    W = update_weights(x, y, W, 0.01)
    weight_value.append(W[0][0][0])

Note that, in the preceding code, we use a learning rate of 0.01 and call the update_weights function repeatedly to fetch the modified weight at the end of each epoch. Further, in each epoch, we pass the most recently updated weight as the input for computing the updated weight in the next epoch.

  6. Plot the value of the weight parameter at the end of each epoch:
import matplotlib.pyplot as plt
%matplotlib inline
epochs = range(1, 1001)
plt.plot(epochs, weight_value)
plt.title('Weight value over increasing epochs when learning rate is 0.01')
plt.xlabel('Epochs')
plt.ylabel('Weight value')

The preceding code results in a variation in the weight value over increasing epochs as follows:

Note that, in the preceding output, the weight value gradually increased in the right direction and then saturated at the optimal value of ~3.

In order to understand the impact of the learning rate value on arriving at the optimal weight values, let's see how the weight value varies over increasing epochs when the learning rate is 0.1 and when it is 1.

The following charts are obtained when we modify the corresponding learning rate value in step 5 and execute step 6 (the code to generate the charts is the same as the code we learned earlier, with only the learning rate value changed, and is available in the associated notebook on GitHub); the modification is sketched below:
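The following sketch assumes the x, y, W, and update_weights objects defined in the earlier steps: re-initialize the weights and repeat steps 5 and 6 with the new learning rate (shown here for 0.1; change it to 1 for the third chart):
W = [np.array([[0]], dtype=np.float32),
     np.array([[0]], dtype=np.float32)]
weight_value = []
for epx in range(1000):
    W = update_weights(x, y, W, 0.1)   # use 1 here for the third scenario
    weight_value.append(W[0][0][0])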

Note that when the learning rate was very small (0.01), the weight value moved slowly (over a higher number of epochs) towards the optimal value. However, with a slightly higher learning rate (0.1), the weight value oscillated initially and then quickly saturated (in fewer epochs) at the optimal value. Finally, when the learning rate was high (1), the weight value spiked to a very high value and was not able to reach the optimal value.

The reason the weight value did not spike by a large amount when the learning rate was low is that the weight update is restricted to gradient * learning rate, which results in a small update when the learning rate is small. However, when the learning rate was high, the weight update was large; after that, the change in loss when the weight was updated by a small value was so small that the weight could not reach the optimal value.
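To make the arithmetic concrete, the following minimal sketch (not part of the book's notebook) computes the very first weight update for each of the three learning rates, starting from a weight of 0 and ignoring the bias for brevity:
import numpy as np

x = np.array([[1.], [2.], [3.], [4.]])
y = np.array([[3.], [6.], [9.], [12.]])

w = 0.0                                            # starting weight
loss = np.mean((w * x - y) ** 2)                   # 67.5
loss_plus = np.mean(((w + 0.0001) * x - y) ** 2)   # loss when w is nudged up
grad = (loss_plus - loss) / 0.0001                 # ~ -45

for lr in (0.01, 0.1, 1):
    print(f'lr={lr}: first weight update = {-grad * lr:.2f}')
# lr=0.01 -> ~0.45, lr=0.1 -> ~4.5, lr=1 -> ~45

The printed values line up with the charts: a learning rate of 0.01 moves the weight by only ~0.45 in the first epoch, whereas a learning rate of 1 overshoots the optimal value of 3 by a wide margin.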

In order to have a deeper understanding of the interplay between the gradient value, learning rate, and weight value, let's run the update_weights function only for 10 epochs. Further, we will print the following values to understand how they vary over increasing epochs:

  • Weight value at the start of each epoch
  • Loss prior to weight update
  • Loss when the weight is updated by a small amount
  • Gradient value

We modify the update_weights function to print the preceding values as follows:

def update_weights(inputs, outputs, weights, lr):
    original_weights = deepcopy(weights)
    org_loss = feed_forward(inputs, outputs, original_weights)
    updated_weights = deepcopy(weights)
    for i, layer in enumerate(original_weights):
        for index, weight in np.ndenumerate(layer):
            temp_weights = deepcopy(weights)
            temp_weights[i][index] += 0.0001
            _loss_plus = feed_forward(inputs, outputs, temp_weights)
            grad = (_loss_plus - org_loss)/(0.0001)
            updated_weights[i][index] -= grad*lr
            if i % 2 == 0:
                print('weight value:',
                      np.round(original_weights[i][index], 2),
                      'original loss:', np.round(org_loss, 2),
                      'loss_plus:', np.round(_loss_plus, 2),
                      'gradient:', np.round(grad, 2),
                      'updated_weights:',
                      np.round(updated_weights[i][index], 2))
    return updated_weights

The if block at the end of the inner loop is where we modified the update_weights function from the previous section. First, we check whether we are currently working on the weight parameter using i % 2 == 0 (the other parameter corresponds to the bias value), and then we print the original weight value (original_weights[i][index]), the original loss (org_loss), the updated loss value (_loss_plus), the gradient (grad), and the resulting updated weight value (updated_weights).

Let's now understand how the preceding values vary over increasing epochs across the three different learning rates that we are considering:

  • Learning rate of 0.01: We will check the values using the following code:
W = [np.array([[0]], dtype=np.float32),
     np.array([[0]], dtype=np.float32)]
weight_value = []
for epx in range(10):
    W = update_weights(x, y, W, 0.01)
    weight_value.append(W[0][0][0])
print(W)

import matplotlib.pyplot as plt
%matplotlib inline
epochs = range(1, 11)
plt.plot(epochs, weight_value)
plt.title('Weight value over increasing epochs when learning rate is 0.01')
plt.xlabel('Epochs')
plt.ylabel('Weight value')

The preceding code results in the following output:

Note that, when the learning rate was 0.01, the loss value decreased slowly and the weight value moved slowly towards the optimal value. Let's now see how these values vary when the learning rate is 0.1.

  • Learning rate of 0.1: The code remains the same as in the 0.01 scenario; only the learning rate parameter changes to 0.1. The output of running the same code with the changed learning rate value is as follows:

Let's contrast the learning rate scenarios of 0.01 and 0.1 – the major difference between the two is as follows:

When the learning rate was 0.01, the weight was updated much more slowly than with a learning rate of 0.1 (from 0 to 0.45 in the first epoch when the learning rate is 0.01, versus from 0 to 4.5 when the learning rate is 0.1). The reason for the slower update is the lower learning rate, as the weight is updated by the gradient times the learning rate.

In addition to the weight update magnitude, we should note the direction of the weight update:

The gradient is negative when the weight value is smaller than the optimal value while it is positive when the weight value is larger than the optimal value. This phenomenon helps in updating weight values in the right direction.
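As a quick check of this sign behavior, the following sketch (not part of the book's notebook) evaluates the finite-difference gradient on either side of the optimal weight of 3, with the bias held at 0:
import numpy as np

x = np.array([[1.], [2.], [3.], [4.]])
y = np.array([[3.], [6.], [9.], [12.]])

def gradient(w, delta=0.0001):
    # Same finite-difference estimate as used in update_weights
    loss = np.mean((w * x - y) ** 2)
    loss_plus = np.mean(((w + delta) * x - y) ** 2)
    return (loss_plus - loss) / delta

print(gradient(2.0))   # ~ -15: weight below the optimum, so it gets increased
print(gradient(4.0))   # ~ +15: weight above the optimum, so it gets decreased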

Finally, we will contrast the preceding with a learning rate of 1:

  • Learning rate of 1: The code remains the same as in the 0.01 scenario; only the learning rate parameter changes to 1. The output of running the same code with the changed learning rate value is as follows:

From the preceding output, we can see that the weight deviated to a very high value (at the end of the first epoch, the weight value is 45, and it deviates to even larger values in later epochs). Moreover, once the weight had moved to such a large value, a small change in the weight value hardly resulted in a change in the gradient, and hence the weight got stuck at that high value.

In general, it is better to have a low learning rate. This way, the model is able to learn slowly but will adjust the weights towards an optimal value. Typical learning rate parameter values range between 0.0001 and 0.01.

Now that we have learned about the building blocks of a neural network (feedforward propagation, backpropagation, and the learning rate), in the next section we will provide a high-level overview of how these three are put together to train a neural network.
