You're reading from Modern Computer Vision with PyTorch

Product typeBook

Published inNov 2020

Reading LevelBeginner

PublisherPackt

ISBN-139781839213472

Edition1st Edition

Languages

Python

Tools

PyTorch

Concepts

Computer Vision

Authors (2):

V Kishore Ayyadevara

Yeshwanth Reddy

View More author details

Transfer Learning for Image Classification

In the previous chapter, we learned that, as the number of images available in the training dataset increased, the classification accuracy of the model kept on increasing, to the extent where a training dataset comprising 8,000 images had a higher accuracy on validation dataset than a training dataset comprising 1,000 images. However, we do not always have the option of hundreds or thousands of images, along with the ground truths of their corresponding classes, in order to train a model. This is where transfer learning comes to the rescue.

Transfer learning is a technique where we transfer the learning of the model on a generic dataset to the specific dataset of interest. Typically, the pre-trained models used to perform transfer learning are trained on millions of images (which are generic and not the dataset of interest to us) and...

Introducing transfer learning

Transfer learning is a technique where knowledge gained from one task is leveraged to solve another similar task.

Imagine a model that is trained on millions of images that span thousands of classes of objects (not just cats and dogs). The various filters (kernels) of the model would activate for a wide variety of shapes, colors, and textures within the images. Those filters can now be reused to learn features on a new set of images. Post learning the features, they can be connected to a hidden layer prior to the final classification layer for customizing on the new data.

ImageNet (http://www.image-net.org/) is a competition hosted to classify approximately 14 million images into 1,000 different classes. It has a variety of classes in the dataset, including Indian elephant, lionfish, hard disk, hair spray, and jeep.

The deep neural network architectures that we will go through in this chapter have been trained on the ImageNet dataset. Furthermore, given...

Understanding VGG16 architecture

VGG stands for Visual Geometry Group, which is based out of the University of Oxford, and 16 stands for the number of layers in the model. The VGG16 model is trained to classify objects in the ImageNet competition and stood as the runner-up architecture in 2014. The reason we are studying this architecture instead of the winning architecture (GoogleNet) is because of its simplicity and a larger acceptance in the vision community by using it in several other tasks. Let's understand the architecture of VGG16 along with how a VGG16 pre-trained model is accessible and represented in PyTorch.

The code for this section is available as VGG_architecture.ipynb in the Chapter05 folder of this book's GitHub repository - https://tinyurl.com/mcvp-packt

Install the required packages:

import torchvision
import torch.nn as nn
import torch
import torch.nn.functional as F
from torchvision import transforms,models,datasets
!pip install torch_summary
from torchsummary...

Understanding ResNet architecture

While building too deep a network, there are two problems. In forward propagation, the last few layers of the network have almost no information about what the original image was. In backpropagation, the first few layers near the input hardly get any gradient updates due to vanishing gradients (in other words, they are almost zero). To solve both problems, residual networks (ResNet) use a highway-like connection that transfers raw information from the previous few layers to the later layers. In theory, even the last layer will have the entire information of the original image due to this highway network. And because of the skipping layers, the backward gradients will flow freely to the initial layers with little modification.

The term residual in the residual network is the additional information that the model is expected to learn from the previous layer that needs to be passed on to the next layer.

A typical residual block appears as follows:

As you...

Implementing facial key point detection

So far, we have learned about predicting classes that are binary (cats versus dogs) or are multi-label (fashionMNIST). Let's now learn a regression problem and, in so doing, a task where we are predicting not one but several continuous outputs. Imagine a scenario where you are asked to predict the key points present on an image of a face, for example, the location of the eyes, nose, and chin. In this scenario, we need to employ a new strategy to build a model to detect the key points.

Before we dive further, let's understand what we are trying to achieve through the following image:

As you can observe in the preceding image, facial key points denote the markings of various key points on the image that contains a face.

To solve this problem, we would have to solve a few problems first:

Images can be of different shapes:
This warrants an adjustment in the key point locations while adjusting images to bring them all to a standard image...

Multi-task learning – Implementing age estimation and gender classification

Multi-task learning is a branch of research where a single/few inputs are used to predict several different but ultimately connected outputs. For example, in a self-driving car, the model needs to identify obstacles, plan routes, give the right amount of throttle/brake and steering, to name but a few. It needs to do all of these in a split second by considering the same set of inputs (which would come from several sensors)

From the various use cases we have solved so far, we are in a position to train a neural network and estimate the age of a person given an image or predict the gender of the person given an image, separately, one task at a time. However, we have not looked at a scenario where we will be able to predict both age and gender in a single shot from an image. Predicting two different attributes in a single shot is important, as the same image is used for both predictions (this will be further...

Introducing the torch_snippets library

As you may have noticed, we are using the same functions in almost all the sections. It is a waste of our time to write the same lines of functions again and again. For convenience, the authors of this book have written a Python library by the name of torch_snippets so that our code looks short and clean.

Utilities such as reading an image, showing an image, and the entire training loop are quite repetitive. We want to circumvent writing the same functions over and over by wrapping them in code that is preferably a single function call. For example, to read a color image, we need not write cv2.imread(...) followed by cv2.cvtColor(...) every time. Instead, we can simply call read(...). Similarly, for plt.imshow(...), there are numerous hassles, including the fact that the size of the image should be optimal, and that the channel dimension should be last (remember PyTorch has them first). These will always be taken care of by the single function, show...

Summary

In this chapter, we have learned about how transfer learning helps to achieve high accuracy, even with a smaller number of data points. We have also learned about the popular pre-trained models, VGG and ResNet. Furthermore, we understood how to build models when we are trying to predict different scenarios, such as the location of key points on a face and combining loss values while training a model to predict for both age and gender together, where age is of a certain data type and gender is of a different data type.

With this foundation of image classification through transfer learning, in the next chapter, we will learn about some of the practical aspects of training an image classification model. We will learn about how to explain a model and also learn the tricks of how to train a model to achieve high accuracy and finally, learn the pitfalls that a practitioner needs to avoid while implementing a trained model.

Questions

What are VGG and ResNet pre-trained architectures trained on?
Why does VGG11 have an inferior accuracy to VGG16?
What does the number 11 in VGG11 represent?
What is residual in the residual network?
What is the advantage of a residual network?
What are the various popular pre-trained models?
During transfer learning, why should images be normalized with the same mean and standard deviation as those that were used during the training of a pre-trained model?
Why do we freeze certain parameters in a model?
How do we know the various modules that are present in a pre-trained model?
How do we train a model that predicts categorical and numerical values together?
Why might age and gender prediction code not always work for an image of your own interest if we execute the same code as we wrote in the age and gender estimation section?
How can we further improve the accuracy of the facial keypoint recognition model that we wrote about in the facial key points prediction section?

...

The rest of the chapter is locked

You have been reading a chapter from

Modern Computer Vision with PyTorch

Published in: Nov 2020Publisher: PacktISBN-13: 9781839213472

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

undefined

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at $15.99/month. Cancel anytime

Authors (2)

V Kishore Ayyadevara

V Kishore Ayyadevara leads a team focused on using AI to solve problems in the healthcare space. He has 10 years' experience in data science, solving problems to improve customer experience in leading technology companies. In his current role, he is responsible for developing a variety of cutting edge analytical solutions that have an impact at scale while building strong technical teams. Prior to this, Kishore authored three books — Pro Machine Learning Algorithms, Hands-on Machine Learning with Google Cloud Platform, and SciPy Recipes. Kishore is an active learner with keen interest in identifying problems that can be solved using data, simplifying the complexity and in transferring techniques across domains to achieve quantifiable results.
Read more about V Kishore Ayyadevara

Yeshwanth Reddy

Yeshwanth is a highly accomplished data scientist manager with 9+ years of experience in deep learning and document analysis. He has made significant contributions to the field, including building software for end-to-end document digitization, resulting in substantial cost savings. Yeshwanth's expertise extends to developing modules in OCR, word detection, and synthetic document generation. His groundbreaking work has been recognized through multiple patents. He also created a few Python libraries. With a passion for disrupting unsupervised and self-supervised learning, Yeshwanth is dedicated to reducing reliance on manual annotation and driving innovative solutions in the field of data science.
Read more about Yeshwanth Reddy

Other recommended products

Related to this chapter

Neural Networks with Keras Cookbook

This book presents solutions to the majority of the challenges you will face while training neural networks to solve deep learning problems. It covers the trending deep learning architectures used in industry and tackles a variety of use cases in computer vision, text processing, audio analysis, recommender systems, and game bots

BookFeb 2019568 pages

PyTorch Computer Vision Cookbook

This book enables you to solve the trickiest of problems in computer vision using deep learning algorithms and techniques. You will learn to use several different algorithms for different CV problems such as classification, detection, segmentation, and more using Pytorch. Packed with best practices in training and deployment of CV applications.

BookMar 2020364 pages

PyTorch Artificial Intelligence Fundamentals

In this book, you will start from the basics of tensor manipulation to all the way releasing your deep learning model to production. Using hands-on recipes you will learn to build deep learning applications and visualize the model performance. It teaches you about CNNs, RNNs, GANs and deep reinforcement learning with Pytorch.

BookFeb 2020200 pages

Generative Adversarial Networks Projects

In this book, we will use different complexities of datasets in order to build end-to-end projects. With every chapter, the level of complexity and operations will become advanced. It consists of 8 full-fledged projects covering approaches such as 3D-GAN, Age-cGAN, DCGAN, SRGAN, StackGAN, and CycleGAN with real-world use cases.

BookJan 2019316 pages

Generative Adversarial Networks Cookbook

Generative Adversarial Networks have opened up many new possibilities in the machine learning domain. This book is all you need to implement different types of GANs using TensorFlow and Keras, in order to provide optimized and efficient deep learning solutions.

BookDec 2018268 pages

Python Image Processing Cookbook

Advancements in wireless devices and mobile technology have enabled the acquisition of a tremendous amount of graphics, pictures, and videos. Through cutting edge recipes, this book provides coverage on tools, algorithms, and analysis for image processing. This book provides solutions addressing the challenges and complex tasks of image processing.

BookApr 2020438 pages

Mastering Computer Vision with TensorFlow 2.x

You will learn the principles of computer vision and deep learning, and understand various models and architectures with their pros and cons. You will learn how to use TensorFlow 2.x to build your own neural network model and apply it to various computer vision tasks such as image acquiring, processing, and analyzing.

BookMay 2020430 pages

Hands-On Generative Adversarial Networks with PyTorch 1.x

This book will help you understand how GANs architecture works using PyTorch. You will get familiar with the most flexible deep learning toolkit and use it to transform ideas into actual working codes. You will apply GAN models to areas like computer vision, multimedia and natural language processing using a sample-generation perspective.

BookDec 2019312 pages

Applied Deep Learning with PyTorch

Starting with the basics of deep learning and their various applications, Applied Deep Learning with PyTorch shows you how to solve trending tasks, such as image classification and natural language processing by understanding the different architectures of the neural networks.

BookApr 2019254 pages

Hands-On Deep Learning Algorithms with Python

This book introduces basic-to-advanced deep learning algorithms used in a production environment by AI researchers and principal data scientists; it explains algorithms intuitively, including the underlying math, and shows how to implement them using popular Python-based deep learning libraries such as TensorFlow.

BookJul 2019512 pages

Hands-On Image Generation with TensorFlow

This book is a step-by-step guide to show you how to implement generative models in TensorFlow 2.x from scratch. You’ll get to grips with the image generative technology by covering autoencoders, style transfer, and GANs as well as fundamental and state-of-the-art models.

BookDec 2020306 pages

Deep Learning with PyTorch

This book provides the intuition behind the state of the art Deep Learning architectures such as ResNet, DenseNet, Inception, and encoder-decoder without diving deep into the math of it. It shows how you can implement and use various architectures to solve problems in the area of image classification, language translation and NLP using PyTorch.

BookFeb 2018262 pages

Personalised recommendations for you

Based on your interests and search pattern

Et al.

Ever wonder why speech recognition systems don't understand the Scottish accent, or what would happen if an astronaut only ate mac 'n' cheese, or other spurious reflections you'd have at a bar? We did, then collated those deliberations into absurd research articles with fake figures and methodologies inspired by even more fictionally absurd studies.

BookAug 2023230 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages4

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages1

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Mastering Tableau 2023

This book is a comprehensive resource to mastering your Tableau skills and becoming a BI expert. As you progress, you will learn how to build advanced dashboards and improve your storytelling to derive key business insight, as well as make you well-versed with advanced functionalities of Tableau in the business intelligence domain.

BookAug 2023684 pages

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages5

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages2

Data Engineering with AWS

Embark on a journey to master data engineering pipelines on AWS! Our book offers a hands-on experience of AWS services for ingesting, transforming, and consuming data. Whether you're an absolute beginner or someone with basic data engineering experience, this guide is an indispensable resource.

BookOct 2023636 pages5

Modern Data Architecture on AWS

Every organization wants an agile, performant, and cost-effective data platform that meets all their current and future business needs. Purpose-built AWS analytics services and their features play a big part in building such a modern data platform. This book brings to you all the design and architectural patterns that’ll help you achieve this goal.

BookAug 2023420 pages5

Practical Guide to Applied Conformal Prediction in Python

Discover the power of Conformal Prediction with the "Practical Guide to Applied Conformal Prediction in Python." Master the latest techniques to quantify uncertainty in machine learning and computer vision models, and seamlessly apply them to your industry applications.

BookDec 2023240 pages

TinyML Cookbook

With over 70 project-based recipes, the TinyML Cookbook is a practical guide that will help you to get the most out of your microcontrollers. It provides a comprehensive understanding of the theoretical foundations while giving you hands-on experience training ML models for deployment on Arduino Nano 33 BLE Sense, Raspberry Pi Pico, and SparkFun RedBoard Artemis Nano microcontrollers.

BookNov 2023664 pages