Modern Computer Vision with PyTorch

Published in Nov 2020
Publisher Packt
ISBN-13 9781839213472
Pages 824
Edition 1st Edition
Authors (2): V Kishore Ayyadevara, Yeshwanth Reddy

Table of Contents (25 chapters)

Preface
Section 1 - Fundamentals of Deep Learning for Computer Vision
  • Artificial Neural Network Fundamentals
  • PyTorch Fundamentals
  • Building a Deep Neural Network with PyTorch
Section 2 - Object Classification and Detection
  • Introducing Convolutional Neural Networks
  • Transfer Learning for Image Classification
  • Practical Aspects of Image Classification
  • Basics of Object Detection
  • Advanced Object Detection
  • Image Segmentation
  • Applications of Object Detection and Segmentation
Section 3 - Image Manipulation
  • Autoencoders and Image Manipulation
  • Image Generation Using GANs
  • Advanced GANs to Manipulate Images
Section 4 - Combining Computer Vision with Other Techniques
  • Training with Minimal Data Points
  • Combining Computer Vision and NLP Techniques
  • Combining Computer Vision and Reinforcement Learning
  • Moving a Model to Production
  • Using OpenCV Utilities for Image Analysis
Other Books You May Enjoy
Appendix

Advanced Object Detection

In the previous chapter, we learned about the R-CNN and Fast R-CNN techniques, which leverage region proposals to predict the locations of objects in an image along with the classes of those objects. We also learned that their inference is slow because they rely on two different models: one for region proposal generation and another for object detection. In this chapter, we will learn about modern techniques, such as Faster R-CNN, YOLO, and Single-Shot Detector (SSD), that overcome slow inference by employing a single model to predict both the class and the bounding box of each object. We will start by learning about anchor boxes and then proceed to learn about how each of the techniques works and how to implement them to detect objects...

Components of modern object detection algorithms

The drawback of the R-CNN and Fast R-CNN techniques is that they have two disjoint networks: one to identify the regions that are likely to contain an object and the other to make corrections to the bounding box where an object is identified. Furthermore, both models require as many forward propagations as there are region proposals. Modern object detection algorithms instead focus on training a single neural network that can detect all objects in one forward pass. In the subsequent sections, we will learn about the various components of a typical modern object detection algorithm:

  • Anchor boxes
  • Region proposal network (RPN)
  • Region of interest pooling
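Of the components listed above, RoI pooling is the simplest to illustrate in isolation: it turns a region of any size on a feature map into a fixed-size grid, so that the same fully connected layers can process every region. The following pure-Python sketch max-pools one region of a single-channel feature map into a 2 x 2 grid. The function name `roi_max_pool`, the integer box convention, and the bin-splitting rule (floor for bin starts, ceiling for bin ends) are assumptions made for illustration; in practice, a library routine such as `torchvision.ops.roi_pool` would normally be used instead, and its rounding details differ.

```python
def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one region of interest into a fixed-size grid.

    feature_map: 2D list (rows x cols) of numbers, single channel.
    roi: (x1, y1, x2, y2) in feature-map cells; x1/y1 inclusive,
         x2/y2 exclusive (a convention assumed for this sketch).
    """
    x1, y1, x2, y2 = roi
    out_h, out_w = output_size
    roi_h, roi_w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        # Row range of bin i: floor for the start, ceiling for the end,
        # so every bin of a non-empty region covers at least one cell
        r0 = y1 + (i * roi_h) // out_h
        r1 = y1 + ((i + 1) * roi_h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            c0 = x1 + (j * roi_w) // out_w
            c1 = x1 + ((j + 1) * roi_w + out_w - 1) // out_w
            # Keep the strongest activation inside the bin
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# A 4 x 4 feature map holding the values 0..15, row by row
fm = [[r * 4 + c for c in range(4)] for r in range(4)]
print(roi_max_pool(fm, (0, 0, 4, 4)))  # [[5, 7], [13, 15]]
print(roi_max_pool(fm, (1, 1, 3, 4)))  # [[9, 10], [13, 14]]
```

Both regions, despite their different sizes (4 x 4 and 2 wide by 3 tall), produce a 2 x 2 output, which is what lets a single fully connected head handle proposals of arbitrary shape.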

Anchor boxes

So far, our region proposals have come from the selective search method. Anchor boxes are a handy replacement for selective search; in this section, we will learn how they replace selective-search-based region proposals.

Typically, a...
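A common way to build anchor boxes is to centre a fixed set of boxes, one per combination of scale and aspect ratio, on every cell of a feature map. The sketch below does this in pure Python. The scales `(32, 64)`, the aspect ratios `(0.5, 1.0, 2.0)`, the stride of 16, and the name `generate_anchors` are illustrative assumptions rather than values used in this chapter.

```python
import math


def generate_anchors(feature_h, feature_w, stride,
                     scales=(32, 64), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) tuples in image pixels.

    Each feature-map cell corresponds to a stride x stride patch of the
    image, and its anchors are centred on that patch. For aspect ratio
    r = h / w and scale s, the anchor keeps the area s * s:
    w = s / sqrt(r), h = s * sqrt(r).
    """
    anchors = []
    for row in range(feature_h):
        for col in range(feature_w):
            # Centre of this cell, mapped back to image coordinates
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for s in scales:
                for r in ratios:
                    w = s / math.sqrt(r)
                    h = s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


anchors = generate_anchors(feature_h=2, feature_w=3, stride=16)
print(len(anchors))  # 2 * 3 cells x 2 scales x 3 ratios = 36
print(anchors[1])    # first cell, scale 32, ratio 1.0: (-8.0, -8.0, 24.0, 24.0)
```

Keeping the area fixed while varying the aspect ratio lets the same location propose both tall and wide objects; the network then only has to predict whether each anchor contains an object and how to adjust it.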

Training Faster R-CNN on a custom dataset

In the following code, we will train the Faster R-CNN algorithm to detect the bounding boxes around objects present in images. For this, we will work on the same truck versus bus detection exercise that we worked on in the previous chapter:

The following code is available as Training_Faster_RCNN.ipynb in the Chapter08 folder of this book's GitHub repository - https://tinyurl.com/mcvp-packt.
  1. Download the dataset:
import os
# Download and extract the dataset only if it is not already present
if not os.path.exists('images'):
    !pip install -qU torch_snippets
    from google.colab import files
    files.upload() # upload kaggle.json (your Kaggle API token)
    !mkdir -p ~/.kaggle
    !mv kaggle.json ~/.kaggle/
    !ls ~/.kaggle
    # Restrict the token's permissions; the Kaggle CLI warns if others can read it
    !chmod 600 /root/.kaggle/kaggle.json
    !kaggle datasets download \
        -d sixhky/open-images-bus-trucks/
    !unzip -qq open-images-bus-trucks.zip
    !rm open-images-bus-trucks.zip
  2. Read the DataFrame containing metadata about the images, their bounding boxes, and their classes:
from torch_snippets...
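As a hedged sketch of the kind of conversion this step leads to: if the model is torchvision's Faster R-CNN (an assumption here), each training image needs a target with `boxes` in absolute `[x1, y1, x2, y2]` pixel coordinates and integer `labels`, where label 0 is reserved for the background class. The sketch below assumes the DataFrame rows follow the Open Images convention of normalized `XMin`, `XMax`, `YMin`, `YMax` columns plus a `LabelName` column; those column names, the `'Bus'`/`'Truck'` label values, and the helper `build_target` are hypothetical. Plain dictionaries and lists stand in for DataFrame rows and tensors so the example runs without pandas or PyTorch.

```python
def build_target(rows, img_w, img_h, label2target):
    """Convert one image's annotation rows into a detection target.

    rows: list of dicts with normalized 'XMin', 'XMax', 'YMin', 'YMax'
          (0-1, as in Open Images) and a 'LabelName' string (assumed schema).
    Returns a dict with absolute-pixel boxes and integer labels.
    """
    boxes, labels = [], []
    for row in rows:
        # Scale normalized coordinates to pixels, in [x1, y1, x2, y2] order
        boxes.append([row['XMin'] * img_w, row['YMin'] * img_h,
                      row['XMax'] * img_w, row['YMax'] * img_h])
        labels.append(label2target[row['LabelName']])
    return {'boxes': boxes, 'labels': labels}


# Hypothetical label map: index 0 is reserved for the background class
label2target = {'background': 0, 'Bus': 1, 'Truck': 2}
rows = [{'XMin': 0.1, 'XMax': 0.5, 'YMin': 0.2, 'YMax': 0.6,
         'LabelName': 'Bus'}]
print(build_target(rows, img_w=200, img_h=100, label2target=label2target))
```

With this mapping, the model would be configured with three output classes (two objects plus background).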

Working details of YOLO

You Only Look Once (YOLO) and its variants are among the most prominent object detection algorithms. In this section, we will understand at a high level how YOLO works and which limitations of R-CNN-based object detection frameworks it overcomes.

First, let's learn about the possible limitations of R-CNN-based detection algorithms. In Faster R-CNN, we slide over the image using anchor boxes, identify the regions that are likely to contain an object, and then make bounding box corrections. However, the fully connected layer receives as input only the RoI pooling output of the detected region. When a region does not fully encompass the object (that is, the object extends beyond the boundaries of the region proposal's bounding box), the network has to guess the real boundaries of the object, because it has seen only the region proposal rather than the full image.

YOLO comes in handy in such scenarios, as it looks at the whole...
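A key idea in the original YOLO (v1) formulation is that the image is divided into an S x S grid, and the grid cell containing an object's centre is responsible for predicting that object. The sketch below encodes one ground-truth box into that form: the responsible cell, the centre's offset within the cell, and the width and height relative to the whole image. The function name `encode_yolo_target` and the 7 x 7 default grid are assumptions for illustration; the sketch is simplified, and later versions, including YOLO-v4, also use anchor boxes, which it omits.

```python
def encode_yolo_target(box, img_w, img_h, grid_size=7):
    """Find the responsible grid cell and encode a box, YOLO-v1 style.

    box: (x1, y1, x2, y2) in pixels.
    Returns (row, col, x_offset, y_offset, w_rel, h_rel): the offsets place
    the box centre inside its cell (0-1), and w_rel/h_rel are the box
    width and height relative to the whole image.
    """
    x1, y1, x2, y2 = box
    # Box centre in normalized image coordinates (0-1)
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    # The cell that contains the centre is responsible for this object
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    # Position of the centre relative to the top-left corner of its cell
    x_offset = cx * grid_size - col
    y_offset = cy * grid_size - row
    w_rel = (x2 - x1) / img_w
    h_rel = (y2 - y1) / img_h
    return row, col, x_offset, y_offset, w_rel, h_rel


# Centre at (200, 150) in a 448 x 448 image falls in row 2, column 3
print(encode_yolo_target((100, 50, 300, 250), img_w=448, img_h=448))
```

Because the width and height are expressed relative to the full image rather than to a proposal, the prediction is made with the whole image in view, which is the property the text contrasts with R-CNN-based detectors.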

Training YOLO on a custom dataset

Building on top of others' work is very important to becoming a successful practitioner in deep learning. For this implementation, we will use the official YOLO-v4 implementation to identify the location of buses and trucks in images. We will clone the repository of the authors' own implementation of YOLO and customize it to our needs in the following code.
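Training Darknet on a custom dataset requires the annotations in its own text format: one .txt file per image, with one line per object of the form `<object-class> <x_center> <y_center> <width> <height>`, where the class index is 0-based and the coordinates are relative to the image size, as described in the AlexeyAB/darknet README. The following sketch formats one box this way; it is not taken from the chapter's notebook, and the helper name `to_darknet_line` and the class mapping (bus as 0, truck as 1) are hypothetical.

```python
def to_darknet_line(class_id, box, img_w, img_h):
    """Format one box as a Darknet/YOLO label line.

    box: (x1, y1, x2, y2) in pixels; class_id is 0-based.
    Output: '<class> <x_center> <y_center> <width> <height>', with all
    coordinates relative to the image size (0-1).
    """
    x1, y1, x2, y2 = box
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: a bus (class 0) in a 640 x 480 image
print(to_darknet_line(0, (64, 48, 320, 240), 640, 480))
# 0 0.300000 0.300000 0.400000 0.400000
```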

The following code is available as Training_YOLO.ipynb in the Chapter08 folder of this book's GitHub repository - https://tinyurl.com/mcvp-packt.

Installing Darknet

First, pull the darknet repository from GitHub and compile it in the environment using the following code. Note that the model is implemented in Darknet, a separate neural network framework written in C and CUDA, rather than in PyTorch:

  1. Pull the Git repo:
!git clone https://github.com/AlexeyAB/darknet
%cd darknet
  2. Reconfigure the Makefile:
# Build Darknet with OpenCV support
!sed -i 's/OPENCV=0/OPENCV=1/' Makefile
# In case you don't have a GPU, make sure to comment...
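Besides the Makefile, training on a custom dataset also involves editing the model's .cfg file. The AlexeyAB/darknet README recommends setting `max_batches` to `classes * 2000` (but no less than 6000 and no less than the number of training images), `steps` to 80% and 90% of `max_batches`, and `filters=(classes + 5) * 3` in the `[convolutional]` layer just before each `[yolo]` layer. The sketch below computes those values; the helper name and the training-image count of 1500 are placeholders, not figures from this chapter.

```python
def darknet_custom_cfg_values(num_classes, num_train_images):
    """Compute cfg values recommended for custom YOLO-v4 training.

    Rules as described in the AlexeyAB/darknet README:
      max_batches = classes * 2000, but at least 6000 and at least
                    the number of training images
      steps       = 80% and 90% of max_batches
      filters     = (classes + 5) * 3 in the [convolutional] layer
                    right before each [yolo] layer
    """
    max_batches = max(num_classes * 2000, 6000, num_train_images)
    # 80% and 90% of max_batches, using integer arithmetic
    steps = (max_batches * 8 // 10, max_batches * 9 // 10)
    filters = (num_classes + 5) * 3
    return {'max_batches': max_batches, 'steps': steps, 'filters': filters}


# Two classes (bus, truck); the image count is a placeholder
print(darknet_custom_cfg_values(num_classes=2, num_train_images=1500))
# {'max_batches': 6000, 'steps': (4800, 5400), 'filters': 21}
```

The same README also asks for `classes=` to be set to the number of object classes in each `[yolo]` layer.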

Working details of SSD

So far, we have seen scenarios where we made predictions only after gradually convolving and pooling the output from the previous layer. However, different layers have different receptive fields on the original image: the initial layers have smaller receptive fields than the final layers. Here, we will learn how SSD leverages this property to predict bounding boxes for images.
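The receptive field of each layer can be computed with a simple recurrence: a layer with kernel size k grows the receptive field by (k - 1) times the current jump (the distance, in input pixels, between adjacent outputs), and a layer with stride s multiplies the jump by s. The sketch below applies it to the first three blocks of VGG-16, the backbone that SSD extends. The function name is hypothetical, and dilation is ignored; padding shifts where a receptive field is centred but not its size.

```python
def receptive_fields(layers):
    """Receptive field (in input pixels) after each layer of a plain CNN.

    layers: list of (name, kernel_size, stride). Uses the recurrence
      r_out = r_in + (k - 1) * j_in   (receptive field)
      j_out = j_in * s                (jump between adjacent outputs)
    """
    r, j = 1, 1
    result = []
    for name, k, s in layers:
        r = r + (k - 1) * j
        j = j * s
        result.append((name, r))
    return result


# First three blocks of VGG-16: 3x3 convs (stride 1), 2x2 max-pools (stride 2)
vgg_blocks = [
    ('conv1_1', 3, 1), ('conv1_2', 3, 1), ('pool1', 2, 2),
    ('conv2_1', 3, 1), ('conv2_2', 3, 1), ('pool2', 2, 2),
    ('conv3_1', 3, 1), ('conv3_2', 3, 1), ('conv3_3', 3, 1), ('pool3', 2, 2),
]
fields = receptive_fields(vgg_blocks)
print(fields[1])   # ('conv1_2', 5)
print(fields[-1])  # ('pool3', 44)
```

Each pooling layer doubles the jump, so every later 3 x 3 convolution widens the receptive field faster; this is why the deeper feature maps that SSD taps see progressively larger parts of the image.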

SSD overcomes the issue of detecting objects at different scales as follows:

  • We leverage the pre-trained VGG network and extend it with a few additional layers until we obtain a 1 x 1 block.
  • Instead of leveraging only the final layer for bounding box and class predictions, we will leverage all of the last few layers to make class and bounding box predictions.
  • In place of anchor boxes, we will come up with default boxes that have a specific set of scale and aspect...
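The SSD paper spaces the default-box scales linearly across the m feature maps used for prediction, from s_min = 0.2 to s_max = 0.9 (as fractions of the image size), and derives each box's width and height from its aspect ratio a as s * sqrt(a) and s / sqrt(a), adding one extra box of scale sqrt(s_k * s_(k+1)) for a = 1. The sketch below reproduces those formulas and the total number of default boxes reported for SSD300, where the 4-box maps use ratios {1, 2, 1/2} and the 6-box maps add {3, 1/3}. The function names are hypothetical, and the choice of `next_scale` for the last feature map is left to the caller, since implementations handle it differently.

```python
import math


def ssd_scales(m, s_min=0.2, s_max=0.9):
    """Default-box scales for m feature maps, spaced linearly."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def default_box_shapes(scale, next_scale, ratios=(1.0, 2.0, 0.5)):
    """(width, height) of each default box, as fractions of the image size.

    Each aspect ratio a gives w = scale * sqrt(a) and h = scale / sqrt(a);
    ratio 1 also gets an extra square box of side sqrt(scale * next_scale).
    """
    shapes = [(scale * math.sqrt(a), scale / math.sqrt(a)) for a in ratios]
    extra = math.sqrt(scale * next_scale)
    shapes.append((extra, extra))
    return shapes


scales = ssd_scales(6)
print([round(s, 2) for s in scales])  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
print(len(default_box_shapes(scales[0], scales[1])))  # 4

# SSD300: (feature-map size, default boxes per location) for its six maps
feature_maps = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]
print(sum(size * size * boxes for size, boxes in feature_maps))  # 8732
```

Small scales are assigned to the early, high-resolution maps with small receptive fields, and large scales to the late, coarse maps, which is how SSD matches box sizes to what each layer can see.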

Training SSD on a custom dataset

In the following code, we will train the SSD algorithm to detect the bounding boxes around objects present in images. We will use the truck versus bus object detection task we have been working on:

The following code is available as Training_SSD.ipynb in the Chapter08 folder of this book's GitHub repository - https://tinyurl.com/mcvp-packt. The code contains URLs to download data from and is moderately lengthy. We strongly recommend that you execute the notebook from GitHub to reproduce the results, and use the text to understand the steps to perform and the various code components.
  1. Download the image dataset and clone the Git repository hosting the code for the model and the other utilities for processing the data:
import os
# Download the data only if it is not already present
if not os.path.exists('open-images-bus-trucks'):
    !pip install -q torch_snippets
    !wget --quiet https://www.dropbox.com/s/agmzwk95v96ihic/open-images-bus-trucks.tar.xz
    !tar -xf open-images-bus-trucks.tar...

Summary

In this chapter, we learned about the working details of modern object detection algorithms: Faster R-CNN, YOLO, and SSD. We learned how they overcome the limitation of having two separate models: one for fetching region proposals and the other for fetching class and bounding box offsets on region proposals. Furthermore, we implemented Faster R-CNN using PyTorch, YOLO using Darknet, and SSD from scratch.

In the next chapter, we will learn about image segmentation, which goes one step beyond object localization by identifying the pixels that correspond to an object.

Furthermore, in Chapter 15, Combining Computer Vision and NLP Techniques, we will learn about DETR, a transformer-based object detection algorithm, and in Chapter 10, Applications of Object Detection and Segmentation, we will learn about the Detectron2 framework, which helps not only detect objects but also segment them in a single shot.

Test your understanding

  1. Why is Faster R-CNN faster than Fast R-CNN?
  2. How are YOLO and SSD faster than Faster R-CNN?
  3. What makes YOLO and SSD single-shot algorithms?
  4. What is the difference between the objectness score and the class score?
  5. What is the difference between an anchor box and a default box?