You're reading from Hands-On Computer Vision with Detectron2

Product type: Book
Published: Apr 2023
Publisher: Packt
ISBN-13: 9781800561625
Pages: 318
Edition: 1st Edition
Languages: Python
Concepts: Computer Vision
Author: Van Vung Pham

Table of Contents (20) Chapters

Preface

Part 1: Introduction to Detectron2

Chapter 1: An Introduction to Detectron2 and Computer Vision Tasks

Chapter 2: Developing Computer Vision Applications Using Existing Detectron2 Models

Part 2: Developing Custom Object Detection Models

Chapter 3: Data Preparation for Object Detection Applications

Chapter 4: The Architecture of the Object Detection Model in Detectron2

Chapter 5: Training Custom Object Detection Models

Chapter 6: Inspecting Training Results and Fine-Tuning Detectron2’s Solvers

Chapter 7: Fine-Tuning Object Detection Models

Chapter 8: Image Data Augmentation Techniques

Chapter 9: Applying Train-Time and Test-Time Image Augmentations

Part 3: Developing a Custom Detectron2 Model for Instance Segmentation Tasks

Chapter 10: Training Instance Segmentation Models

Chapter 11: Fine-Tuning Instance Segmentation Models

Part 4: Deploying Detectron2 Models into Production

Chapter 12: Deploying Detectron2 Models into Server Environments

Chapter 13: Deploying Detectron2 Models into Browsers and Mobile Environments

Index

Why subscribe?

Other Books You May Enjoy

The Architecture of the Object Detection Model in Detectron2

This chapter dives deep into the architecture of Detectron2 for the object detection task. The object detection model in Detectron2 is an implementation of Faster R-CNN. Specifically, this architecture includes the backbone network, the region proposal network, and the region of interest heads. This chapter is essential for understanding the common terminology used when designing deep neural networks for vision systems. A deep understanding of the architecture helps you fine-tune and customize models for better accuracy when training on custom datasets.

By the end of this chapter, you will understand Detectron2’s typical architecture in detail. You will also know where to customize your Detectron2 model (what configuration parameters to set, how to set them, and where to add/remove layers) to improve performance. Specifically, this chapter covers the following topics:

Introduction to the application architecture
The backbone network...

Technical requirements

You should have set up the development environment following the instructions provided in Chapter 1. If you have not done so yet, please complete the setup before continuing. Additionally, you should read Chapter 2 to understand Detectron2’s Model Zoo and what backbone networks are available. All the code, datasets, and results are available on the GitHub page of the book (in the folder named Chapter04) at https://github.com/PacktPublishing/Hands-On-Computer-Vision-with-Detectron2. It is highly recommended that you download the code and follow along.

Introduction to the application architecture

As discussed in Chapter 1 and shown in Figure 4.1, Detectron2’s architecture consists of the backbone network, the region proposal network, and the region of interest heads.

Figure 4.1: The main components of Detectron2

The backbone network includes several convolutional layers that perform feature extraction from the input image. The region proposal network is another neural network that predicts proposals, each with an objectness score and the location of a candidate object, and feeds them to the next stage. The region of interest heads contain neural networks for object localization and classification. However, the implementation details of Detectron2 are more involved. We should understand this architecture in depth to know which Detectron2 configurations to set and how to fine-tune its models.

Figure 4.2: Architecture of Detectron2’s implementation of Faster R-CNN

Detectron2’s architecture...

The backbone network

Chapter 2 discusses the typical backbone networks for Detectron2. They include ResNet50, ResNet101, ResNeXt101, and their variants. This section inspects the ResNet50 architecture as an example. However, the idea remains the same for other base models (backbone networks). Figure 4.3 summarizes the steps to inspect the backbone network. Specifically, we pass a tensor of data for a single image to the backbone, and the backbone (ResNet50, in this case) outputs a tensor. This output tensor contains the salient features extracted from the input image.

Figure 4.3: The backbone network
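To make the "tensor in, tensor out" idea concrete, the following minimal sketch estimates the shape of the output feature tensor from the input image size. It is plain Python and not Detectron2 code: the helper name backbone_output_shape is hypothetical, and the stride of 16 and the 1,024 output channels are assumptions that correspond to the last stage of a ResNet50 C4-style backbone. Other backbones use different values.

```python
import math


def backbone_output_shape(height, width, stride=16, channels=1024):
    """Estimate (channels, out_height, out_width) of the backbone output.

    Assumption: the backbone downsamples the image by `stride` in each
    spatial dimension (rounding up) and produces `channels` feature maps.
    """
    return (channels, math.ceil(height / stride), math.ceil(width / stride))


# An 800 x 1216 image divides evenly by the stride.
print(backbone_output_shape(800, 1216))  # (1024, 50, 76)
# A width of 1333 does not divide evenly, so the result is rounded up.
print(backbone_output_shape(800, 1333))  # (1024, 50, 84)
```

Under these assumptions, every location of the output tensor summarizes roughly a 16 x 16 pixel patch of the input, which is why the later stages work on a much smaller grid than the original image.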

Specifically, from Detectron2’s default predictor, we can access the backbone network using the following code snippet:

backbone = predictor.model.backbone
type(backbone)

This code snippet should print out the following:

detectron2.modeling.backbone.resnet.ResNet

The following code snippet reveals the backbone’s architecture:

print...

Region Proposal Network

Faster R-CNN is called a two-stage technique. The first stage proposes regions (bounding boxes) and predicts whether an object falls within each region (objectness). Notably, at this stage, it only predicts whether an object is in the proposed box and does not classify it into a specific class. The second stage then refines the proposed regions and classifies the objects in the proposed bounding boxes into particular labels. The Region Proposal Network (RPN) performs the first stage. This section inspects the details of the RPN and its related components in the Faster R-CNN architecture as implemented in Detectron2 and shown in Figure 4.6.

Figure 4.6: The Region Proposal Network and its components
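The RPN does not propose boxes from scratch. It scores and refines a fixed set of anchors that are generated from a list of sizes and aspect ratios and then repeated at every location of the feature map. The following is a minimal sketch of that idea in plain Python, not Detectron2's own code. The function names, the sizes (32 to 512), the ratios (0.5, 1, 2), the stride of 16, and the 50 x 76 feature grid are all example values assumed for illustration.

```python
import math


def generate_cell_anchors(sizes=(32, 64, 128, 256, 512),
                          aspect_ratios=(0.5, 1.0, 2.0)):
    """Build anchors (x0, y0, x1, y1) centered at the origin.

    Each anchor has an area of size**2 and a height/width ratio equal to
    the given aspect ratio.
    """
    anchors = []
    for size in sizes:
        area = size ** 2
        for ratio in aspect_ratios:
            w = math.sqrt(area / ratio)
            h = ratio * w
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors


def shift_anchors(cell_anchors, grid_h, grid_w, stride=16):
    """Copy the cell anchors to every location of the feature grid."""
    shifted = []
    for gy in range(grid_h):
        for gx in range(grid_w):
            sx, sy = gx * stride, gy * stride
            for x0, y0, x1, y1 in cell_anchors:
                shifted.append((x0 + sx, y0 + sy, x1 + sx, y1 + sy))
    return shifted


cell = generate_cell_anchors()
print(len(cell))  # 15 anchors per location (5 sizes x 3 ratios)
all_anchors = shift_anchors(cell, grid_h=50, grid_w=76)
print(len(all_anchors))  # 57000 anchors for a 50 x 76 feature grid
```

The count grows quickly with the feature map size, which is one reason the RPN keeps only a limited number of the highest-scoring proposals for the second stage.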

Continuing from the previous code example, the following code snippet displays the RPN (proposal_generator):

rpn = predictor.model.proposal_generator
type(rpn)

This snippet should print out the following:

detectron2.modeling.proposal_generator.rpn.RPN
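Conceptually, the RPN turns anchors into proposals in two steps: it applies the predicted location deltas to the anchors, and then it keeps the best-scoring boxes after removing near-duplicates with non-maximum suppression (NMS). The sketch below illustrates both steps in plain Python. It is a simplified illustration under stated assumptions rather than Detectron2's implementation: the function names are hypothetical, and the NMS threshold of 0.7 and the limit of 2,000 proposals are assumed example values.

```python
import math


def decode(anchor, deltas):
    """Apply (dx, dy, dw, dh) deltas to an (x0, y0, x1, y1) anchor."""
    x0, y0, x1, y1 = anchor
    w, h = x1 - x0, y1 - y0
    cx, cy = x0 + 0.5 * w, y0 + 0.5 * h
    dx, dy, dw, dh = deltas
    new_cx, new_cy = cx + dx * w, cy + dy * h          # shift the center
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)  # rescale the size
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def select_proposals(boxes, scores, nms_thresh=0.7, post_nms_topk=2000):
    """Greedy NMS: visit boxes by descending score, drop heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if len(keep) >= post_nms_topk:
            break
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in keep):
            keep.append(i)
    return keep


anchors = [(0, 0, 10, 10), (0, 0, 10, 10), (50, 50, 60, 60)]
deltas = [(0.0, 0.0, 0.0, 0.0), (0.0, 0.1, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
scores = [0.9, 0.8, 0.7]  # objectness scores, one per anchor
proposals = [decode(a, d) for a, d in zip(anchors, deltas)]
keep = select_proposals(proposals, scores)
print(keep)  # [0, 2]: the second box overlaps the first too much
```

Note that the scores here only express objectness. No class label is involved at this stage, which matches the role of the first stage described above.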

The following...

Region of Interest Heads

The components of the Region of Interest Heads perform the second stage in the object detection architecture that Detectron2 implements. Figure 4.11 illustrates the steps inside this stage.

Figure 4.11: The Region of Interest Heads

Specifically, this stage takes the features extracted from the backbone network and the ground-truth bounding boxes (if training) and performs the following steps:

Label and sample proposals (if training).
Extract box features.
Perform predictions.
Calculate losses (if training).
Perform inference (if inferencing).

During training, the 2,000 proposals (POST_NMS_TOPK_TRAIN) can contain far more negative proposals than positive ones (especially at the early stage of training, when the RPN is not yet accurate). Similar to the RPN stage, this step also labels the proposals (based on the ground truth) and samples another mini-batch with a fraction of positive proposals...
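The labeling and sampling step can be sketched as follows. This is a simplified illustration in plain Python, not Detectron2's implementation: the function names are hypothetical, and the defaults (a mini-batch of 512 proposals, a positive fraction of 0.25, and an IoU threshold of 0.5 for calling a proposal positive) are assumptions chosen for the example.

```python
import random


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def label_and_sample(proposals, gt_boxes, batch_size=512,
                     positive_fraction=0.25, iou_thresh=0.5, seed=0):
    """Label proposals by their best IoU with the ground truth, then sample.

    Returns two lists of proposal indices: sampled positives and negatives.
    """
    rng = random.Random(seed)
    positives, negatives = [], []
    for idx, box in enumerate(proposals):
        best = max((iou(box, gt) for gt in gt_boxes), default=0.0)
        (positives if best >= iou_thresh else negatives).append(idx)
    # Cap the positives at the requested fraction, fill the rest with negatives.
    num_pos = min(len(positives), int(batch_size * positive_fraction))
    num_neg = min(len(negatives), batch_size - num_pos)
    return rng.sample(positives, num_pos), rng.sample(negatives, num_neg)


gt_boxes = [(0, 0, 10, 10)]
proposals = [(0, 0, 10, 10), (1, 0, 11, 10), (0, 1, 10, 11)]       # near the object
proposals += [(100 + i, 100, 110 + i, 110) for i in range(20)]     # background
pos, neg = label_and_sample(proposals, gt_boxes, batch_size=8)
print(len(pos), len(neg))  # 2 6
```

Capping the positives at a fixed fraction keeps each mini-batch from being dominated by either group, even when almost all proposals are background.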

Summary

This chapter dove deep into the components of Detectron2’s implementation of Faster R-CNN for object detection tasks. This model is a two-stage technique: a region proposal stage and a region of interest extraction stage. Both stages use the features extracted by a backbone network. The backbone network can be any state-of-the-art convolutional neural network that extracts salient features from the input images. The extracted features, together with the information needed to generate a set of initial anchors (sizes and ratios; Chapter 7 explains more about how to customize these sizes and ratios), are then passed to the region proposal network, which predicts a fixed number of proposals with objectness scores (whether there is an object in a proposal) and location deltas (the location differences between the predicted proposals and the raw anchors). The selected proposals are then passed to the second stage, where the region of interest heads predict the final object classification and localization...