Hands-On Computer Vision with Detectron2

Product type: Book
Published: Apr 2023
Publisher: Packt
ISBN-13: 9781800561625
Pages: 318
Edition: 1st Edition
Author: Van Vung Pham

Table of Contents

Preface
Part 1: Introduction to Detectron2
  • Chapter 1: An Introduction to Detectron2 and Computer Vision Tasks
  • Chapter 2: Developing Computer Vision Applications Using Existing Detectron2 Models
Part 2: Developing Custom Object Detection Models
  • Chapter 3: Data Preparation for Object Detection Applications
  • Chapter 4: The Architecture of the Object Detection Model in Detectron2
  • Chapter 5: Training Custom Object Detection Models
  • Chapter 6: Inspecting Training Results and Fine-Tuning Detectron2’s Solvers
  • Chapter 7: Fine-Tuning Object Detection Models
  • Chapter 8: Image Data Augmentation Techniques
  • Chapter 9: Applying Train-Time and Test-Time Image Augmentations
Part 3: Developing a Custom Detectron2 Model for Instance Segmentation Tasks
  • Chapter 10: Training Instance Segmentation Models
  • Chapter 11: Fine-Tuning Instance Segmentation Models
Part 4: Deploying Detectron2 Models into Production
  • Chapter 12: Deploying Detectron2 Models into Server Environments
  • Chapter 13: Deploying Detectron2 Models into Browsers and Mobile Environments
Index
Other Books You May Enjoy

The Architecture of the Object Detection Model in Detectron2

This chapter dives deep into the architecture of Detectron2 for the object detection task. The object detection model in Detectron2 is an implementation of Faster R-CNN. Specifically, this architecture includes the backbone network, the region proposal network, and the region of interest heads. This chapter is essential for understanding the common terminology used when designing deep neural networks for vision systems. A deep understanding of the architecture helps you fine-tune and customize models for better accuracy when training on custom datasets.

By the end of this chapter, you will understand Detectron2’s typical architecture in detail. You will also know where to customize your Detectron2 model (which configuration parameters to set, how to set them, and where to add or remove layers) to improve performance. Specifically, this chapter covers the following topics:

  • Introduction to the application architecture
  • The backbone network...

Technical requirements

You should have set up the development environment by following the instructions provided in Chapter 1. If you have not done so yet, please complete the setup before continuing. Additionally, you should read Chapter 2 to understand Detectron2’s Model Zoo and which backbone networks are available. All the code, datasets, and results are available on the GitHub page of the book (in the folder named Chapter04) at https://github.com/PacktPublishing/Hands-On-Computer-Vision-with-Detectron2. It is highly recommended that you download the code and follow along.

Introduction to the application architecture

As discussed in Chapter 1 and shown in Figure 4.1, Detectron2’s architecture consists of the backbone network, the region proposal network, and the region of interest heads.

Figure 4.1: The main components of Detectron2

The backbone network consists of several convolutional layers that extract features from the input image. The region proposal network is another neural network that predicts proposals, each with an objectness score and an object location, and feeds them to the next stage. The region of interest heads contain neural networks for object localization and classification. However, the implementation details of Detectron2 are more involved. We should understand this architecture in depth to know which Detectron2 configurations to set and how to fine-tune its models.

Figure 4.2: Architecture of Detectron2’s implementation of Faster R-CNN

Detectron2’s architecture...

The backbone network

Chapter 2 discusses the typical backbone networks for Detectron2, which include ResNet50, ResNet101, ResNeXt101, and their variants. This section inspects the ResNet50 architecture as an example; the idea remains the same for other base models (backbone networks). Figure 4.3 summarizes the steps to inspect the backbone network. Specifically, we pass a tensor of data for a single image to the backbone, and the backbone (ResNet50, in this case) outputs a tensor. This output tensor contains the salient features extracted from the input image.
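
To make the shape of the backbone's output more concrete, the following minimal sketch estimates the size of the feature map produced for one image. It is an illustration rather than Detectron2 code: the function name backbone_output_shape is hypothetical, and the defaults (a total stride of 16 and 1,024 output channels) are assumptions that correspond to a ResNet50 backbone truncated after its fourth stage. Other backbone configurations produce different strides and channel counts.

```python
import math


def backbone_output_shape(height, width, stride=16, channels=1024):
    """Estimate the (channels, height, width) of the feature map for one image.

    Assumption: the backbone downsamples the input by `stride` in total and
    its padded, strided layers round the spatial size up.
    """
    return (channels, math.ceil(height / stride), math.ceil(width / stride))


# An 800 x 1216 input image shrinks by a factor of 16 along each side.
shape = backbone_output_shape(800, 1216)
print(shape)  # (1024, 50, 76)
```

Each spatial position in this feature map summarizes a 16 x 16 pixel patch of the input under the stated assumptions, which is why later stages can work on a much smaller grid than the original image.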

Figure 4.3: The backbone network

Specifically, from Detectron2’s default predictor, we can access the backbone network using the following code snippet:

backbone = predictor.model.backbone
type(backbone)

This code snippet should print out the following:

detectron2.modeling.backbone.resnet.ResNet

The following code snippet reveals the backbone’s architecture:

print...

Region Proposal Network

Faster R-CNN is called a two-stage technique. The first stage proposes regions (bounding boxes) and predicts whether an object falls within each region (objectness). Notably, at this stage, it only predicts whether an object is in the proposed box and does not classify it into a specific class. The second stage then refines the proposed regions and classifies the objects in the proposed bounding boxes into particular labels. The RPN performs the first stage. This section inspects the details of the RPN and its related components in the Faster R-CNN architecture as implemented in Detectron2 and shown in Figure 4.6.
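
The regions proposed in the first stage start out as anchors, which are defined by a set of sizes and aspect ratios. The following sketch shows one common way to turn sizes and ratios into anchor boxes centered at the origin. It is a simplified illustration, not the library's implementation: the function name and the example values (one size of 32 and ratios of 0.5, 1.0, and 2.0) are assumptions chosen for readability.

```python
import math


def generate_cell_anchors(sizes, aspect_ratios):
    """Build (x1, y1, x2, y2) anchors centered at (0, 0).

    Assumptions: `size` is the square root of the anchor area and
    `ratio` is height divided by width.
    """
    anchors = []
    for size in sizes:
        area = size ** 2
        for ratio in aspect_ratios:
            width = math.sqrt(area / ratio)
            height = ratio * width
            anchors.append((-width / 2, -height / 2, width / 2, height / 2))
    return anchors


anchors = generate_cell_anchors(sizes=[32], aspect_ratios=[0.5, 1.0, 2.0])
print(len(anchors))  # 3
print(anchors[1])    # (-16.0, -16.0, 16.0, 16.0)
```

All three anchors cover roughly the same area; only their shapes differ. In a full detector, a set like this would be shifted to every position of the feature map so that each location has its own anchors to score.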

Figure 4.6: The Region Proposal Network and its components

Continuing from the previous code example, the following code snippet displays the RPN (proposal_generator):

rpn = predictor.model.proposal_generator
type(rpn)

This snippet should print out the following:

detectron2.modeling.proposal_generator.rpn.RPN
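
As a rough illustration of what the RPN's predictions mean, the sketch below decodes location deltas relative to anchors and keeps the proposals with the highest objectness scores. It assumes the standard Faster R-CNN box parameterization (center shifts scaled by the anchor size, and log-scale changes to width and height) with unit weights. The function names and example values are hypothetical, and non-maximum suppression, which a real RPN applies before keeping the top proposals, is omitted for brevity.

```python
import math


def apply_deltas(anchor, deltas):
    """Shift and scale one (x1, y1, x2, y2) anchor by (dx, dy, dw, dh)."""
    x1, y1, x2, y2 = anchor
    dx, dy, dw, dh = deltas
    width, height = x2 - x1, y2 - y1
    center_x, center_y = x1 + 0.5 * width, y1 + 0.5 * height
    # Center shifts are relative to the anchor size; sizes change on a log scale.
    new_cx, new_cy = center_x + dx * width, center_y + dy * height
    new_w, new_h = width * math.exp(dw), height * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


def top_k_proposals(anchors, deltas, objectness, k):
    """Decode every anchor and keep the k boxes with the highest objectness."""
    boxes = [apply_deltas(a, d) for a, d in zip(anchors, deltas)]
    ranked = sorted(zip(objectness, boxes), key=lambda pair: pair[0], reverse=True)
    return ranked[:k]


anchors = [(0, 0, 10, 10), (20, 20, 40, 40), (5, 5, 15, 25)]
deltas = [(0, 0, 0, 0), (0.5, 0, 0, 0), (0, 0, math.log(2), 0)]
objectness = [0.2, 0.9, 0.6]
for score, box in top_k_proposals(anchors, deltas, objectness, k=2):
    print(score, box)
```

Note that the scores only rank boxes by how likely they are to contain any object; no class label is involved at this stage.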

The following...

Region of Interest Heads

The components of the Region of Interest Heads perform the second stage of the object detection architecture that Detectron2 implements. Figure 4.11 illustrates the steps inside this stage.

Figure 4.11: The Region of Interest Heads

Specifically, this stage takes the features extracted from the backbone network and the ground-truth bounding boxes (if training) and performs the following steps:

  1. Label and sample proposals (if training).
  2. Extract box features.
  3. Perform predictions.
  4. Calculate losses (if training).
  5. Perform inferences (if inferencing).

During training, the 2,000 proposals (POST_NMS_TOPK_TRAIN) can contain many more negative proposals than positive ones (especially in the early stages of training, when the RPN is not yet accurate). Similar to the RPN stage, this step also labels the proposals (based on the ground truth) and samples another mini-batch with a fraction of positive proposals...
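
The labeling and sampling idea can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the library's code: a proposal counts as positive when its best overlap (IoU) with any ground-truth box reaches an assumed threshold of 0.5, and the names label_and_sample, batch_size, and positive_fraction are hypothetical.

```python
import random


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_and_sample(proposals, gt_boxes, batch_size, positive_fraction,
                     iou_threshold=0.5, seed=0):
    """Return indices of sampled positive and negative proposals."""
    rng = random.Random(seed)
    positives, negatives = [], []
    for index, proposal in enumerate(proposals):
        best = max((iou(proposal, gt) for gt in gt_boxes), default=0.0)
        (positives if best >= iou_threshold else negatives).append(index)
    # Cap the positives at the requested fraction, then fill up with negatives.
    num_pos = min(len(positives), int(batch_size * positive_fraction))
    num_neg = min(len(negatives), batch_size - num_pos)
    return rng.sample(positives, num_pos), rng.sample(negatives, num_neg)


gt_boxes = [(0, 0, 10, 10)]
proposals = [(0, 0, 10, 10), (1, 1, 10, 10), (0, 0, 9, 10), (50, 50, 60, 60),
             (20, 20, 30, 30), (8, 8, 18, 18), (100, 100, 110, 110), (30, 0, 40, 10)]
pos, neg = label_and_sample(proposals, gt_boxes, batch_size=4, positive_fraction=0.25)
print(len(pos), len(neg))  # 1 3
```

Capping the share of positives keeps the mini-batch composition stable, so the second stage sees a controlled mix of object and background examples even when the proposals themselves are heavily skewed toward background.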

Summary

This chapter dived deep into the components of Detectron2’s implementation of Faster R-CNN for object detection tasks. This model is a two-stage technique consisting of a region proposal stage and a region of interest extraction stage. Both stages use the features extracted by a backbone network. This backbone network can be any state-of-the-art convolutional neural network that extracts salient features from the input images. The extracted features, together with the information needed to generate a set of initial anchors (sizes and ratios; Chapter 7 explains more about how to customize these sizes and ratios), are then passed to the region proposal network, which predicts a fixed number of proposals with objectness scores (whether there is an object in a proposal) and location deltas (location differences between the predicted proposals and the raw anchors). The selected proposals are then passed to the second stage, where the region of interest heads predict the final object classification and localization...
