You're reading from Hands-On Computer Vision with Detectron2

Product type Book

Published in Apr 2023

Publisher Packt

ISBN-13 9781800561625

Pages 318 pages

Edition 1st Edition

Languages

Python

Concepts

Computer Vision

Author (1):

Van Vung Pham

Table of Contents (20) Chapters

Preface

1. Part 1: Introduction to Detectron2

2. Chapter 1: An Introduction to Detectron2 and Computer Vision Tasks

3. Chapter 2: Developing Computer Vision Applications Using Existing Detectron2 Models

4. Part 2: Developing Custom Object Detection Models

5. Chapter 3: Data Preparation for Object Detection Applications

6. Chapter 4: The Architecture of the Object Detection Model in Detectron2

7. Chapter 5: Training Custom Object Detection Models

8. Chapter 6: Inspecting Training Results and Fine-Tuning Detectron2’s Solvers

9. Chapter 7: Fine-Tuning Object Detection Models

10. Chapter 8: Image Data Augmentation Techniques

11. Chapter 9: Applying Train-Time and Test-Time Image Augmentations

12. Part 3: Developing a Custom Detectron2 Model for Instance Segmentation Tasks

13. Chapter 10: Training Instance Segmentation Models

14. Chapter 11: Fine-Tuning Instance Segmentation Models

15. Part 4: Deploying Detectron2 Models into Production

16. Chapter 12: Deploying Detectron2 Models into Server Environments

17. Chapter 13: Deploying Detectron2 Models into Browsers and Mobile Environments

18. Index

Why subscribe?

19. Other Books You May Enjoy

Computer vision tasks

Deep learning achieves state-of-the-art results in many CV tasks. The most common CV task is image classification, in which a deep learning model gives a class label for a given image. However, recent advancements in deep learning allow computers to perform more advanced vision tasks. There are many of these advanced vision tasks.

However, this book focuses on more common and important ones, including object detection, instance segmentation, keypoint detection, semantic segmentation, and panoptic segmentation. It might be challenging for readers to differentiate between these tasks. Figure 1.1 depicts the differences between them. This section outlines what they are and when to use them, and the rest of the book focuses on how to implement these tasks using Detectron2. Let’s get started!

Figure 1.1: Common computer vision tasks

Object detection

Object detection generally includes object localization and classification. Specifically, deep learning models for this task predict where objects of interest are in an image by applying the bounding boxes around these objects (localization). Furthermore, these models also classify the detected objects into types of interest (classification).

One example of this task is specifying people in pictures and applying bounding boxes to the detected humans (localization only), as shown in Figure 1.1 (b). Another example is to detect road damage from a recorded road image by providing bounding boxes to the damage (localization) and further classifying the damage into types such as longitudinal cracks, traverse cracks, alligator cracks, and potholes (classification).

Instance segmentation

Like object detection, instance segmentation also involves object localization and classification. However, instance segmentation takes things one step further while localizing the detected objects of interest.

Specifically, besides classification, models for this task localize the detected objects at the pixel level. In other words, it identifies all the pixels of each detected object. Instance segmentation is needed in applications that require shapes of the detected objects in images and need to track every individual object. Figure 1.1 (c) shows the instance segmentation result on the input image in Figure 1.1 (a). Specifically, besides the bounding boxes, every pixel of each person is also highlighted.

Keypoint detection

Besides detecting objects, keypoint detection also indicates important parts of the detected objects called keypoints. These keypoints describe the detected object’s essential trait. This trait is often invariant to image rotation, shrinkage, translation, or distortion. For instance, the keypoints of humans include the eyes, nose, shoulders, elbows, hands, knees, and feet. Keypoint detection is important for applications such as action estimation, pose detection, or face detection. Figure 1.1 (d) shows the keypoint detection result on the input image in Figure 1.1 (a). Specifically, besides the bounding boxes, it highlights all keypoints for every detected individual.

Semantic segmentation

A semantic segmentation task does not detect specific instances of objects but classifies each pixel in an image into some classes of interest. For instance, a model for this task classifies regions of images into pedestrians, roads, cars, trees, buildings, and the sky in a self-driving car application. This task is important when providing a broader view of groups of objects with different classes (i.e., a higher level of understanding of the image). Specifically, if individual class instances are in one region, they are grouped into one mask instead of having a different mask for each individual.

One example of the application of semantic segmentation is to segment the images into foreground objects and background objects (e.g., to blur the background and provide a more artistic look for a portrait image). Figure 1.1 (e) shows the semantic segmentation result on the input image in Figure 1.1 (a). Specifically, the input picture is divided into regions classified as things (people or front objects) and background objects such as the sky, a mountain, dirt, grass, and a tree.

Panoptic segmentation

Panoptic literally means “everything visible in the image”. In other words, it can be viewed as combining common CV tasks such as instance segmentation and semantic segmentation. It helps to show the unified and global view of segmentation. Generally, it classifies objects in an image into foreground objects (that have proper geometries) and background objects (that do not have appropriate geometries but are textures or materials).

Examples of foreground objects include people, animals, and cars. Likewise, examples of background objects include the sky, dirt, trees, mountains, and grass. Different from semantic segmentation, panoptic segmentation does not group consecutive individual objects of the same class into one region. Figure 1.1 (f) shows the panoptic segmentation result on the input image in Figure 1.1 (a).

Specifically, it looks similar to the semantic segmentation result, except it highlights the individual instances separately.

Important note – other CV tasks

There are other advanced CV projects developed on top of Detectron2, such as DensePose and PointRend. However, this book focuses on developing CV applications for the more common ones, including object detection, instance segmentation, keypoint detection, semantic segmentation, and panoptic segmentation in Chapter 2. Furthermore, Part 2 and Part 3 of this book further explore developing custom CV applications for the two most important tasks (object detection and instance segmentation). There is also a section that describes how to use PointRend to improve instance segmentation quality. Additionally, it is relatively easy to expand the code for other tasks once you understand these tasks.

Let’s get started by getting to know Detectron2 and its architecture!