
Hands-On Computer Vision with Detectron2: Develop object detection and segmentation models with a code and visualization approach

By Van Vung Pham


Product Details


Publication date : Apr 14, 2023
Length : 318 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781800561625

An Introduction to Detectron2 and Computer Vision Tasks

This chapter introduces Detectron2, its architecture, and the computer vision (CV) tasks that it can perform; in other words, it discusses what CV tasks Detectron2 supports and why we need them. Additionally, this chapter provides the steps to set up environments for developing CV applications using Detectron2, either locally or on the cloud using Google Colab.

By the end of this chapter, you will understand the main CV tasks (e.g., object detection, instance segmentation, keypoint detection, semantic segmentation, and panoptic segmentation); know how Detectron2 works and what it can do to help you tackle CV tasks using deep learning; and be able to set up local and cloud environments for developing Detectron2 applications.

Specifically, this chapter covers the following topics:

  • Computer vision tasks
  • Introduction to Detectron2 and its architecture
  • Detectron2 development environments

Technical requirements

Detectron2 CV applications are built on top of PyTorch. Therefore, a compatible version of PyTorch is required to run the code examples in this chapter. Later sections of this chapter provide setup instructions specifically for Detectron2. All the code, datasets, and respective results are available on the book's GitHub page at https://github.com/PacktPublishing/Hands-On-Computer-Vision-with-Detectron2. It is highly recommended that you download the code and follow along.
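
To follow along, you can clone the book's repository from the Terminal, for example:

$ git clone https://github.com/PacktPublishing/Hands-On-Computer-Vision-with-Detectron2.git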

Computer vision tasks

Deep learning achieves state-of-the-art results in many CV tasks. The most common CV task is image classification, in which a deep learning model gives a class label for a given image. However, recent advancements in deep learning allow computers to perform more advanced vision tasks. There are many of these advanced vision tasks.

However, this book focuses on the more common and important ones, including object detection, instance segmentation, keypoint detection, semantic segmentation, and panoptic segmentation. It might be challenging for readers to differentiate between these tasks, so Figure 1.1 depicts the differences between them. This section outlines what they are and when to use them, and the rest of the book focuses on how to implement these tasks using Detectron2. Let’s get started!

Figure 1.1: Common computer vision tasks

Object detection

Object detection generally includes object localization and classification. Specifically, deep learning models for this task predict where objects of interest are in an image by placing bounding boxes around these objects (localization). Furthermore, these models classify the detected objects into types of interest (classification).

One example of this task is detecting people in pictures and placing bounding boxes around the detected humans (localization only), as shown in Figure 1.1 (b). Another example is to detect road damage from a recorded road image by providing bounding boxes for the damage (localization) and further classifying the damage into types such as longitudinal cracks, transverse cracks, alligator cracks, and potholes (classification).

Instance segmentation

Like object detection, instance segmentation also involves object localization and classification. However, instance segmentation takes things one step further while localizing the detected objects of interest.

Specifically, besides classification, models for this task localize the detected objects at the pixel level. In other words, they identify all the pixels of each detected object. Instance segmentation is needed in applications that require the shapes of the detected objects in images and need to track every individual object. Figure 1.1 (c) shows the instance segmentation result on the input image in Figure 1.1 (a). Specifically, besides the bounding boxes, every pixel of each person is also highlighted.

Keypoint detection

Besides detecting objects, keypoint detection also indicates important parts of the detected objects, called keypoints. These keypoints describe the detected object’s essential traits, which are often invariant to image rotation, shrinkage, translation, or distortion. For instance, the keypoints of humans include the eyes, nose, shoulders, elbows, hands, knees, and feet. Keypoint detection is important for applications such as action estimation, pose detection, or face detection. Figure 1.1 (d) shows the keypoint detection result on the input image in Figure 1.1 (a). Specifically, besides the bounding boxes, it highlights all keypoints for every detected individual.

Semantic segmentation

A semantic segmentation task does not detect specific instances of objects but classifies each pixel in an image into classes of interest. For instance, in a self-driving car application, a model for this task classifies regions of images into pedestrians, roads, cars, trees, buildings, and the sky. This task is important for providing a broader view of groups of objects with different classes (i.e., a higher-level understanding of the image). Specifically, if individual class instances are in one region, they are grouped into one mask instead of having a different mask for each individual.

One example application of semantic segmentation is to segment images into foreground and background objects (e.g., to blur the background and give a portrait image a more artistic look). Figure 1.1 (e) shows the semantic segmentation result on the input image in Figure 1.1 (a). Specifically, the input picture is divided into regions classified as things (people, or foreground objects) and background objects such as the sky, a mountain, dirt, grass, and a tree.

Panoptic segmentation

Panoptic literally means “everything visible in the image”. In other words, panoptic segmentation can be viewed as combining common CV tasks such as instance segmentation and semantic segmentation, providing a unified and global view of segmentation. Generally, it classifies objects in an image into foreground objects (which have proper geometries) and background objects (which do not have proper geometries but are textures or materials).

Examples of foreground objects include people, animals, and cars. Likewise, examples of background objects include the sky, dirt, trees, mountains, and grass. Unlike semantic segmentation, panoptic segmentation does not group consecutive individual objects of the same class into one region. Figure 1.1 (f) shows the panoptic segmentation result on the input image in Figure 1.1 (a). Specifically, it looks similar to the semantic segmentation result, except that the individual instances are highlighted separately.

Important note – other CV tasks

There are other advanced CV projects developed on top of Detectron2, such as DensePose and PointRend. However, this book focuses on developing CV applications for the more common ones, including object detection, instance segmentation, keypoint detection, semantic segmentation, and panoptic segmentation in Chapter 2. Furthermore, Part 2 and Part 3 of this book further explore developing custom CV applications for the two most important tasks (object detection and instance segmentation). There is also a section that describes how to use PointRend to improve instance segmentation quality. Additionally, it is relatively easy to expand the code for other tasks once you understand these tasks.

Let’s get started by getting to know Detectron2 and its architecture!

An introduction to Detectron2 and its architecture

Detectron2 is Facebook (now Meta) AI Research’s open source project. It is a next-generation library that provides cutting-edge detection and segmentation algorithms. Many research and practical projects at Facebook use it as a library to support implementing CV tasks. The following sections introduce Detectron2 and provide an overview of its architecture.

Introducing Detectron2

Detectron2 implements state-of-the-art detection algorithms, such as Mask R-CNN, RetinaNet, Faster R-CNN, RPN, TensorMask, PointRend, DensePose, and more. The question that immediately comes to mind after this statement is, why is it better if it re-implements existing cutting-edge algorithms? The answer is that Detectron2 has the advantages of being faster, more accurate, modular, customizable, and built on top of PyTorch.

Specifically, it is faster and more accurate because, while reimplementing these cutting-edge algorithms, the Detectron2 team had the chance to find suboptimal implementation details or obsolete features in older versions of these algorithms and re-implement them. It is modular, meaning its implementation is divided into sub-parts: the input data, backbone network, region proposal heads, and prediction heads (the next section covers these components in more detail). It is customizable, meaning its components have built-in implementations that can be replaced with custom ones. Finally, it is built on top of PyTorch, meaning many developer resources are available online to help develop applications with Detectron2.

Furthermore, Detectron2 provides pre-trained models with state-of-the-art detection results for CV tasks. These models were trained on large numbers of images using high computation resources at the Facebook research lab that might not be available to other institutions.

These pre-trained models are published on its Model Zoo and are free to use: https://github.com/facebookresearch/detectron2/blob/main/MODEL_ZOO.md.

These pre-trained models help developers build typical CV applications quickly, without collecting and preparing many images or requiring high computation resources to train new models. However, if there is a need to develop a CV task for a specific domain with a custom dataset, these existing models can provide the starting weights, and the whole Detectron2 model can be retrained on the custom dataset.
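
For example, a minimal sketch of this fine-tuning setup might look as follows; the Faster R-CNN configuration file named here is just one example from the Model Zoo:

from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
# Start from a pre-trained configuration in the Model Zoo
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
# Use its published weights as the starting point for training on a custom dataset
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")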

Finally, we can convert Detectron2 models into deployable artifacts. Specifically, we can convert Detectron2 models into standard deployment formats such as TorchScript, Caffe2 protobuf, and ONNX. These files can then be deployed to their corresponding runtimes, such as PyTorch, Caffe2, and ONNX Runtime. Furthermore, Facebook AI Research also published Detectron2Go (D2Go), a platform where developers can take their Detectron2 development one step further and create models optimized for mobile devices.

In summary, Detectron2 implements cutting-edge detection algorithms with the advantages of being fast, accurate, modular, customizable, and built on top of PyTorch. Detectron2 also provides pre-trained models so users can get started and quickly build CV applications with state-of-the-art results. Because it is customizable, users can change its components or train CV applications on a custom business domain. Furthermore, we can export Detectron2 models into formats supported by standard deep learning framework runtimes. Additionally, Detectron2Go (D2Go) supports developing Detectron2 applications for edge devices.

In the next section, we will look into Detectron2 architecture to understand how it works and the possibilities of customizing each of its components.

Detectron2 architecture

Figure 1.2: The main components of Detectron2

Detectron2 has a modular architecture. Figure 1.2 depicts the four main modules in a standard Detectron2 application. The first module is for registering input data (Input Data).

The second module is the backbone to extract image features (Backbone), followed by the third one for proposing regions with and without objects to be fed to the next training stage (Region Proposal). Finally, the last module uses appropriate heads (such as detection heads, instance segmentation heads, keypoint heads, semantic segmentation heads, or panoptic heads) to predict the regions with objects and classify detected objects into classes. Chapter 3 to Chapter 5 discuss these components for building a CV application for object detection tasks, and Chapter 10 and Chapter 11 detail these components for segmentation tasks. The following sections briefly discuss these components in general.

The input data module

The input data module is designed to load data in large batches from hard drives with optimization techniques such as caching and multiple workers. Furthermore, it is relatively easy to plug data augmentation techniques into the data loader for this module. Additionally, it is designed to be customizable so that users can register their custom datasets. The following is the typical syntax for registering a custom dataset to train a Detectron2 model using this module:

from detectron2.data import DatasetCatalog

# Register a custom dataset under a name; load_my_dataset is a function
# returning a list of dataset dictionaries in Detectron2's standard format
DatasetCatalog.register('my_dataset', load_my_dataset)
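
For illustration, the following is a hedged sketch of what such a loader function might return; the file name, image size, and annotation values are hypothetical placeholders, while the dictionary keys follow Detectron2's standard dataset format:

from detectron2.structures import BoxMode

def load_my_dataset():
    # One record per image; all values here are hypothetical placeholders
    return [{
        "file_name": "images/0001.jpg",
        "image_id": 0,
        "height": 480,
        "width": 640,
        "annotations": [{
            "bbox": [100, 120, 200, 250],
            "bbox_mode": BoxMode.XYXY_ABS,  # absolute (x1, y1, x2, y2) coordinates
            "category_id": 0,
        }],
    }]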

The backbone module

The backbone module extracts features from the input images. Therefore, this module often uses a cutting-edge convolutional neural network such as ResNet or ResNeXt. It can be customized to call any standard convolutional neural network that performs well on an image classification task of interest. Notably, this module is where transfer learning comes into play: if we want a state-of-the-art convolutional neural network that works well with large image datasets such as ImageNet, we can use those pre-trained models here. Otherwise, we can choose simpler networks for this module to improve performance (training and prediction time) at a cost in accuracy. Chapter 2 will discuss selecting appropriate pre-trained models from the Detectron2 Model Zoo for common CV tasks.

The following code snippet shows the typical syntax for registering a custom backbone network to train the Detectron2 model using this module:

from detectron2.modeling import BACKBONE_REGISTRY, Backbone

@BACKBONE_REGISTRY.register()
class CustomBackbone(Backbone):
    pass
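
To make this concrete, here is a minimal sketch of a custom backbone, close in spirit to the toy example in Detectron2's documentation; the single convolution merely stands in for a real feature extractor:

import torch.nn as nn
from detectron2.modeling import BACKBONE_REGISTRY, Backbone, ShapeSpec

@BACKBONE_REGISTRY.register()
class ToyBackbone(Backbone):
    def __init__(self, cfg, input_shape):
        # A registered backbone is built from the config and the input shape
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=16, padding=3)

    def forward(self, image):
        # Return a dict mapping feature names to feature maps
        return {"conv1": self.conv1(image)}

    def output_shape(self):
        # Describe each output feature map so downstream modules can consume it
        return {"conv1": ShapeSpec(channels=64, stride=16)}

The registered backbone can then be selected in the configuration with cfg.MODEL.BACKBONE.NAME = "ToyBackbone".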

The region proposal module

The next module is the region proposal module (Region Proposal). This module accepts the extracted features from the backbone and proposes image regions (with location specifications), together with objectness scores indicating whether the regions contain objects. The objectness score of a proposed region ranges from 0 (background, no object) to 1 (certain that the region contains an object of interest). Notably, this objectness score is not the probability of the region belonging to a class of interest; it only indicates whether the region contains an object (of any class) or is background.

This module comes with a default Region Proposal Network (RPN). However, replacing this network with a custom one is relatively easy. The following is the typical syntax for registering a custom RPN head to train the Detectron2 model using this module:

from torch import nn
from detectron2.modeling.proposal_generator.rpn import RPN_HEAD_REGISTRY

@RPN_HEAD_REGISTRY.register()
class CustomRPNHead(nn.Module):
    pass
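
Once registered, the custom head can be selected through the configuration (assuming the class name above):

cfg.MODEL.RPN.HEAD_NAME = "CustomRPNHead"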

Region of interest module

The last module is the place for the region of interest (RoI) heads. Depending on the CV tasks, we can select appropriate heads for this module, such as detection heads, segmentation heads, keypoint heads, or semantic segmentation heads. For instance, the detection heads accept the region proposals and the input features of the proposed regions and pass them through a fully connected network, with two separate heads for prediction and classification. Specifically, one head is used to predict bounding boxes for objects, and another is for classifying the detected bounding boxes into corresponding classes.

On the other hand, semantic segmentation heads use convolutional neural network layers to classify each pixel into one of the classes of interest. The following is the typical syntax for registering custom region of interest heads to train the Detectron2 model using this module:

from detectron2.modeling import ROI_HEADS_REGISTRY, StandardROIHeads

@ROI_HEADS_REGISTRY.register()
class CustomHeads(StandardROIHeads):
    pass
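
Similarly, assuming the class name above, the custom heads are then selected through the configuration:

cfg.MODEL.ROI_HEADS.NAME = "CustomHeads"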

Now that you have an understanding of Detectron2 and its architecture, let's prepare development environments for developing Detectron2 applications.

Detectron2 development environments

Now, we understand the advanced CV tasks and how Detectron2 helps to develop applications for these tasks. It is time to start developing Detectron2 applications. This section provides steps to set up Detectron2 development environments on the cloud using Google Colab, a local environment, or a hybrid approach connecting Google Colab to a locally hosted runtime.

Cloud development environment for Detectron2 applications

Google Colab, or Colaboratory (https://colab.research.google.com), is a cloud platform that allows you to write and execute Python code from your web browser. It enables users to start developing deep learning applications with zero configuration because the most common machine learning and deep learning packages, such as PyTorch and TensorFlow, are pre-installed. Furthermore, users have access to GPUs free of charge. Even with the free plan, users get computation resources that are relatively better than a standard personal computer, and users can pay a small amount for the Pro or Pro+ plans with higher computation resources if needed. Additionally, as its name indicates, it is relatively easy to collaborate on Google Colab and to share Google Colab files and projects.

Deep learning models for CV tasks work with many images; thus, GPUs significantly speed up training and inference time. However, by default, Google Colab does not enable a GPU runtime. Therefore, users should enable the GPU hardware accelerator before installing Detectron2 or training Detectron2 applications. To do this, select GPU from the Hardware accelerator drop-down menu under Runtime | Change runtime type, as shown in Figure 1.3:

Figure 1.3: Select GPU for Hardware accelerator
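
Once the runtime type is changed, you can confirm that a GPU is visible from a notebook cell, for example:

import torch
# Should print True once the GPU hardware accelerator is enabled
print(torch.cuda.is_available())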

Detectron2 has a dedicated tutorial on how to install Detectron2 on Google Colab. However, this section discusses each step and gives further details. First, Detectron2 is built on top of PyTorch, so we need to have PyTorch installed; by default, the Google Colab runtime already has PyTorch installed. So, you can use the following snippet to install Detectron2 on Google Colab:

!python -m pip install \
'git+https://github.com/facebookresearch/detectron2.git'

If you have an error message such as the following one, it is safe to ignore it and proceed:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
flask 1.1.4 requires click<8.0,>=5.1, but you have click 8.1.3 which is incompatible.

However, the PyTorch version on Google Colab may not be compatible with Detectron2. In that case, you can install a Detectron2 build for specific versions of PyTorch and CUDA. You can use the following snippet to get the PyTorch and CUDA versions:

import torch
TORCH_VERSION = ".".join(torch.__version__.split(".")[:2])
CUDA_VERSION = torch.__version__.split("+")[-1]
print("torch: ", TORCH_VERSION, "; cuda: ", CUDA_VERSION)

Once you know the PyTorch and CUDA versions, you can use the following snippet to install Detectron2. Please remember to replace {CUDA_VERSION} and {TORCH_VERSION} with the values found using the previous snippet:

!python -m pip install detectron2 -f \
https://dl.fbaipublicfiles.com/detectron2/wheels/{CUDA_VERSION}/torch{TORCH_VERSION}/index.html

Here is an example of such an installation command for CUDA version 11.3 and PyTorch version 1.10:

!python -m pip install detectron2 -f \
https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html

If you face an error such as the following, it means that there is no matching Detectron2 distribution for the current versions of PyTorch and CUDA:

ERROR: Could not find a version that satisfies the requirement detectron2 (from versions: none)
ERROR: No matching distribution found for detectron2

In this case, you can visit the Detectron2 installation page to find the distributions compatible with the current PyTorch and CUDA versions. This page is available at https://detectron2.readthedocs.io/en/latest/tutorials/install.html.

Figure 1.4 shows the current Detectron2 distributions with corresponding CUDA/CPU and PyTorch versions:

Figure 1.4: Current Detectron2 distributions for corresponding CUDA/CPU and PyTorch versions

Suppose Detectron2 does not have a distribution that matches your current CUDA and PyTorch versions. In that case, there are two options. The first option is to select the Detectron2 distribution built for the CUDA and PyTorch versions closest to the ones you have; this approach should generally work. Otherwise, you can install the CUDA and PyTorch versions that Detectron2 supports.

Finally, you can use the following snippet to check the installed Detectron2 version:

import detectron2
print(detectron2.__version__)

Congratulations! You are now ready to develop CV applications using Detectron2 on Google Colab. Read on if you want to create Detectron2 applications on a local machine. Otherwise, you can go to Chapter 2 to start developing Detectron2 CV applications.

Local development environment for Detectron2 applications

Google Colab is an excellent cloud environment for quickly starting to build deep learning applications. However, it has several limitations. For instance, the free Google Colab plan may not have enough RAM and GPU resources for large projects. Another limitation is that your runtime may terminate if your kernel is idle for a while. Even on the paid Pro+ plan, a Google Colab kernel can only run for 24 hours, after which it is terminated. That said, if you have a computer with GPUs, it is better to install Detectron2 on this local computer for development.

Important note – resume training option

Due to these time limitations, Google Colab may terminate your runtime before your training completes. Therefore, you should train your models with a resume option so that the Detectron2 training process can pick up the stored weights from its previous run. Fortunately, Detectron2 supports resumable training so that you can do this easily.
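
For instance, here is a minimal sketch using Detectron2's DefaultTrainer, assuming cfg is an already-prepared training configuration:

from detectron2.engine import DefaultTrainer

trainer = DefaultTrainer(cfg)
# resume=True picks up the latest checkpoint in cfg.OUTPUT_DIR, if one exists
trainer.resume_or_load(resume=True)
trainer.train()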

At the time of writing this book, Detectron2 supports Linux and does not officially support Windows. If you want to install Detectron2 on Windows, you may refer to its installation page for some workarounds at https://detectron2.readthedocs.io/en/latest/tutorials/install.html. This section covers the steps to install Detectron2 on Linux. Detectron2 is built on top of PyTorch; therefore, the main installation requirement (besides Python itself) is PyTorch. Please refer to PyTorch’s official page at https://pytorch.org/ to perform the installation. Figure 1.5 shows the interface for selecting the appropriate configurations for your system and generating a PyTorch installation command at the bottom.

Figure 1.5: PyTorch installation command generator (https://pytorch.org)

The next requirement is Git, which is needed to install Detectron2 from source. Git is also a tool that any software developer should have, and it is especially valuable when developing relatively complex CV applications. You can use the following commands to install Git and check the installed version from the Terminal:

$ sudo apt-get update
$ sudo apt-get install git
$ git --version

Once PyTorch and Git are installed, the steps to install Detectron2 on a local computer are the same as those used to install Detectron2 on Google Colab, described in the previous section.
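
Alternatively, per Detectron2's installation guide, you can install it from source in the local Terminal:

$ git clone https://github.com/facebookresearch/detectron2.git
$ python -m pip install -e detectron2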

Connecting Google Colab to a local development environment

There are cases where developers have developed some code with Google Colab, or they may want to use files stored on Google Drive or prefer to code with the Google Colab interface more than the standard Jupyter notebook on a local computer. In these cases, Google Colab provides an option to execute its notebook in a local environment (or other hosted runtimes such as Google Cloud instances). Google Colab has instructions for this available here: https://research.google.com/colaboratory/local-runtimes.html.

Important note – browser-specific settings

The following steps are for Google Chrome. If you are using Firefox, you must perform custom settings to allow connections from HTTPS domains with standard WebSockets. The instructions are available here: https://research.google.com/colaboratory/local-runtimes.html.

We will first need to install Jupyter on the local computer. The next step is to enable the jupyter_http_over_ws Jupyter extension using the following snippet:

$ pip install jupyter_http_over_ws
$ jupyter serverextension enable --py jupyter_http_over_ws

The next step is to start the Jupyter server on the local machine with an option to trust the WebSocket connections so that the Google Colab notebook can connect to the local runtime, using the following snippet:

$ jupyter notebook \
--NotebookApp.allow_origin=\
'https://colab.research.google.com' \
--port=8888 \
--NotebookApp.port_retries=0

Once the local Jupyter server is running, the Terminal displays a backend URL with an authentication token that can be used to access this local runtime from Google Colab. Figure 1.6 shows the steps to connect the Google Colab notebook to a local runtime: Connect | Connect to a local runtime:

Figure 1.6: Connecting the Google Colab notebook to a local runtime

In the next dialog, enter the backend URL generated by the local Jupyter server and click the Connect button. Congratulations! You can now use the Google Colab notebook to code Python applications using a local kernel.

Summary

This chapter discussed advanced CV tasks, including object detection, instance segmentation, keypoint detection, semantic segmentation, and panoptic segmentation, and when to use them. Detectron2 is a framework that helps implement cutting-edge algorithms for these CV tasks with the advantages of being faster, more accurate, modular, customizable, and built on top of PyTorch. Its architecture has four main parts: input data, backbone, region proposal, and region of interest heads. Each of these components is replaceable with a custom implementation. This chapter also provided the steps to set up a cloud development environment using Google Colab, a local development environment, or to connect Google Colab to a local runtime if needed.

You now understand the leading CV tasks Detectron2 can help develop and have set up a development environment. The next chapter (Chapter 2) will guide you through the steps to build CV applications for all the listed CV tasks using the cutting-edge models provided in the Detectron2 Model Zoo.


Key benefits

  • Learn how to tackle common computer vision tasks in modern businesses with Detectron2
  • Leverage Detectron2 performance tuning techniques to control the model’s finest details
  • Deploy Detectron2 models into production and develop Detectron2 models for mobile devices

Description

Computer vision is a crucial component of many modern businesses, including automobiles, robotics, and manufacturing, and its market is growing rapidly. This book helps you explore Detectron2, Facebook's next-gen library providing cutting-edge detection and segmentation algorithms. It’s used in research and practical projects at Facebook to support computer vision tasks, and its models can be exported to TorchScript or ONNX for deployment. The book provides you with step-by-step guidance on using existing models in Detectron2 for computer vision tasks (object detection, instance segmentation, keypoint detection, semantic segmentation, and panoptic segmentation). You’ll get to grips with the theories and visualizations of Detectron2’s architecture and learn how each module in Detectron2 works. As you advance, you’ll build your practical skills by working on two real-life projects (preparing data, training models, fine-tuning models, and deployment) for object detection and instance segmentation tasks using Detectron2. Finally, you’ll deploy Detectron2 models into production and develop Detectron2 applications for mobile devices. By the end of this deep learning book, you’ll have gained sound theoretical knowledge and useful hands-on skills to help you solve advanced computer vision tasks using Detectron2.

What you will learn

  • Build computer vision applications using existing models in Detectron2
  • Grasp the concepts underlying Detectron2’s architecture and components
  • Develop real-life projects for object detection and object segmentation using Detectron2
  • Improve model accuracy using Detectron2’s performance-tuning techniques
  • Deploy Detectron2 models into server environments with ease
  • Develop and deploy Detectron2 models into browser and mobile environments


Table of Contents

20 Chapters
Preface
Part 1: Introduction to Detectron2
Chapter 1: An Introduction to Detectron2 and Computer Vision Tasks
Chapter 2: Developing Computer Vision Applications Using Existing Detectron2 Models
Part 2: Developing Custom Object Detection Models
Chapter 3: Data Preparation for Object Detection Applications
Chapter 4: The Architecture of the Object Detection Model in Detectron2
Chapter 5: Training Custom Object Detection Models
Chapter 6: Inspecting Training Results and Fine-Tuning Detectron2’s Solvers
Chapter 7: Fine-Tuning Object Detection Models
Chapter 8: Image Data Augmentation Techniques
Chapter 9: Applying Train-Time and Test-Time Image Augmentations
Part 3: Developing a Custom Detectron2 Model for Instance Segmentation Tasks
Chapter 10: Training Instance Segmentation Models
Chapter 11: Fine-Tuning Instance Segmentation Models
Part 4: Deploying Detectron2 Models into Production
Chapter 12: Deploying Detectron2 Models into Server Environments
Chapter 13: Deploying Detectron2 Models into Browsers and Mobile Environments
Index
Other Books You May Enjoy

