
You're reading from Hands-On Computer Vision with Detectron2
Product type: Book | Published: Apr 2023 | Reading level: Beginner | Publisher: Packt | ISBN-13: 9781800561625 | Edition: 1st
Author: Van Vung Pham

Van Vung Pham is a passionate research scientist in machine learning, deep learning, data science, and data visualization. He has years of experience and numerous publications in these areas. He is currently working on projects that use deep learning to predict road damage from pictures or videos taken of roads. One of these projects uses Detectron2 and Faster R-CNN to predict and classify road damage, achieving state-of-the-art results for this task. Dr. Pham obtained his PhD from the Computer Science Department at Texas Tech University, Lubbock, Texas, USA. He is currently an assistant professor in the Computer Science Department at Sam Houston State University, Huntsville, Texas, USA.

Fine-Tuning Object Detection Models

Detectron2 uses the concept of anchors to improve object detection accuracy: detection models predict offsets from a set of anchors instead of predicting boxes from scratch. The set of anchors has various sizes and ratios to reflect the shapes of the objects to be detected. Detectron2 uses two sets of hyperparameters, called sizes and ratios, to generate the initial set of anchors. Therefore, this chapter explains how Detectron2 processes its inputs and provides code to analyze the ground-truth boxes from a training dataset and find appropriate values for these anchor sizes and ratios.
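As a concrete illustration of how sizes and ratios combine, each anchor template can be derived from one (size, ratio) pair so that the anchor's area is size squared and its height-to-width ratio matches. The following is a minimal sketch of that combination logic, not Detectron2's internal implementation; the helper name is illustrative:

```python
import math

def make_anchors(sizes, ratios):
    """Build (x0, y0, x1, y1) anchor templates centered at the origin.

    Each anchor has area size**2; ratio is height/width, so
    w = sqrt(area / ratio) and h = w * ratio.
    """
    anchors = []
    for size in sizes:
        area = size ** 2
        for ratio in ratios:
            w = math.sqrt(area / ratio)
            h = w * ratio
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors

# 3 sizes x 3 ratios -> 9 anchor templates per feature-map location
templates = make_anchors(sizes=[64, 128, 256], ratios=[0.5, 1.0, 2.0])
print(len(templates))  # 9
```

With 3 sizes and 3 ratios, every feature-map location gets 9 candidate boxes, which is why choosing these two small lists well matters so much for detection quality.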

Additionally, input image pixels’ means and standard deviations are crucial in training Detectron2 models. Specifically, Detectron2 uses these values to normalize the input images during training. Calculating these hyperparameters over the whole dataset at once is often impossible for large datasets. Therefore, this chapter provides the code to calculate...

Technical requirements

You should have completed Chapter 1 to have an appropriate development environment for Detectron2. All the code, datasets, and results are available on the GitHub repo of the book at https://github.com/PacktPublishing/Hands-On-Computer-Vision-with-Detectron2. It is highly recommended to download the code and follow along.

Important note

This chapter has code that includes random number generators. Therefore, several values produced in this chapter may differ from run to run. However, the output values should be similar, and the main concepts remain the same.

Setting anchor sizes and anchor ratios

Detectron2 implements Faster R-CNN for object detection tasks, and Faster R-CNN makes excellent use of anchors to let the object detection model predict from a fixed set of image patches instead of detecting boxes from scratch. Anchors have different sizes and ratios to accommodate the fact that the objects to be detected have different shapes. In other words, a set of anchors closer to the shapes of the objects to be detected improves prediction performance and reduces training time.

Therefore, the following sections cover the steps to (1) explore how Detectron2 prepares its input image data, (2) sample data for some predefined number of iterations and extract the ground-truth bounding boxes from the sampled data, and finally, (3) utilize clustering and genetic algorithms to find the best set of sizes and ratios for training.
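To give a feel for step (3), a plain 1-D k-means over the ground-truth box dimensions can already suggest candidate anchor sizes (the chapter additionally refines the result with a genetic algorithm, which is not shown here). The box data and helper below are made up for illustration:

```python
import random

def kmeans_1d(values, k, iters=50, seed=0):
    """Plain 1-D k-means: cluster values (e.g., box sizes) into k centers."""
    random.seed(seed)
    centers = random.sample(values, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in values:
            i = min(range(k), key=lambda c: abs(v - centers[c]))
            buckets[i].append(v)
        centers = [sum(b) / len(b) if b else centers[i]
                   for i, b in enumerate(buckets)]
    return sorted(centers)

# Hypothetical ground-truth (width, height) pairs from a sampled dataset
boxes = [(40, 55), (70, 65), (120, 110), (45, 60), (130, 150), (75, 80)]
sizes = [(w * h) ** 0.5 for w, h in boxes]  # size = sqrt(area)
ratios = [h / w for w, h in boxes]          # ratio = height / width
print(kmeans_1d(sizes, k=2))
```

The same clustering applied to the `ratios` list yields candidate aspect ratios; the resulting centers are what would feed the anchor-generator configuration.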

Preprocessing input images

We need to know the sizes and ratios of the ground-truth boxes in...

Setting pixel means and standard deviations

Input image pixels’ means and standard deviations are crucial in training Detectron2 models. Specifically, Detectron2 uses these values to normalize the input images. Detectron2 has two configuration parameters for them: cfg.MODEL.PIXEL_MEAN and cfg.MODEL.PIXEL_STD. By default, these two hyperparameters are set to the common values generated from the ImageNet dataset, [103.53, 116.28, 123.675] and [57.375, 57.120, 58.395], which are appropriate for most color images. However, this specific case has grayscale images with different pixel means and standard deviations. Therefore, it is beneficial to produce these two sets of values from the training dataset. This task has two main stages: (1) preparing a data loader to load images and (2) creating a class to calculate running means and standard deviations.
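Stage (2) can be sketched as a small accumulator that updates per-channel sums batch by batch, so the full dataset never needs to be in memory at once. This is a plain-Python illustration with made-up class and method names, not the chapter's actual class; for very long pixel streams, a Welford-style update is more numerically stable than the sum-of-squares formula used here:

```python
class RunningStats:
    """Accumulate per-channel pixel mean and (population) std batch by batch."""

    def __init__(self, channels=3):
        self.count = 0
        self.sum = [0.0] * channels
        self.sum_sq = [0.0] * channels

    def update(self, pixels):
        """pixels: iterable of per-channel tuples, e.g. [(b, g, r), ...]."""
        for px in pixels:
            self.count += 1
            for c, v in enumerate(px):
                self.sum[c] += v
                self.sum_sq[c] += v * v

    @property
    def mean(self):
        return [s / self.count for s in self.sum]

    @property
    def std(self):
        # population std: sqrt(E[x^2] - E[x]^2), computed per channel
        return [(sq / self.count - (s / self.count) ** 2) ** 0.5
                for s, sq in zip(self.sum, self.sum_sq)]

# Feed one batch of pixel values at a time
stats = RunningStats(channels=3)
stats.update([(10, 20, 30), (30, 40, 50)])
print(stats.mean)  # [20.0, 30.0, 40.0]
```

After iterating over all batches, the accumulated results would be assigned to cfg.MODEL.PIXEL_MEAN and cfg.MODEL.PIXEL_STD.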

Preparing a data loader

Detectron2’s data loader is iterable and can yield infinite...
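Because the loader never stops on its own, the usual pattern is to take a fixed number of batches from it with itertools.islice. The sketch below uses a stand-in infinite generator so it runs anywhere; with Detectron2 itself, the loader would come from detectron2.data.build_detection_train_loader(cfg) and be sliced the same way:

```python
from itertools import islice

def infinite_loader(dataset):
    """Stand-in for an infinite training loader: cycle forever over the data."""
    while True:
        for item in dataset:
            yield item

# Take a fixed number of "iterations" from the infinite stream -- the same
# pattern used to sample data and collect ground-truth boxes for analysis.
loader = infinite_loader([{"id": 0}, {"id": 1}, {"id": 2}])
batches = list(islice(loader, 5))
print(len(batches))  # 5
```

Without islice (or an equivalent counter), iterating over the loader directly would loop forever.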

Putting it all together

The code for training the custom model with the ability to perform evaluations and a hook to save the best model remains the same as in the previous chapter. However, the configuration should be as follows:

# Code to generate the cfg object is removed to save space
# Solver
cfg.SOLVER.IMS_PER_BATCH = 6
cfg.SOLVER.BASE_LR = 0.001
cfg.SOLVER.WARMUP_ITERS = 1000
cfg.SOLVER.MOMENTUM = 0.9
cfg.SOLVER.STEPS = (3000, 4000)
cfg.SOLVER.GAMMA = 0.5
cfg.SOLVER.NESTEROV = False
cfg.SOLVER.MAX_ITER = 5000
# Checkpoint
cfg.SOLVER.CHECKPOINT_PERIOD = 500
# Anchors
cfg.MODEL.ANCHOR_GENERATOR.SIZES = [[68.33245953, 112.91302277, 89.55701886, 144.71037342, 47.77637482]]
cfg.MODEL.ANCHOR_GENERATOR.ASPECT_RATIOS = [[0.99819939, 0.78726896, 1.23598428]]
# Pixels
cfg.MODEL.PIXEL_MEAN = [20.1962, 20.1962, 20.1962]
cfg.MODEL.PIXEL_STD = [39.5985, 39.5985, 39.5985]
# Other parameters, identical to the previous chapter, are removed here

Please refer to the complete Jupyter notebook on GitHub...

Summary

This chapter provides code and visualizations to explain how Detectron2 preprocesses its inputs. In addition, it provides code to analyze the ground-truth bounding boxes and uses a genetic algorithm to select suitable values for the anchor settings (anchor sizes and ratios). Additionally, it explains the steps to produce the input pixels’ means and standard deviations from the training dataset in a running (per-batch) manner when the training dataset is large and does not fit in memory at once. Finally, this chapter puts the configurations derived in the previous chapter and this one into training. The results indicate that, with a few modifications, accuracy improves without impacting training or inference time. The next chapter covers image augmentation techniques and uses them, together with these training configurations, to fine-tune the Detectron2 model for predicting brain tumors.
