Data Preparation for Object Detection Applications

This chapter discusses the steps to prepare data for training models using Detectron2. Specifically, it introduces tools for labeling images if you already have a dataset at hand. Otherwise, it points you to sources of open datasets so that you can quickly download them and build custom applications for computer vision tasks. Additionally, this chapter covers techniques to convert standard annotation formats to the format required by Detectron2 in case the existing datasets come in different formats.

By the end of this chapter, you will know how to label data for object detection tasks and how to download existing datasets and convert data of different formats to the format supported by Detectron2. Specifically, this chapter covers the following:

  • Common data sources
  • Getting images
  • Selecting an image labeling tool
  • Annotation formats
  • Labeling the images
  • Annotation format conversions

Technical requirements

You should have set up the development environment following the instructions provided in Chapter 1; if you have not done so, please complete that setup before continuing. All the code, datasets, and results are available on the GitHub page of the book (under the folder named Chapter03) at https://github.com/PacktPublishing/Hands-On-Computer-Vision-with-Detectron2. It is highly recommended to download the code and follow along.

Common data sources

Chapter 2 introduced the two most common datasets for the computer vision community: ImageNet and Microsoft COCO (Common Objects in Context). Many models pre-trained on these datasets are also available, and the class labels they predict may already meet your everyday needs.

If your task is to detect a less common class label, it might be worth exploring the Large Vocabulary Instance Segmentation (LVIS) dataset. It covers more than 1,200 categories across 164,000 images, including many rare categories, and contains about 2 million high-quality instance segmentation masks. Detectron2 also provides pre-trained models for predicting these 1,200+ labels. Thus, you can follow the steps described in Chapter 2 to create a computer vision application that meets your needs. More information about the LVIS dataset is available on its website at https://www.lvisdataset.org.
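As an illustration, the following sketch loads one of these LVIS-trained models using the same model zoo pattern described in Chapter 2. The configuration file name is an assumption based on the Detectron2 model zoo; verify it against the release you have installed:

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# An LVIS instance segmentation configuration from the model zoo
# (an assumed example; check the zoo for the exact available configs)
config_file = "LVISv0.5-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml"

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(config_file))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(config_file)
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # keep predictions above 50% confidence
predictor = DefaultPredictor(cfg)  # predictor for the 1,200+ LVIS labels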

If you have a task where no existing/pre-trained models can meet your needs, it is time to find existing...

Getting images

When you start a new project, you might already have a set of images collected using imaging devices such as cameras, smartphones, drones, or other specialized hardware. If you have such photos, you can skip this step and move on to the following steps, where you select a labeling tool and start labeling. However, if you are starting a fresh project and do not yet have any images, this section gives you some ideas for collecting pictures from the internet. Note that images collected from the internet may be subject to copyright, so you should check whether the downloaded images are usable for your purposes.

Google Images contains a massive and ever-growing number of photos, making it a great resource for crawling images to train your custom model. Python has a simple_image_download package that provides scripts to search for images using keywords or tags and download them. First, we will install this package...
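A minimal usage sketch is shown below. The package API has changed between releases, so the names here reflect one common version, and the search keyword is only a placeholder; by default, results are saved under a local simple_images folder:

# install with: pip3 install simple_image_download (or !pip3 ... in Colab)
from simple_image_download import simple_image_download as simp

response = simp.simple_image_download
# Search Google Images for the (placeholder) keyword and download
# up to 50 results into simple_images/<keyword>/
response().download('forklift', 50)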

Selecting an image labeling tool

Computer vision applications are developing rapidly, so there are many tools for labeling images. These tools range from free and open source to fully commercial, or commercial with free trials (typically with limits on the available features). The labeling tools may be desktop applications (which require installation), online web applications, or locally hosted web applications. The online applications may even provide cloud storage, utilities that support collaborative team labeling, and pre-trained models that speed up labeling (generally, at some cost).

One popular image labeling tool is labelImg, available at https://github.com/heartexlabs/labelImg/blob/master/README.rst. It is an open source, lightweight, fast, and easy-to-use Python-based application with a short learning curve.

Its limitation is that it supports only rectangular bounding boxes. It is currently...

Annotation formats

Similar to labeling tools, many different annotation formats are available for annotating images for computer vision applications. The common standards include COCO JSON, Pascal VOC XML, and YOLO PyTorch TXT. Many more formats exist (TensorFlow TFRecord, CreateML JSON, and so on), but due to space limitations, this section covers only the three most common standards listed previously. Furthermore, this section uses two images and labels extracted from the test set of the brain tumor object detection dataset available from Kaggle (https://www.kaggle.com/datasets/davidbroberts/brain-tumor-object-detection-datasets) to illustrate these data formats and demonstrate their differences, as shown in Figure 3.4. This section briefly discusses the key points of each annotation format; interested readers can refer to the GitHub page of this chapter to inspect this same dataset in the different formats in further detail.

Figure 3.4: Two images and tumor labels used to illustrate different annotation formats

Figure...
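To preview the differences, here is a single hypothetical tumor bounding box in a 640×640 image expressed in each of the three standards (the coordinates are illustrative, not taken from the actual dataset):

# COCO JSON: one entry in the "annotations" array;
# bbox is [x_min, y_min, width, height] in absolute pixels
{"id": 1, "image_id": 1, "category_id": 1, "bbox": [230, 190, 120, 100], "area": 12000, "iscrowd": 0}

# Pascal VOC XML: one <object> element per instance;
# the box is stored as absolute corner coordinates
<object>
  <name>tumor</name>
  <bndbox><xmin>230</xmin><ymin>190</ymin><xmax>350</xmax><ymax>290</ymax></bndbox>
</object>

# YOLO TXT: one line per instance: class_id x_center y_center width height,
# with all coordinates normalized by the image size
0 0.453125 0.375 0.1875 0.15625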

Labeling the images

This chapter uses labelImg to perform data labeling for object detection tasks. This tool requires installation on a local computer. Therefore, if you downloaded your images to Google Drive using Google Colab, you need to map or download these images to a local computer to perform labeling. Run the following snippet in a terminal on a local computer to install labelImg using Python 3 (if you are running Python 2, please refer to the labelImg website for a guide):

pip3 install labelImg

After installing, run the following snippet to start the tool, where [IMAGE_PATH] is an optional argument to specify the path to the image folder, and [PRE-DEFINED CLASS FILE] is another optional argument to indicate the path to a text file (*.txt) that defines a list of class labels (one per line):

labelImg [IMAGE_PATH] [PRE-DEFINED CLASS FILE]

For instance, after downloading/synchronizing the simple_images folder for the downloaded images in the previous steps...
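A hypothetical invocation might look as follows, assuming the images were downloaded into simple_images/forklift and a classes.txt file lists the class labels, one per line (both names are placeholders for your own paths):

labelImg simple_images/forklift classes.txt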

Annotation format conversions

Detectron2's data description is built based on the COCO annotation format. In other words, it supports registering datasets that use the COCO data annotation format. However, other data annotation formats are abundant, and you may download a dataset or use a labeling tool that supports a data format different from COCO. Therefore, this section covers the code snippets used to convert data from the popular Pascal VOC and YOLO formats to COCO style.
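For reference, once a dataset is in COCO format, registering it with Detectron2 takes a single call; the dataset name and paths below are placeholders:

from detectron2.data.datasets import register_coco_instances

# Arguments: dataset name, extra metadata, COCO JSON file, image folder
register_coco_instances("my_dataset_train", {}, "annotations/train.json", "images/train")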

Important note

A statement that starts with an exclamation mark (!) is a Bash command to be executed in a Jupyter notebook (Google Colab) code cell. If you want to run it in a terminal instead, you can safely remove the exclamation mark and execute the statement.

By understanding the different data formats as described, it is relatively easy to write code that converts data from one format to another. However, to speed up development, this section uses the pylabel package to perform this conversion...
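A minimal conversion sketch with pylabel is shown below. The input and output paths are assumptions to adapt to your dataset, and the exact parameter names may vary slightly between pylabel releases:

!pip install pylabel

from pylabel import importer

# Import Pascal VOC XML annotations; path_to_images is relative to the annotations
dataset = importer.ImportVOC(path="data/voc_labels", path_to_images="../images")
# For YOLO labels, importer.ImportYoloV5 plays the same role (it also needs the
# category names, since YOLO files store only numeric class indices)

# Export to a single COCO-style JSON file that Detectron2 can register
dataset.export.ExportToCoco(output_path="data/coco_annotations.json")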

Summary

This chapter discussed popular data sources for the computer vision community. These data sources often have pre-trained models that help you quickly build computer vision applications. We also learned about common places to download computer vision datasets. If no dataset exists for a specific computer vision task, this chapter also showed how to get images by downloading them from the internet and how to select a tool for labeling the downloaded images. Furthermore, the computer vision field is developing rapidly, and many different annotation formats are in use. Therefore, this chapter also covered popular data formats and the steps to convert them into the format supported by Detectron2.

By this time, you should have your dataset ready. The next chapter discusses the architecture of Detectron2, with details on the backbone networks and how to select one for an object detection task, before training an object detection model using Detectron2.
