Data Preparation for Object Detection Applications

This chapter discusses the steps to prepare data for training models using Detectron2. Specifically, it introduces tools for labeling images if you already have a dataset at hand. Otherwise, it points you to sources of open datasets so that you can quickly download them and build custom applications for computer vision tasks. Additionally, this chapter covers techniques to convert standard annotation formats to the format required by Detectron2 in case the existing datasets come in different formats.

By the end of this chapter, you will know how to label data for object detection tasks and how to download existing datasets and convert data of different formats to the format supported by Detectron2. Specifically, this chapter covers the following:

  • Common data sources
  • Getting images
  • Selecting an image labeling tool
  • Annotation formats
  • Labeling the images
  • Annotation format conversions

Technical requirements

You should have set up the development environment following the instructions provided in Chapter 1; if you have not done so, please complete that setup before continuing. All the code, datasets, and results are available on the GitHub page of the book (under the folder named Chapter03) at https://github.com/PacktPublishing/Hands-On-Computer-Vision-with-Detectron2. It is highly recommended to download the code and follow along.

Common data sources

Chapter 2 introduced the two most common datasets for the computer vision community: ImageNet and Microsoft COCO (Common Objects in Context). Many models pre-trained on these datasets are also available, and the class labels they predict may already meet your everyday needs.

If your task is to detect a less common class label, it might be worth exploring the Large Vocabulary Instance Segmentation (LVIS) dataset. It covers more than 1,200 categories across 164,000 images, including many rare categories, and contains about 2 million high-quality instance segmentation masks. Detectron2 also provides pre-trained models for predicting these 1,200+ labels. Thus, you can follow the steps described in Chapter 2 to create a computer vision application that meets your needs. More information about the LVIS dataset is available on its website at https://www.lvisdataset.org.
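As an illustration, the following sketch loads one of these LVIS-trained models using the same model zoo pattern described in Chapter 2. The configuration file name is an assumption based on the Detectron2 model zoo; verify it against the release you have installed:

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# An LVIS instance segmentation configuration from the model zoo
# (an assumed example; check the zoo for the exact available configs)
config_file = "LVISv0.5-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml"

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(config_file))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(config_file)
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # keep predictions above 50% confidence
predictor = DefaultPredictor(cfg)  # predictor for the 1,200+ LVIS labels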

If you have a task where no existing/pre-trained models can meet your needs, it is time to find existing...

Getting images

When you start a new project, you might already have a set of images collected using imaging devices such as cameras, smartphones, drones, or other specialized hardware. If you have such photos, you can skip this step and move on to the following steps, where you select a labeling tool and start labeling. However, if you are starting a fresh project and do not yet have any images, this section gives you some ideas for collecting pictures from the internet. Note that images collected from the internet may be subject to copyright, so you should check whether the downloaded images are usable for your purposes.

Google Images contains a massive and ever-growing number of photos, making it a great resource for crawling images to train your custom model. Python has a simple_image_download package that provides scripts to search for images using keywords or tags and download them. First, we will install this package...
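A minimal usage sketch is shown below. The package API has changed between releases, so the names here reflect one common version, and the search keyword is only a placeholder; by default, results are saved under a local simple_images folder:

# install with: pip3 install simple_image_download (or !pip3 ... in Colab)
from simple_image_download import simple_image_download as simp

response = simp.simple_image_download
# Search Google Images for the (placeholder) keyword and download
# up to 50 results into simple_images/<keyword>/
response().download('forklift', 50)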

Selecting an image labeling tool

Computer vision applications are developing rapidly, so there are many tools for labeling images. These tools range from free and open source to fully commercial, or commercial with free trials (typically with limits on the available features). The labeling tools may be desktop applications (which require installation), online web applications, or locally hosted web applications. The online applications may even provide cloud storage, utilities that support collaborative team labeling, and pre-trained models that speed up labeling (generally, at some cost).

One popular image labeling tool is labelImg, available at https://github.com/heartexlabs/labelImg/blob/master/README.rst. It is an open source, lightweight, fast, and easy-to-use Python-based application with a short learning curve.

Its limitation is that it supports only rectangular bounding boxes. It is currently...

Annotation formats

Similar to labeling tools, many different annotation formats are available for annotating images for computer vision applications. The common standards include COCO JSON, Pascal VOC XML, and YOLO PyTorch TXT. Many more formats exist (TensorFlow TFRecord, CreateML JSON, and so on), but due to space limitations, this section covers only the three most common standards listed previously. Furthermore, this section uses two images and labels extracted from the test set of the brain tumor object detection dataset available from Kaggle (https://www.kaggle.com/datasets/davidbroberts/brain-tumor-object-detection-datasets) to illustrate these data formats and demonstrate their differences, as shown in Figure 3.4. This section briefly discusses the key points of each annotation format; interested readers can refer to the GitHub page of this chapter to inspect this same dataset in the different formats in further detail.

Figure 3.4: Two images and tumor labels used to illustrate different annotation formats

Figure...
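To preview the differences, here is a single hypothetical tumor bounding box in a 640×640 image expressed in each of the three standards (the coordinates are illustrative, not taken from the actual dataset):

# COCO JSON: one entry in the "annotations" array;
# bbox is [x_min, y_min, width, height] in absolute pixels
{"id": 1, "image_id": 1, "category_id": 1, "bbox": [230, 190, 120, 100], "area": 12000, "iscrowd": 0}

# Pascal VOC XML: one <object> element per instance;
# the box is stored as absolute corner coordinates
<object>
  <name>tumor</name>
  <bndbox><xmin>230</xmin><ymin>190</ymin><xmax>350</xmax><ymax>290</ymax></bndbox>
</object>

# YOLO TXT: one line per instance: class_id x_center y_center width height,
# with all coordinates normalized by the image size
0 0.453125 0.375 0.1875 0.15625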

Labeling the images

This chapter uses labelImg to perform data labeling for object detection tasks. This tool requires installation on a local computer. Therefore, if you downloaded your images to Google Drive using Google Colab, you need to map or download these images to a local computer to perform labeling. Run the following snippet in a terminal on a local computer to install labelImg using Python 3 (if you are running Python 2, please refer to the labelImg website for a guide):

pip3 install labelImg

After installing, run the following snippet to start the tool, where [IMAGE_PATH] is an optional argument to specify the path to the image folder, and [PRE-DEFINED CLASS FILE] is another optional argument to indicate the path to a text file (*.txt) that defines a list of class labels (one per line):

labelImg [IMAGE_PATH] [PRE-DEFINED CLASS FILE]

For instance, after downloading/synchronizing the simple_images folder for the downloaded images in the previous steps...
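A hypothetical invocation might look as follows, assuming the images were downloaded into simple_images/forklift and a classes.txt file lists the class labels, one per line (both names are placeholders for your own paths):

labelImg simple_images/forklift classes.txt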

Annotation format conversions

Detectron2's data description is built based on the COCO annotation format. In other words, it supports registering datasets that use the COCO data annotation format. However, other data annotation formats are abundant, and you may download a dataset or use a labeling tool that supports a data format different from COCO. Therefore, this section covers the code snippets used to convert data from the popular Pascal VOC and YOLO formats to COCO style.
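For reference, once a dataset is in COCO format, registering it with Detectron2 takes a single call; the dataset name and paths below are placeholders:

from detectron2.data.datasets import register_coco_instances

# Arguments: dataset name, extra metadata, COCO JSON file, image folder
register_coco_instances("my_dataset_train", {}, "annotations/train.json", "images/train")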

Important note

A statement that starts with an exclamation mark (!) is a Bash command to be executed in a Jupyter notebook (Google Colab) code cell. If you want to run it in a terminal instead, you can safely remove the exclamation mark and execute the statement.

By understanding the different data formats as described, it is relatively easy to write code that converts data from one format to another. However, to speed up development, this section uses the pylabel package to perform this conversion...
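A minimal conversion sketch with pylabel is shown below. The input and output paths are assumptions to adapt to your dataset, and the exact parameter names may vary slightly between pylabel releases:

!pip install pylabel

from pylabel import importer

# Import Pascal VOC XML annotations; path_to_images is relative to the annotations
dataset = importer.ImportVOC(path="data/voc_labels", path_to_images="../images")
# For YOLO labels, importer.ImportYoloV5 plays the same role (it also needs the
# category names, since YOLO files store only numeric class indices)

# Export to a single COCO-style JSON file that Detectron2 can register
dataset.export.ExportToCoco(output_path="data/coco_annotations.json")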

Summary

This chapter discussed popular data sources for the computer vision community. These data sources often have pre-trained models that help you quickly build computer vision applications. We also learned about common places to download computer vision datasets. If no dataset exists for a specific computer vision task, this chapter also showed how to get images by downloading them from the internet and how to select a tool for labeling the downloaded images. Furthermore, the computer vision field is developing rapidly, and many different annotation formats are in use. Therefore, this chapter also covered popular data formats and the steps to convert them into the format supported by Detectron2.

By this time, you should have your dataset ready. The next chapter discusses the architecture of Detectron2, with details on the backbone networks and how to select one for an object detection task, before training an object detection model using Detectron2.
