Reader small image

You're reading from  Deep Learning for Computer Vision

Product typeBook
Published inJan 2018
Reading LevelIntermediate
PublisherPackt
ISBN-139781788295628
Edition1st Edition
Languages
Right arrow
Author (1)
Rajalingappaa Shanmugamani
Rajalingappaa Shanmugamani
author image
Rajalingappaa Shanmugamani

Rajalingappaa Shanmugamani is currently working as an Engineering Manager for a Deep learning team at Kairos. Previously, he worked as a Senior Machine Learning Developer at SAP, Singapore and worked at various startups in developing machine learning products. He has a Masters from Indian Institute of TechnologyMadras. He has published articles in peer-reviewed journals and conferences and submitted applications for several patents in the area of machine learning. In his spare time, he coaches programming and machine learning to school students and engineers.
Read more about Rajalingappaa Shanmugamani

Right arrow

Chapter 4. Object Detection

Object detection is the act of finding the location of an object in an image. In this chapter, we will learn the techniques of object detection and implement pedestrian detection by understanding the following topics:

  • Basics and the difference between localization and detection
  • Various datasets and their descriptions
  • Algorithms used for object localization and detection
  • TensorFlow API for object detection
  • Training new object detection models
  • Pedestrian detection on a moving car with YOLO algorithm

Detecting objects in an image


Object detection had an explosion concerning both applications and research in recent years. Object detection is a problem of importance in computer vision. Similar to image classification tasks, deeper networks have shown better performance in detection. At present, the accuracy of these techniques is excellent. Hence it used in many applications.

Image classification labels the image as a whole. Finding the position of the object in addition to labeling the object is called object localization. Typically, the position of the object is defined by rectangular coordinates. Finding multiple objects in the image with rectangular coordinates is called detection. Here is an example of object detection:

The image shows four objects with bounding boxes. We will learn algorithms that can perform the task of finding the boxes. The applications are enormous in robot vision, such as self-driving cars and industrial objects. We can summarize localization and detection tasks...

Exploring the datasets


The datasets available for object localization and detection are many. In this section, we will explore the datasets that are used by the research community to evaluate the algorithms. There are datasets with a varying number of objects, ranging from 20 to 200 annotated in these datasets, which makes object detection hard. Some datasets have too many objects in one image compared to other datasets with just one object per image. Next, we will see the datasets in detail.

ImageNet dataset

ImageNet has data for evaluating classification, localization, and detection tasks. The Chapter 2, Image Classification, discussed classification datasets in detail. Similar to classification data, there are 1,000 classes for localization tasks. The accuracy is calculated based on the top five detections. There will be at least one bounding box in all the images. There are 200 objects for detection problems with 470,000 images, with an average of 1.1 objects per image. 

PASCAL VOC challenge...

Localizing algorithms 


Localization algorithms are an extension of the materials learned in Chapter 2, Image Classification and Chapter 3, Image Retrieval. In image classification, an image is passed through several layers of a CNN (convolutional neural network). The final layer of CNN outputs the probabilistic value, belonging to each of the labels. This can be extended to localize the objects. We will see these ideas in the following sections.

Localizing objects using sliding windows

An intuitive way of localization is to predict several cropped portions of an image with an object. The cropping of the images can be done by moving a window across the image and predicting for every window. The method of moving a smaller window than the image and cropping the image according to window size is called a sliding window. A prediction can be made for every cropped window of the image which is called sliding window object detection. 

The prediction can be done by the deep learning model trained for...

Detecting objects


There are several variants of object detection algorithms. A few algorithms that come with the object detection API are discussed here.

Regions of the convolutional neural network (R-CNN)

The first work in this series was regions for CNNs proposed by Girshick et al.(https://arxiv.org/pdf/1311.2524.pdf) . It proposes a few boxes and checks whether any of the boxes correspond to the ground truth. Selective search was used for these region proposals. Selective search proposes the regions by grouping the color/texture of windows of various sizes. The selective search looks for blob-like structures. It starts with a pixel and produces a blob at a higher scale. It produces around 2,000 region proposals. This region proposal is less when compared to all the sliding windows possible. 

The proposals are resized and passed through a standard CNN architecture such as Alexnet/VGG/Inception/ResNet. The last layer of the CNN is trained with an SVM identifying the object with a no-object...

Object detection API


Google released pre-trained models with various algorithms trained on the COCO dataset for public use. The API is built on top of TensorFlow and intended for constructing, training, and deploying object detection models. The APIs support both object detection and localization tasks. The availability of pre-trained models enables the fine-tuning of new data and hence making the training faster. These different models have trade-offs between speed and accuracy. 

Installation and setup

Install the Protocol Buffers (protobuf) compiler with the following commands. Create a directory for protobuf and download the library directly:

mkdir protoc_3.3
cd protoc_3.3
wget https://github.com/google/protobuf/releases/download/v3.3.0/protoc-3.3.0-linux-x86_64.zip

Change the permission of the folder and extract the contents, as shown here:

chmod 775 protoc-3.3.0-linux-x86_64.zip
unzip protoc-3.3.0-linux-x86_64.zip

Protocol Buffers (protobuf) is Google's language-neutral, platform-neutral...

The YOLO object detection algorithm 


A recent algorithm for object detection is You look only once (YOLO). The image is divided into multiple grids. Each grid cell of the image runs the same algorithm. Let's start the implementation by defining layers with initializers:

def pooling_layer(input_layer, pool_size=[2, 2], strides=2, padding='valid'):
    layer = tf.layers.max_pooling2d(
        inputs=input_layer,
        pool_size=pool_size,
        strides=strides,
        padding=padding
    )
    add_variable_summary(layer, 'pooling')
    return layer

def convolution_layer(input_layer, filters, kernel_size=[3, 3], padding='valid',
                      activation=tf.nn.leaky_relu):
    layer = tf.layers.conv2d(
        inputs=input_layer,
        filters=filters,
        kernel_size=kernel_size,
        activation=activation,
        padding=padding,
        weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),
        weights_regularizer=tf.l2_regularizer(0.0005)
    )
    add_variable_summary...

Summary


In this chapter, we have learned the difference between object localization and detection tasks. Several datasets and evaluation criteria were discussed. Various approaches to localization problems and algorithms, such as variants of R-CNN and SSD models for detection, were discussed. The implementation of detection in open-source repositories was covered.  We trained a model for pedestrian detection using the techniques. We also learned about various trade-offs in training such models.

In the next chapter, we will learn about semantic segmentation algorithms. We will use the knowledge to implement the segmentation algorithms for medical imaging and satellite imagery problems. 

 

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Deep Learning for Computer Vision
Published in: Jan 2018Publisher: PacktISBN-13: 9781788295628
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Rajalingappaa Shanmugamani

Rajalingappaa Shanmugamani is currently working as an Engineering Manager for a Deep learning team at Kairos. Previously, he worked as a Senior Machine Learning Developer at SAP, Singapore and worked at various startups in developing machine learning products. He has a Masters from Indian Institute of TechnologyMadras. He has published articles in peer-reviewed journals and conferences and submitted applications for several patents in the area of machine learning. In his spare time, he coaches programming and machine learning to school students and engineers.
Read more about Rajalingappaa Shanmugamani