You're reading from Applied Deep Learning and Computer Vision for Self-Driving Cars

Product type: Book
Published in: Aug 2020
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781838646301
Edition: 1st
Authors (2):
Sumit Ranjan

Sumit Ranjan is a silver medalist in his Bachelor of Technology (Electronics and Telecommunication) degree. He is a passionate data scientist who has worked on solving business problems to build an unparalleled customer experience across domains such as automobile, healthcare, semiconductors, cloud virtualization, and insurance. He is experienced in building applied machine learning, computer vision, and deep learning solutions to meet real-world needs. He was awarded the Autonomous Self-Driving Car Scholarship by KPIT Technologies. He has also worked on multiple research projects at Mercedes Benz Research and Development. Apart from work, his hobbies are traveling and exploring new places, wildlife photography, and blogging.

Dr. S. Senthamilarasu

Dr. S. Senthamilarasu was born and raised in Coimbatore, Tamil Nadu. He is a technologist, designer, speaker, storyteller, journal reviewer, educator, and researcher. He loves to learn new technologies and solve real-world problems in the IT industry. He has published research papers in various journals and presented at various international conferences. His research areas include data mining, image processing, and neural networks. He loves reading Tamil novels and involves himself in social activities. He has also received silver medals at international exhibitions for his research products for children with autism disorder. He currently lives in Bangalore and works closely with lead clients.

The Principles and Foundations of Semantic Segmentation

In this chapter, we are going to talk about how deep learning and convolutional neural networks (CNNs) can be adapted to solve semantic segmentation tasks in computer vision.

In a self-driving car (SDC), the vehicle must know exactly where another vehicle is on the road or where a person is crossing it. Semantic segmentation helps make these identifications. Semantic segmentation with CNNs effectively means classifying each pixel in the image. Thus, the idea is to create a map of fully detectable object areas in the image. Essentially, what we want is an output image in which every pixel has a label associated with it.
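
To make this concrete, the following is a minimal sketch (not taken from the book's code) of what per-pixel classification means in practice. The scores array is a hypothetical stand-in for a segmentation network's output: given class scores for every pixel, the predicted label map is simply the per-pixel argmax.

import numpy as np

# Hypothetical per-pixel class scores from a segmentation network:
# shape (height, width, num_classes), e.g. 360 x 480 pixels and
# 3 classes (say road, car, and pedestrian). Random placeholder values.
scores = np.random.rand(360, 480, 3)

# Semantic segmentation assigns each pixel the class with the highest
# score, producing a label map with the same height and width as the image.
label_map = np.argmax(scores, axis=-1)   # shape (360, 480), values in {0, 1, 2}

print(label_map.shape, np.unique(label_map))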

For example, semantic segmentation will label all the cars in an image, as shown here:

Fig 8.1: Semantic segmentation output

The demand for understanding data has increased in the...

Introduction to semantic segmentation

Numerous technology systems designed to identify a car's surroundings have emerged in recent years. Understanding the scene around the vehicle has become an important area of research for analyzing the geometry of a scene and the objects within it. CNNs have proved to be the most effective vision computing tools for image classification, object detection, and semantic segmentation. In an automated environment, critical decisions depend on understanding a given scene at the pixel level. Semantic segmentation has proven to be one of the most effective methods of assigning labels to individual pixels in an image.

Researchers have proposed numerous approaches to semantic pixel-wise labeling; some of them use deep architectures for pixel-wise labeling, and the results have been impressive. Since segmentation at the pixel level provides better...

Understanding the semantic segmentation architecture

The semantic segmentation network generally consists of an encoder-decoder network. The encoder produces high-level features using convolutions, while the decoder interprets these high-level features and maps them to classes. The encoder is usually a common, pre-trained network, whereas the decoder weights are learned while training the segmentation network. The following diagram shows the encoder-decoder-based FCN architecture for semantic segmentation:

Fig 8.2: Semantic segmentation architecture

You can check out the preceding diagram at the following link: https://www.mdpi.com/2313-433X/4/10/116/pdf.

The encoder gradually reduces the spatial dimensions with the help of pooling layers, while the decoder recovers the object details and spatial dimensions. You can read more about semantic segmentation in the paper ECRU: An Encoder-Decoder-Based Convolution Neural...
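
As a rough illustration of this encoder-decoder pattern, here is a minimal Keras sketch. The input size, filter counts, and depth are illustrative assumptions rather than a reproduction of the FCN in the diagram: the encoder halves the spatial dimensions with pooling, and the decoder upsamples back to the input resolution before a per-pixel softmax.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_encoder_decoder(input_shape=(256, 256, 3), num_classes=3):
    inputs = layers.Input(shape=input_shape)

    # Encoder: convolutions extract features, pooling reduces spatial size
    x = layers.Conv2D(32, 3, padding='same', activation='relu')(inputs)
    x = layers.MaxPooling2D(2)(x)                      # 256 -> 128
    x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
    x = layers.MaxPooling2D(2)(x)                      # 128 -> 64

    # Decoder: upsampling recovers the original spatial dimensions
    x = layers.UpSampling2D(2)(x)                      # 64 -> 128
    x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
    x = layers.UpSampling2D(2)(x)                      # 128 -> 256
    x = layers.Conv2D(32, 3, padding='same', activation='relu')(x)

    # Per-pixel classification over num_classes labels
    outputs = layers.Conv2D(num_classes, 1, activation='softmax')(x)
    return models.Model(inputs, outputs)

model = build_encoder_decoder()
model.summary()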

Overview of different semantic segmentation architectures

Many deep learning architectures and pre-trained models for semantic segmentation have been released in recent years. In this section, we will discuss the following popular semantic segmentation architectures:

  • U-Net
  • SegNet
  • PSPNet
  • DeepLabv3+
  • E-Net

We will start by introducing U-Net.

U-Net

U-Net won the most challenging Grand Challenge for Computer-Automated Detection of Caries in Bitewing Radiography at the International Symposium on Biomedical Imaging (ISBI) 2015, and it also won the Cell Tracking Challenge at ISBI 2015.

U-Net is a fast and precise semantic segmentation architecture. It outperformed methods such as the sliding-window CNN in the ISBI challenge for the semantic segmentation of neuronal structures in electron microscopy stacks.

 At ISBI 2015, it also won the two most challenging transmitted light microscopy categories, Phase contrast and DIC microscopy, by a large margin. 

The main idea behind U-Net is to supplement a usual contracting network with successive layers in which upsampling operators replace pooling operations. These layers increase the resolution of the output. The most important modification in U-Net occurs in the upsampling...
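
The following is a minimal U-Net-style sketch in Keras. The input size and filter counts are illustrative assumptions; the point is to show the contracting path, the upsampling path, and the skip connections that concatenate encoder features back into the decoder.

import tensorflow as tf
from tensorflow.keras import layers, models

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return layers.Conv2D(filters, 3, padding='same', activation='relu')(x)

def build_unet(input_shape=(128, 128, 3), num_classes=3):
    inputs = layers.Input(shape=input_shape)

    # Contracting path: convolutions followed by pooling
    c1 = conv_block(inputs, 32)
    p1 = layers.MaxPooling2D(2)(c1)                    # 128 -> 64
    c2 = conv_block(p1, 64)
    p2 = layers.MaxPooling2D(2)(c2)                    # 64 -> 32

    # Bottleneck
    b = conv_block(p2, 128)

    # Expanding path: upsampling operators replace pooling, and encoder
    # features are concatenated back in via skip connections
    u2 = layers.Conv2DTranspose(64, 2, strides=2, padding='same')(b)
    u2 = conv_block(layers.concatenate([u2, c2]), 64)
    u1 = layers.Conv2DTranspose(32, 2, strides=2, padding='same')(u2)
    u1 = conv_block(layers.concatenate([u1, c1]), 32)

    outputs = layers.Conv2D(num_classes, 1, activation='softmax')(u1)
    return models.Model(inputs, outputs)

model = build_unet()
model.summary()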

SegNet

SegNet is a deep encoder-decoder architecture for multi-class pixel-wise segmentation that was researched and developed by members of the Computer Vision and Robotics Group (http://mi.eng.cam.ac.uk/Main/CVR) at the University of Cambridge, UK. 

The SegNet architecture consists of a series of non-linear processing layers (the encoder network), a corresponding collection of decoders (the decoder network), and a final pixel-wise classification layer.

The architecture of SegNet can be seen in the following diagram:

Fig 8.4: SegNet architecture

You can also check out this diagram at https://mi.eng.cam.ac.uk/projects/segnet/.

The encoder typically consists of one or more convolutional layers with batch normalization and a ReLU, accompanied by non-overlapping max-pooling and sub-sampling. Sparse encoding, which results from the pooling process, is upsampled in the decoder...

Encoder

Convolutions and max-pooling are performed in the encoder, where 13 convolutional layers are taken from VGG-16. The corresponding max-pooling indices are stored while performing 2×2 max-pooling.

Decoder

Upsampling and convolutions are performed in the decoder, and a softmax classifier is applied to each pixel at the end. During upsampling, the max-pooling indices stored at the corresponding encoder layer are recalled and used. A K-class softmax classifier then predicts the label of each pixel.
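
The following is a minimal sketch of this index-based upsampling using TensorFlow's max_pool_with_argmax. The helper names and shapes are assumptions for illustration, not SegNet's actual code: the encoder step records where each maximum came from, and the decoder step scatters the pooled values back to exactly those positions.

import tensorflow as tf

def pool_with_indices(x):
    # 2x2 max-pooling that also records the location of each maximum,
    # as done in the SegNet encoder.
    return tf.nn.max_pool_with_argmax(
        x, ksize=2, strides=2, padding='SAME', include_batch_in_index=True)

def unpool_with_indices(pooled, indices, output_shape):
    # SegNet-style decoder step: scatter the pooled values back to the
    # positions recorded by the encoder; every other position stays zero.
    flat_size = tf.cast(tf.reduce_prod(output_shape), tf.int64)
    flat = tf.scatter_nd(tf.reshape(indices, [-1, 1]),
                         tf.reshape(pooled, [-1]),
                         tf.reshape(flat_size, [1]))
    return tf.reshape(flat, output_shape)

x = tf.random.normal([1, 4, 4, 3])                 # a dummy feature map
pooled, idx = pool_with_indices(x)                 # 4x4 -> 2x2, indices stored
recovered = unpool_with_indices(pooled, idx, tf.shape(x))
print(pooled.shape, recovered.shape)               # (1, 2, 2, 3) (1, 4, 4, 3)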

SegNet is described in the paper SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling, which was researched and written by members of the Computer Vision and Robotics Group at the University of Cambridge, UK. Click on the following link for more details: http://mi.eng.cam.ac.uk/projects/segnet/.

In the next section, we'll cover the Pyramid Scene Parsing Network (PSPNet).

PSPNet

Full-Resolution Residual Networks (FRRNs) were computationally intensive, and using them on full-scale images was very slow. PSPNet was introduced to deal with this problem. It applies four pooling operations with four different window sizes and strides. Using these pooling layers allows the network to extract feature information from different scales more efficiently.
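
The following is a minimal Keras sketch of a PSPNet-style pyramid pooling module. The bin sizes and filter counts are illustrative assumptions (the paper pools to 1x1, 2x2, 3x3, and 6x6 grids and uses average pooling): each branch pools the feature map to a different grid size, reduces it with a 1x1 convolution, upsamples it back, and all branches are concatenated with the original features.

import tensorflow as tf
from tensorflow.keras import layers

def pyramid_pooling_module(features, bin_sizes=(1, 2, 3, 6)):
    # features is a feature map of shape (batch, H, W, C); H and W must be
    # divisible by the bin sizes used here.
    h, w = features.shape[1], features.shape[2]
    branches = [features]
    for size in bin_sizes:
        # Pool the feature map down to a size x size grid ...
        x = layers.AveragePooling2D(pool_size=(h // size, w // size))(features)
        # ... reduce the channel count with a 1x1 convolution ...
        x = layers.Conv2D(64, 1, padding='same', activation='relu')(x)
        # ... and upsample back so every branch matches the input size.
        x = layers.UpSampling2D(size=(h // size, w // size),
                                interpolation='bilinear')(x)
        branches.append(x)
    # Concatenate the multi-scale context with the original features.
    return layers.Concatenate()(branches)

inputs = layers.Input(shape=(48, 48, 256))
outputs = pyramid_pooling_module(inputs)
model = tf.keras.Model(inputs, outputs)
model.summary()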

PSPNet achieved state-of-the-art performance on various datasets. It became popular after the ImageNet scene parsing challenge in 2016. It set a record mIoU of 85.4% on the PASCAL VOC 2012 benchmark and achieved 80.2% on the Cityscapes benchmark. The following is a link to the relevant paper: https://arxiv.org/pdf/1612.01105.

The following diagram shows the architecture of PSPNet:

Fig 8.5: PSPNet architecture

Check out https://hszhao.github.io/projects/pspnet/ to find out more about the PSPNet architecture...

DeepLabv3+

DeepLab is a state-of-the-art semantic segmentation model that was developed and open sourced by Google in 2016. Multiple versions have been released and many improvements have been made to the model since then, including DeepLab V2, DeepLab V3, and DeepLab V3+.

Before the release of DeepLab V3+, there were two main approaches: spatial pyramid pooling modules encode multi-scale contextual information by applying filters and pooling operations at different rates, while encoder-decoder networks capture objects with sharper boundaries by gradually recovering spatial information. DeepLabv3+ combines these two approaches: it uses both an encoder-decoder structure and a spatial pyramid pooling module.
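
As a rough illustration of the first ingredient, here is a minimal Keras sketch of atrous spatial pyramid pooling (ASPP), the multi-rate module used in the DeepLab family. The dilation rates and filter counts are illustrative assumptions: parallel convolutions with different dilation (atrous) rates probe the features at several scales without reducing resolution.

import tensorflow as tf
from tensorflow.keras import layers

def atrous_spatial_pyramid_pooling(features, rates=(6, 12, 18)):
    # A 1x1 branch plus several dilated 3x3 branches, each looking at the
    # same features with a different effective field of view.
    branches = [layers.Conv2D(64, 1, padding='same', activation='relu')(features)]
    for rate in rates:
        branches.append(
            layers.Conv2D(64, 3, padding='same', dilation_rate=rate,
                          activation='relu')(features))
    x = layers.Concatenate()(branches)
    # A final 1x1 convolution fuses the multi-scale branches into one map.
    return layers.Conv2D(64, 1, padding='same', activation='relu')(x)

inputs = layers.Input(shape=(64, 64, 256))
outputs = atrous_spatial_pyramid_pooling(inputs)
model = tf.keras.Model(inputs, outputs)
model.summary()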

The following diagram shows the architecture of DeepLabv3+, which consists of encoder and decoder modules: 

Fig 8.6: DeepLabV3+ architecture

Let's look at the encoder and decoder modules in more detail:

  • Encoder: In the encoder step, essential information from the input image is extracted using a pre-trained convolutional neural...

E-Net

Real-time pixel-wise semantic segmentation is one of the great applications of semantic segmentation for SDCs. Recent networks have become more and more accurate, but deploying semantic segmentation in real time on a vehicle is still a challenge. In this section, we'll look at the efficient neural network (E-Net), which aims to run on low-power mobile devices while remaining accurate.

E-Net is a popular network due to its ability to perform real-time pixel-wise semantic segmentation. It is up to 18x faster, requires 75x fewer FLOPs, and has 79x fewer parameters than existing models such as U-Net and SegNet, while delivering similar or better accuracy. E-Net has been tested on the popular CamVid, Cityscapes, and SUN datasets.

The architecture of E-Net is as follows:

Fig 8.7: E-Net architecture

You can check out the preceding diagram at https://arxiv.org/pdf/1606.02147.pdf.

This is a framework with one master and several branches that split from the master but also merge back via element-wise addition. ...
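
The following is a minimal Keras sketch of that branch-and-merge pattern, loosely modeled on an E-Net-style bottleneck. The filter counts and layer choices are illustrative assumptions: a branch projects the features down to a few channels, convolves them, expands them back, and is merged into the master path via element-wise addition.

import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_branch(x, filters, projection=16):
    # Branch: 1x1 projection, main 3x3 convolution, 1x1 expansion.
    branch = layers.Conv2D(projection, 1, padding='same', activation='relu')(x)
    branch = layers.Conv2D(projection, 3, padding='same', activation='relu')(branch)
    branch = layers.Conv2D(filters, 1, padding='same')(branch)

    # Master path: a 1x1 convolution so the channel counts match.
    master = layers.Conv2D(filters, 1, padding='same')(x)

    # The branch merges back into the master path via element-wise addition.
    return layers.ReLU()(layers.Add()([master, branch]))

inputs = layers.Input(shape=(128, 128, 64))
outputs = bottleneck_branch(inputs, filters=64)
model = tf.keras.Model(inputs, outputs)
model.summary()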

Summary

In this chapter, we learned about the importance of semantic segmentation in the field of SDCs. We also looked at an overview of a few popular deep learning architectures related to semantic segmentation: U-Net, SegNet, PSPNet, DeepLabv3+, and E-Net.

In the next chapter, we will implement semantic segmentation using E-Net. We will use it to detect various objects in images and videos. 
