You're reading from Applied Deep Learning and Computer Vision for Self-Driving Cars

Product type: Book
Published in: Aug 2020
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781838646301
Edition: 1st
Authors (2):
Sumit Ranjan

Sumit Ranjan is a silver medalist in his Bachelor of Technology (Electronics and Telecommunication) degree. He is a passionate data scientist who has worked on solving business problems to build an unparalleled customer experience across domains such as automobile, healthcare, semiconductors, cloud virtualization, and insurance. He is experienced in building applied machine learning, computer vision, and deep learning solutions to meet real-world needs. He was awarded the Autonomous Self-Driving Car Scholarship by KPIT Technologies. He has also worked on multiple research projects at Mercedes Benz Research and Development. Apart from work, his hobbies are traveling and exploring new places, wildlife photography, and blogging.

Dr. S. Senthamilarasu

Dr. S. Senthamilarasu was born and raised in Coimbatore, Tamil Nadu. He is a technologist, designer, speaker, storyteller, journal reviewer, educator, and researcher. He loves to learn new technologies and solve real-world problems in the IT industry. He has published research papers in various journals and presented at various international conferences. His research areas include data mining, image processing, and neural networks. He loves reading Tamil novels and involves himself in social activities. He has also received silver medals at international exhibitions for his research products for children with autism disorder. He currently lives in Bangalore and works closely with lead clients.

The Principles and Foundations of Semantic Segmentation

In this chapter, we are going to talk about how deep learning and convolutional neural networks (CNNs) can be adapted to solve semantic segmentation tasks in computer vision.

In a self-driving car (SDC), the vehicle must know exactly where another vehicle is on the road or where a person is crossing it. Semantic segmentation helps make these identifications. Semantic segmentation with CNNs effectively means classifying each pixel in the image. Thus, the idea is to create a map of fully detectable object areas in the image. Essentially, what we want is an output image in which every pixel has a label associated with it.
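
To make this concrete, the following is a minimal sketch (not taken from the book's code) of what per-pixel classification means in practice. The scores array is a hypothetical stand-in for a segmentation network's output: given class scores for every pixel, the predicted label map is simply the per-pixel argmax.

import numpy as np

# Hypothetical per-pixel class scores from a segmentation network:
# shape (height, width, num_classes), e.g. 360 x 480 pixels and
# 3 classes (say road, car, and pedestrian). Random placeholder values.
scores = np.random.rand(360, 480, 3)

# Semantic segmentation assigns each pixel the class with the highest
# score, producing a label map with the same height and width as the image.
label_map = np.argmax(scores, axis=-1)   # shape (360, 480), values in {0, 1, 2}

print(label_map.shape, np.unique(label_map))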

For example, semantic segmentation will label all the cars in an image, as shown here:

Fig 8.1: Semantic segmentation output

The demand for understanding data has increased in the...

Introduction to semantic segmentation

Numerous technology systems designed to identify a car's surroundings have emerged in recent years. Understanding the scene around the vehicle has become an important area of research for analyzing the geometry of a scene and the objects within it. CNNs have proved to be the most effective vision computing tools for image classification, object detection, and semantic segmentation. In an automated environment, critical decisions depend on understanding a given scene at the pixel level. Semantic segmentation has proven to be one of the most effective methods of assigning labels to individual pixels in an image.

Researchers have proposed numerous approaches to semantic pixel-wise labeling; some of them use deep architectures for pixel-wise labeling, and the results have been impressive. Since segmentation at the pixel level provides better...

Understanding the semantic segmentation architecture

The semantic segmentation network generally consists of an encoder-decoder network. The encoder produces high-level features using convolutions, while the decoder interprets these high-level features and maps them to classes. The encoder is usually a common, pre-trained network, whereas the decoder weights are learned while training the segmentation network. The following diagram shows the encoder-decoder-based FCN architecture for semantic segmentation:

Fig 8.2: Semantic segmentation architecture

You can check out the preceding diagram at the following link: https://www.mdpi.com/2313-433X/4/10/116/pdf.

The encoder gradually reduces the spatial dimensions with the help of pooling layers, while the decoder recovers the object details and spatial dimensions. You can read more about semantic segmentation in the paper ECRU: An Encoder-Decoder-Based Convolution Neural...
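
As a rough illustration of this encoder-decoder pattern, here is a minimal Keras sketch. The input size, filter counts, and depth are illustrative assumptions rather than a reproduction of the FCN in the diagram: the encoder halves the spatial dimensions with pooling, and the decoder upsamples back to the input resolution before a per-pixel softmax.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_encoder_decoder(input_shape=(256, 256, 3), num_classes=3):
    inputs = layers.Input(shape=input_shape)

    # Encoder: convolutions extract features, pooling reduces spatial size
    x = layers.Conv2D(32, 3, padding='same', activation='relu')(inputs)
    x = layers.MaxPooling2D(2)(x)                      # 256 -> 128
    x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
    x = layers.MaxPooling2D(2)(x)                      # 128 -> 64

    # Decoder: upsampling recovers the original spatial dimensions
    x = layers.UpSampling2D(2)(x)                      # 64 -> 128
    x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
    x = layers.UpSampling2D(2)(x)                      # 128 -> 256
    x = layers.Conv2D(32, 3, padding='same', activation='relu')(x)

    # Per-pixel classification over num_classes labels
    outputs = layers.Conv2D(num_classes, 1, activation='softmax')(x)
    return models.Model(inputs, outputs)

model = build_encoder_decoder()
model.summary()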

Overview of different semantic segmentation architectures

Many deep learning architectures and pre-trained models for semantic segmentation have been released in recent years. In this section, we will discuss the following popular semantic segmentation architectures:

  • U-Net
  • SegNet
  • PSPNet
  • DeepLabv3+
  • E-Net

We will start by introducing U-Net.

U-Net

U-Net won the most challenging Grand Challenge for Computer-Automated Detection of Caries in Bitewing Radiography at the International Symposium on Biomedical Imaging (ISBI) 2015, and it also won the Cell Tracking Challenge at ISBI 2015.

U-Net is a fast and precise semantic segmentation architecture. It outperformed methods such as the sliding-window CNN in the ISBI challenge for the semantic segmentation of neuronal structures in electron microscopy stacks.

 At ISBI 2015, it also won the two most challenging transmitted light microscopy categories, Phase contrast and DIC microscopy, by a large margin. 

The main idea behind U-Net is to supplement a usual contracting network with successive layers in which upsampling operators replace pooling operations. These layers increase the resolution of the output. The most important modification in U-Net occurs in the upsampling...
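
The following is a minimal U-Net-style sketch in Keras. The input size and filter counts are illustrative assumptions; the point is to show the contracting path, the upsampling path, and the skip connections that concatenate encoder features back into the decoder.

import tensorflow as tf
from tensorflow.keras import layers, models

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return layers.Conv2D(filters, 3, padding='same', activation='relu')(x)

def build_unet(input_shape=(128, 128, 3), num_classes=3):
    inputs = layers.Input(shape=input_shape)

    # Contracting path: convolutions followed by pooling
    c1 = conv_block(inputs, 32)
    p1 = layers.MaxPooling2D(2)(c1)                    # 128 -> 64
    c2 = conv_block(p1, 64)
    p2 = layers.MaxPooling2D(2)(c2)                    # 64 -> 32

    # Bottleneck
    b = conv_block(p2, 128)

    # Expanding path: upsampling operators replace pooling, and encoder
    # features are concatenated back in via skip connections
    u2 = layers.Conv2DTranspose(64, 2, strides=2, padding='same')(b)
    u2 = conv_block(layers.concatenate([u2, c2]), 64)
    u1 = layers.Conv2DTranspose(32, 2, strides=2, padding='same')(u2)
    u1 = conv_block(layers.concatenate([u1, c1]), 32)

    outputs = layers.Conv2D(num_classes, 1, activation='softmax')(u1)
    return models.Model(inputs, outputs)

model = build_unet()
model.summary()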

SegNet

SegNet is a deep encoder-decoder architecture for multi-class pixel-wise segmentation that was researched and developed by members of the Computer Vision and Robotics Group (http://mi.eng.cam.ac.uk/Main/CVR) at the University of Cambridge, UK. 

The SegNet architecture consists of a series of non-linear processing layers (the encoder network), a corresponding collection of decoders (the decoder network), and a final pixel-wise classification layer.

The architecture of SegNet can be seen in the following diagram:

Fig 8.4: SegNet architecture

You can also check out this diagram at https://mi.eng.cam.ac.uk/projects/segnet/.

The encoder typically consists of one or more convolutional layers with batch normalization and a ReLU, accompanied by non-overlapping max-pooling and sub-sampling. Sparse encoding, which results from the pooling process, is upsampled in the decoder...

Encoder

Convolutions and max-pooling are performed in the encoder, where 13 convolutional layers are taken from VGG-16. The corresponding max-pooling indices are stored while performing 2×2 max-pooling.

Decoder

Upsampling and convolutions are performed in the decoder, and a softmax classifier is applied to each pixel at the end. During upsampling, the max-pooling indices stored at the corresponding encoder layer are recalled and used. A K-class softmax classifier then predicts the label of each pixel.
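
The following is a minimal sketch of this index-based upsampling using TensorFlow's max_pool_with_argmax. The helper names and shapes are assumptions for illustration, not SegNet's actual code: the encoder step records where each maximum came from, and the decoder step scatters the pooled values back to exactly those positions.

import tensorflow as tf

def pool_with_indices(x):
    # 2x2 max-pooling that also records the location of each maximum,
    # as done in the SegNet encoder.
    return tf.nn.max_pool_with_argmax(
        x, ksize=2, strides=2, padding='SAME', include_batch_in_index=True)

def unpool_with_indices(pooled, indices, output_shape):
    # SegNet-style decoder step: scatter the pooled values back to the
    # positions recorded by the encoder; every other position stays zero.
    flat_size = tf.cast(tf.reduce_prod(output_shape), tf.int64)
    flat = tf.scatter_nd(tf.reshape(indices, [-1, 1]),
                         tf.reshape(pooled, [-1]),
                         tf.reshape(flat_size, [1]))
    return tf.reshape(flat, output_shape)

x = tf.random.normal([1, 4, 4, 3])                 # a dummy feature map
pooled, idx = pool_with_indices(x)                 # 4x4 -> 2x2, indices stored
recovered = unpool_with_indices(pooled, idx, tf.shape(x))
print(pooled.shape, recovered.shape)               # (1, 2, 2, 3) (1, 4, 4, 3)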

SegNet is described in the paper SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling, which was researched and written by members of the Computer Vision and Robotics Group at the University of Cambridge, UK. Click on the following link for more details: http://mi.eng.cam.ac.uk/projects/segnet/.

In the next section, we'll cover the Pyramid Scene Parsing Network (PSPNet).

PSPNet

Full-Resolution Residual Networks (FRRNs) were computationally intensive, and using them on full-scale images was very slow. PSPNet was introduced to deal with this problem. It applies four pooling operations with four different window sizes and strides. Using these pooling layers allows the network to extract feature information from different scales more efficiently.
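
The following is a minimal Keras sketch of a PSPNet-style pyramid pooling module. The bin sizes and filter counts are illustrative assumptions (the paper pools to 1x1, 2x2, 3x3, and 6x6 grids and uses average pooling): each branch pools the feature map to a different grid size, reduces it with a 1x1 convolution, upsamples it back, and all branches are concatenated with the original features.

import tensorflow as tf
from tensorflow.keras import layers

def pyramid_pooling_module(features, bin_sizes=(1, 2, 3, 6)):
    # features is a feature map of shape (batch, H, W, C); H and W must be
    # divisible by the bin sizes used here.
    h, w = features.shape[1], features.shape[2]
    branches = [features]
    for size in bin_sizes:
        # Pool the feature map down to a size x size grid ...
        x = layers.AveragePooling2D(pool_size=(h // size, w // size))(features)
        # ... reduce the channel count with a 1x1 convolution ...
        x = layers.Conv2D(64, 1, padding='same', activation='relu')(x)
        # ... and upsample back so every branch matches the input size.
        x = layers.UpSampling2D(size=(h // size, w // size),
                                interpolation='bilinear')(x)
        branches.append(x)
    # Concatenate the multi-scale context with the original features.
    return layers.Concatenate()(branches)

inputs = layers.Input(shape=(48, 48, 256))
outputs = pyramid_pooling_module(inputs)
model = tf.keras.Model(inputs, outputs)
model.summary()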

PSPNet achieved state-of-the-art performance on various datasets. It became popular after the ImageNet scene parsing challenge in 2016. It set a record mIoU of 85.4% on the PASCAL VOC 2012 benchmark and achieved 80.2% on the Cityscapes benchmark. The following is a link to the relevant paper: https://arxiv.org/pdf/1612.01105.

The following diagram shows the architecture of PSPNet:

Fig 8.5: PSPNet architecture

Check out https://hszhao.github.io/projects/pspnet/ to find out more about the PSPNet architecture...

DeepLabv3+

DeepLab is a state-of-the-art semantic segmentation model that was developed and open sourced by Google in 2016. Multiple versions have been released and many improvements have been made to the model since then, including DeepLab V2, DeepLab V3, and DeepLab V3+.

Before the release of DeepLab V3+, there were two main approaches: spatial pyramid pooling modules encode multi-scale contextual information by applying filters and pooling operations at different rates, while encoder-decoder networks capture objects with sharper boundaries by gradually recovering spatial information. DeepLabv3+ combines these two approaches: it uses both an encoder-decoder structure and a spatial pyramid pooling module.
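
As a rough illustration of the first ingredient, here is a minimal Keras sketch of atrous spatial pyramid pooling (ASPP), the multi-rate module used in the DeepLab family. The dilation rates and filter counts are illustrative assumptions: parallel convolutions with different dilation (atrous) rates probe the features at several scales without reducing resolution.

import tensorflow as tf
from tensorflow.keras import layers

def atrous_spatial_pyramid_pooling(features, rates=(6, 12, 18)):
    # A 1x1 branch plus several dilated 3x3 branches, each looking at the
    # same features with a different effective field of view.
    branches = [layers.Conv2D(64, 1, padding='same', activation='relu')(features)]
    for rate in rates:
        branches.append(
            layers.Conv2D(64, 3, padding='same', dilation_rate=rate,
                          activation='relu')(features))
    x = layers.Concatenate()(branches)
    # A final 1x1 convolution fuses the multi-scale branches into one map.
    return layers.Conv2D(64, 1, padding='same', activation='relu')(x)

inputs = layers.Input(shape=(64, 64, 256))
outputs = atrous_spatial_pyramid_pooling(inputs)
model = tf.keras.Model(inputs, outputs)
model.summary()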

The following diagram shows the architecture of DeepLabv3+, which consists of encoder and decoder modules: 

Fig 8.6: DeepLabV3+ architecture

Let's look at the encoder and decoder modules in more detail:

  • Encoder: In the encoder step, essential information from the input image is extracted using a pre-trained convolutional neural...

E-Net

Real-time pixel-wise semantic segmentation is one of the great applications of semantic segmentation for SDCs. Recent networks have become more and more accurate, but deploying semantic segmentation in real time on a vehicle is still a challenge. In this section, we'll look at the efficient neural network (E-Net), which aims to run on low-power mobile devices while remaining accurate.

E-Net is a popular network due to its ability to perform real-time pixel-wise semantic segmentation. It is up to 18x faster, requires 75x fewer FLOPs, and has 79x fewer parameters than existing models such as U-Net and SegNet, while delivering similar or better accuracy. E-Net has been tested on the popular CamVid, Cityscapes, and SUN datasets.

The architecture of E-Net is as follows:

Fig 8.7: E-Net architecture

You can check out the preceding diagram at https://arxiv.org/pdf/1606.02147.pdf.

This is a framework with one master and several branches that split from the master but also merge back via element-wise addition. ...
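
The following is a minimal Keras sketch of that branch-and-merge pattern, loosely modeled on an E-Net-style bottleneck. The filter counts and layer choices are illustrative assumptions: a branch projects the features down to a few channels, convolves them, expands them back, and is merged into the master path via element-wise addition.

import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_branch(x, filters, projection=16):
    # Branch: 1x1 projection, main 3x3 convolution, 1x1 expansion.
    branch = layers.Conv2D(projection, 1, padding='same', activation='relu')(x)
    branch = layers.Conv2D(projection, 3, padding='same', activation='relu')(branch)
    branch = layers.Conv2D(filters, 1, padding='same')(branch)

    # Master path: a 1x1 convolution so the channel counts match.
    master = layers.Conv2D(filters, 1, padding='same')(x)

    # The branch merges back into the master path via element-wise addition.
    return layers.ReLU()(layers.Add()([master, branch]))

inputs = layers.Input(shape=(128, 128, 64))
outputs = bottleneck_branch(inputs, filters=64)
model = tf.keras.Model(inputs, outputs)
model.summary()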

Summary

In this chapter, we learned about the importance of semantic segmentation in the field of SDCs. We also looked at an overview of a few popular deep learning architectures related to semantic segmentation: U-Net, SegNet, PSPNet, DeepLabv3+, and E-Net.

In the next chapter, we will implement semantic segmentation using E-Net. We will use it to detect various objects in images and videos. 
