You're reading from Applied Deep Learning and Computer Vision for Self-Driving Cars

Product type: Book

Published in: Aug 2020

Reading level: Intermediate

Publisher: Packt

ISBN-13: 9781838646301

Edition: 1st Edition

Languages: Python

Tools: TensorFlow, Keras

Concepts: Deep Learning

Authors (2): Sumit Ranjan, Dr. S. Senthamilarasu


Vehicle Detection Using OpenCV and Deep Learning

Object detection is one of the important applications of computer vision used in self-driving cars. Object detection in images means not only identifying the kind of object but also localizing it within the image by generating the coordinates of a bounding box that contains the object. In short, object detection combines classification (what the object is) with localization (where it is in the image).

An example of object detection can be seen in the following image:

Fig 11.1: Object detection

Here, you can see that the biker is detected as a person and that the bike is detected as a motorbike.

In this chapter, we are going to use OpenCV together with You Only Look Once (YOLO), a state-of-the-art deep learning architecture for object detection, to detect vehicles. YOLO can view an image...

What makes YOLO different?

In this chapter, we will be using version 3 of the YOLO object detection algorithm, which improves upon earlier versions of YOLO in terms of both speed and accuracy. Let's see how YOLO is different from other object detection networks:

YOLO looks at the whole image at test time, so its predictions are informed by the global context of the image.
Region-based detectors such as R-CNN run a network on thousands of region proposals for a single image, whereas YOLO makes its predictions with a single network evaluation.
Because it needs only a single network evaluation, YOLO is reported to be more than 1,000x faster than R-CNN and 100x faster than Fast R-CNN (https://pjreddie.com/darknet/yolo/).
YOLO treats detection as a regression problem.
As a result, YOLO is fast enough for real-time use while remaining accurate.

YOLO works as follows:

YOLO takes the input image and divides it into a grid of SxS. Every grid cell predicts one entity.
YOLO applies image classification and localization...
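To make the grid idea concrete, here is a minimal sketch of how an object's center can be mapped to the grid cell that is responsible for it. The function name `responsible_cell` and the 448x448 image size are assumptions chosen for illustration; they do not come from the chapter's code.

```python
def responsible_cell(cx, cy, img_w, img_h, s=7):
    """Return the (row, col) of the S x S grid cell containing the point (cx, cy).

    cx, cy are pixel coordinates of an object's center (illustrative input).
    """
    # Scale the center into grid units, then clamp so points on the
    # right/bottom edge still fall inside the last cell.
    col = min(int(cx / img_w * s), s - 1)
    row = min(int(cy / img_h * s), s - 1)
    return row, col


# An object centered in a 448x448 image lands in the middle cell of a 7x7 grid.
print(responsible_cell(224, 224, 448, 448))  # (3, 3)
```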

The YOLO loss function

The YOLO loss function is calculated in the following steps:

First, we find the predicted bounding boxes that have the highest intersection over union (IoU) with the ground-truth bounding boxes.
Then, we calculate the confidence loss, which penalizes errors in the predicted probability that an object is present inside a given bounding box.
Next, we calculate the classification loss, which penalizes errors in the predicted class of the object within the bounding box.
Finally, we calculate the coordinate loss, which penalizes errors in the position and size of the matched boxes.
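The first step relies on IoU, so a small sketch of that computation may help. It assumes boxes are given as corner coordinates (x1, y1, x2, y2); the function name `iou` is illustrative and is not part of the chapter's code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any)
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width/height are clamped at zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: 1 / (4 + 4 - 1)
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

An IoU of 1 means the boxes coincide, and 0 means they do not overlap at all.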

In summary, the total loss function is as follows:
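For reference, the original YOLO paper (Redmon et al., 2016) writes the combined loss as a sum of squared errors over coordinate, confidence, and classification terms. The notation below follows that paper and is an assumption about the form intended here; it may differ from the chapter's own figure.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 &+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
           + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
 &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2
  + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
      \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

Here, the first two lines are the coordinate loss, the third line is the confidence loss, and the last line is the classification loss. The indicator \mathbb{1}_{ij}^{\text{obj}} is 1 when the j-th box predictor in cell i is responsible for an object, and the paper sets \lambda_{\text{coord}} = 5 and \lambda_{\text{noobj}} = 0.5.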

In the next section, we will learn about the YOLO architecture.

The YOLO architecture

The YOLO architecture is inspired by the GoogLeNet image classification model. The YOLO network consists of 24 convolutional layers, followed by two fully connected layers. It also has alternating 1×1 convolutional layers, which reduce the feature space from the preceding layers.

The convolutional layers that are used in YOLO are pre-trained on the ImageNet classification task at half the resolution (224x224), and the resolution is then doubled (448x448) for detection. YOLO uses leaky ReLU for all the layers except the final layer, which uses a linear activation function.
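As a quick illustration of the activation mentioned above, the following sketch implements leaky ReLU with NumPy. The slope of 0.1 for negative inputs is the value given in the original YOLO paper; the function name is illustrative.

```python
import numpy as np


def leaky_relu(x, alpha=0.1):
    """Leaky ReLU: pass positive values through, scale negative values by alpha."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, alpha * x)


# Negative inputs are damped rather than zeroed out.
print(leaky_relu([-2.0, 0.0, 3.0]))
```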

The following diagram shows the model architecture of YOLO:

Fig 11.2: YOLO architecture

The following is a link to the official YOLO website: https://pjreddie.com/darknet/yolo/.

In the next section, we will learn about the different types of YOLO.

Fast YOLO

Fast YOLO is – as the name suggests – a faster version of YOLO. Fast YOLO uses nine convolutional layers and fewer filters than YOLO. The training and testing parameters are the same in both models. The output of Fast YOLO is a 7x7x30 tensor.
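The 7x7x30 shape follows from the grid size and the number of predictions per cell. The sketch below assumes the settings reported in the original YOLO paper for PASCAL VOC (S = 7 grid, B = 2 boxes per cell, C = 20 classes); these values are not stated in the text above.

```python
def yolo_output_shape(s=7, b=2, c=20):
    """Output tensor shape for a YOLO v1-style head.

    Each cell predicts b boxes with 5 values each (x, y, w, h, confidence)
    plus c class probabilities shared by the cell.
    """
    return (s, s, b * 5 + c)


print(yolo_output_shape())  # (7, 7, 30)
```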

YOLO v2

YOLO v2 (introduced in the YOLO9000 paper) fine-tunes its classification network at 448x448 instead of the 224x224 resolution used to pre-train the original YOLO. This higher-resolution classifier resulted in an improved mAP. YOLO v2 also uses batch normalization, which leads to a significant improvement in the accuracy of the model. In addition, it divides the image using a finer 13x13 grid, which helps it detect small objects. In order to obtain good priors (anchors) for the model, YOLO v2 runs k-means clustering on the dimensions of the training-set bounding boxes. YOLO v2 uses five anchor boxes, as shown in the following image:

Fig 11.3: Anchor boxes

In the preceding image, the boxes in blue are anchor boxes, while the box in red is the ground truth box for the object.
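One way to see how anchors relate to a ground-truth box is to compare widths and heights only, as if all boxes shared the same center. The sketch below picks the anchor with the highest such IoU. The five anchor sizes are made-up values for illustration, not the anchors used by YOLO v2.

```python
def wh_iou(wh_a, wh_b):
    """IoU of two boxes that share the same center, given as (width, height)."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union


def best_anchor(gt_wh, anchors):
    """Index of the anchor whose shape best matches the ground-truth box."""
    scores = [wh_iou(gt_wh, anchor) for anchor in anchors]
    return max(range(len(anchors)), key=scores.__getitem__)


# Hypothetical anchors (width, height) in grid-cell units
anchors = [(1, 1), (2, 4), (4, 2), (6, 6), (10, 12)]

# A wide, short ground-truth box matches the wide, short anchor best.
print(best_anchor((3.5, 2.0), anchors))  # 2
```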

YOLO v2 uses the Darknet-19 architecture for object classification, which has 19 convolutional layers, five max-pooling layers, and a softmax layer.

YOLO v3

YOLO v3 is one of the most widely used versions of YOLO. It uses nine anchor boxes, three at each of three prediction scales. YOLO v3 uses independent logistic classifiers for class prediction instead of softmax (which is used in YOLO v2). YOLO v3 also uses the Darknet-53 network for feature extraction, which consists of 53 convolutional layers.
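The practical difference between softmax and independent logistic (sigmoid) outputs can be seen on a small example. This is a minimal NumPy sketch with made-up logits, not code from the YOLO implementation used later in the chapter.

```python
import numpy as np


def softmax(logits):
    """Softmax: scores compete and always sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def sigmoid(logits):
    """Independent logistic outputs: each class is scored on its own."""
    return 1.0 / (1.0 + np.exp(-logits))


# Made-up logits for three classes, two of which both fit the object
logits = np.array([2.0, 1.5, -3.0])

print(softmax(logits))  # probabilities sum to 1, so the two classes share the mass
print(sigmoid(logits))  # both of the first two classes can score above 0.5
```

With sigmoid outputs, an object can receive high scores for several overlapping labels at once, which softmax does not allow.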

In the next section, we will implement YOLO for object detection in both image and video.

Implementation of YOLO object detection

Now, let's explore how to implement YOLO v3 with Python. We will be using an implementation of YOLO v3 that has been trained on the COCO dataset.

The COCO dataset contains 1.5 million object instances across 80 object categories. We will use a pre-trained model that has been trained on the COCO dataset and explore its capabilities. Realistically, it would take many hours of training, even on a high-end GPU, to achieve a reasonable model that can predict the required classes with good accuracy. Therefore, we will download the weights of the pre-trained network. This network is hugely complex, and the actual H5 file for the weights is over 200 MB in size.

Common Objects in Context (COCO) is a large-scale object detection, segmentation, and captioning dataset. The official website for COCO is http://cocodataset.org/#home.
COCO has several features:

Object segmentation
Recognition in context
Superpixel stuff...

Importing the libraries

The first step is to import the libraries.

We will import NumPy, OpenCV, and the YOLO model class from the project for implementation purposes:

import os
import time
import cv2
import numpy as np
from model.yolo_model import YOLO

In the next section, we will write code for the image function so that we can resize the image, as per the architecture.

Processing the image function

Next, we will write a process image function. This function will reduce or expand the image, depending on its original size.

Here, we want to transform the image as per the input for the YOLO model:

def process_image(img):
    """Resize, normalize, and expand the image.

    # Argument:
        img: original image.

    # Returns
        image_org: ndarray(1, 416, 416, 3), processed image.
    """
    # Resize to the 416x416 input size expected by the network
    image_org = cv2.resize(img, (416, 416), interpolation=cv2.INTER_CUBIC)
    image_org = np.array(image_org, dtype='float32')
    image_org /= 255.  # scale pixel values to [0, 1]
    image_org = np.expand_dims(image_org, axis=0)  # add a batch dimension

    return image_org

In the next section, we will write a class function so that we can access the classes from a text file.

The get class function

The class function will help us read the class names from the classes text file. The coco_classes.txt file is in the project's data folder; its path is used in the Importing YOLO section.

The COCO dataset contains 80 object classes. We can load their names as follows:

def get_classes(file):
    """Get classes name.

    # Argument:
        file: classes name for database.

    # Returns
        class_names: List, classes name.

    """
    with open(file) as f:
        name_of_class = f.readlines()
    # Strip the trailing newline from each class name
    name_of_class = [c.strip() for c in name_of_class]

    return name_of_class
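To show the file format this kind of function expects, here is a small self-contained sketch that writes a three-line classes file to a temporary folder and reads it back. The helper name `load_class_names` and the file contents are illustrative assumptions; the real coco_classes.txt lists all of the COCO class names, one per line.

```python
import os
import tempfile


def load_class_names(path):
    """Read one class name per line, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


# Write a tiny stand-in for the classes file and load it
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "classes.txt")
    with open(path, "w") as f:
        f.write("person\nbicycle\ncar\n")
    names = load_class_names(path)

print(names)  # ['person', 'bicycle', 'car']
```

The position of each name in the list is what maps a predicted class index back to a readable label.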

In the next section, we will write code for the box function, which is useful for drawing boxes around the objects in images.

Draw box function

The draw function will draw a box around each detected object in the image and put text on the image as a label, along with the prediction score for the identified class.

We will use OpenCV techniques in this step:

def draw_box(image, image_boxes, image_scores, image_classes, image_all_classes):
    """Draw the boxes on the image.

    # Argument:
        image: original image.
        image_boxes: ndarray, boxes of objects.
        image_classes: ndarray, classes of objects.
        image_scores: ndarray, scores of objects.
        image_all_classes: all classes name.
    """
    for box, score, cl in zip(image_boxes, image_scores, image_classes):
        x, y, w, h = box

        # Round to the nearest pixel and clamp to the image bounds.
        # Note: image_top holds the left x-coordinate and image_left the top y-coordinate.
        image_top = max(0, np.floor(x + 0.5).astype(int))
        image_left = max(0, np.floor(y + 0.5).astype(int))
        image_right = min(image.shape[1], np.floor(x + w + 0.5).astype(int))
        image_bottom = min(image.shape[0], np.floor(y + h + 0.5).astype(int))

        cv2.rectangle(image, (image_top...
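The rounding and clamping step can be tried on its own, without OpenCV. The sketch below assumes each box is given as (x, y, w, h) with (x, y) the top-left corner, as the unpacking in the function above suggests; the helper name `box_to_corners` is illustrative.

```python
import numpy as np


def box_to_corners(box, image_shape):
    """Convert an (x, y, w, h) box to integer corners clamped to the image.

    image_shape is (height, width, ...) as returned by ndarray.shape.
    """
    x, y, w, h = box
    img_h, img_w = image_shape[:2]

    # floor(v + 0.5) rounds to the nearest pixel
    left = max(0, int(np.floor(x + 0.5)))
    top = max(0, int(np.floor(y + 0.5)))
    right = min(img_w, int(np.floor(x + w + 0.5)))
    bottom = min(img_h, int(np.floor(y + h + 0.5)))

    return left, top, right, bottom


# A box that spills over the left and bottom edges of a 640x480 image
print(box_to_corners((-3.2, 10.6, 50.0, 500.0), (480, 640, 3)))  # (0, 11, 47, 480)
```

Clamping keeps the rectangle and its label inside the frame even when the network predicts a box that extends past the image border.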

Detect image function

The detect image function takes the image and uses the YOLO v3 network to predict the boxes, classes, and scores of the objects within the image using yolo.predict. The code for this is as follows:

def detect_image(image, yolo, all_classes):
    """Use yolo v3 to detect images.

    # Argument:
        image: original image.
        yolo: YOLO, yolo model.
        all_classes: all classes name.

    # Returns:
        image: processed image.
    """
    pimage = process_image(image)

    # Time the forward pass
    start = time.time()
    image_boxes, image_classes, image_scores = yolo.predict(pimage, image.shape)
    end = time.time()

    print('time: {0:.2f}s'.format(end - start))

    # Draw only when at least one object was detected
    if image_boxes is not None:
        draw_box(image, image_boxes, image_scores, image_classes, all_classes)

    return image

In the next section, we will write some code that will detect objects in videos.

Detect video function

If we want to detect people or vehicles in every frame of a video, we can use the following function:

def detect_video(video, yolo, all_classes):
    """Use yolo v3 to detect video.

    # Argument:
        video: video file.
        yolo: YOLO, yolo model.
        all_classes: all classes name.
    """
    video_path = os.path.join("videos", "test", video)
    camera = cv2.VideoCapture(video_path)
    cv2.namedWindow("detection", cv2.WINDOW_AUTOSIZE)

    # Prepare for saving the detected video
    sz = (int(camera.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(camera.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    fourcc = cv2.VideoWriter_fourcc(*'mpeg')

    
    vout = cv2.VideoWriter()
    vout.open(os.path.join("videos", "res", video), fourcc, 20, sz, True)

    while True:
        res, frame = camera.read()

        if not res:
            break

        image = detect_image(frame, yolo, all_classes)
        cv2.imshow("...

Importing YOLO

In this section, we will create an instance of the YOLO class. Note that this may take a little time, as the YOLO model needs to load:

yolo = YOLO(0.6, 0.5)
file = 'data/coco_classes.txt'
all_classes = get_classes(file)
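The text does not name the two arguments passed to YOLO(0.6, 0.5). A common convention, assumed here, is an object-confidence threshold followed by a non-maximum suppression (NMS) IoU threshold. The following NumPy sketch shows what such thresholds typically do; `filter_and_nms` is a hypothetical helper, not the project's implementation.

```python
import numpy as np


def filter_and_nms(boxes, scores, obj_threshold=0.6, nms_threshold=0.5):
    """Drop low-score boxes, then suppress overlapping ones.

    boxes: (N, 4) array-like of (x1, y1, x2, y2). Returns kept indices,
    highest score first. Threshold meanings are an assumption (see lead-in).
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)

    # Keep only confident detections, sorted by descending score
    candidates = np.where(scores >= obj_threshold)[0]
    order = candidates[np.argsort(scores[candidates])[::-1]]

    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]

        # IoU between the best remaining box and all the others
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[rest] - inter)

        # Discard boxes that overlap the kept box too much
        order = rest[iou <= nms_threshold]
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60], [100, 100, 110, 110]]
scores = [0.9, 0.8, 0.7, 0.3]
print(filter_and_nms(boxes, scores))  # [0, 2]
```

In this example, the second box is suppressed because it overlaps the first too heavily, and the last box is dropped because its score is below the confidence threshold.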

In the next section, we will test the implementation of the YOLO model by predicting objects in an image, as well as a video.

Detecting objects in images

In this section, we will predict the objects present in an image using YOLO. In the following code block, we start by loading the image:

f = 'image.jpg'
path = 'images/'+f
image = cv2.imread(path)

The input image looks as follows:

Fig 11.4: Input image

We will perform the prediction using the detect_image function and save the result in the following code:

image = detect_image(image, yolo, all_classes)
cv2.imwrite('images/res/' + f, image)

This yields the following prediction:

Fig 11.5: Output image

Here, we can observe that the model detected a person and a motorbike, each with a confidence score of 100%. You can take different images and experiment with this yourself!

Detecting objects in videos

Detecting objects in videos may take time. We will load the video named library.mp4 and perform prediction using the detect_video function:

# Detect videos one at a time in the videos/test folder
video = 'library.mp4'
detect_video(video, yolo, all_classes)

This yields the following prediction:

Fig 11.6: Prediction

Here, YOLO predicted a person and a bicycle with 100% confidence!

Summary

In this chapter, we learned about object detection, which is an important aspect of autonomous vehicles. We used a popular pre-trained model called YOLO. Then, we created a software pipeline to perform object predictions on both images and videos and also saw the high quality of YOLO's performance.

This is the last hands-on implementation in this book. In the next chapter, we will read about the next steps to follow in the field of self-driving cars. We will also learn about sensor fusion.


Authors (2)

Sumit Ranjan

Sumit Ranjan is a silver medalist in his Bachelor of Technology (Electronics and Telecommunication) degree. He is a passionate data scientist who has worked on solving business problems to build an unparalleled customer experience across domains such as automobile, healthcare, semiconductor, cloud virtualization, and insurance. He is experienced in building applied machine learning, computer vision, and deep learning solutions to meet real-world needs. He was awarded Autonomous Self-Driving Car Scholar by KPIT Technologies. He has also worked on multiple research projects at Mercedes Benz Research and Development. Apart from work, his hobbies are traveling and exploring new places, wildlife photography, and blogging.

Dr. S. Senthamilarasu

Dr. S. Senthamilarasu was born and raised in Coimbatore, Tamil Nadu. He is a technologist, designer, speaker, storyteller, journal reviewer, educator, and researcher. He loves to learn new technologies and solve real-world problems in the IT industry. He has published various journal articles and research papers and has presented at various international conferences. His research areas include data mining, image processing, and neural networks. He loves reading Tamil novels and involves himself in social activities. He has also received silver medals in international exhibitions for his research products for children with autism. He currently lives in Bangalore and is working closely with lead clients.
