Hands-On Vision and Behavior for Self-Driving Cars

By Luca Venturi, Krishtof Korda
About this book

The visual perception capabilities of a self-driving car are powered by computer vision. The work relating to self-driving cars can be broadly classified into three components: robotics, computer vision, and machine learning. This book gives existing computer vision engineers and developers a unique opportunity to get involved in this booming field.

You will learn about computer vision, deep learning, and depth perception applied to driverless cars. The book provides a structured and thorough introduction, as making a real self-driving car is a huge cross-functional effort. As you progress, you will cover relevant cases with working code, before going on to understand how to use OpenCV, TensorFlow, and Keras to analyze video streams from car cameras. Later, you will learn how to interpret and make the most of lidars (light detection and ranging) to identify obstacles and localize your position. You’ll even be able to tackle core challenges in self-driving cars such as finding lanes, detecting pedestrians and traffic lights, performing semantic segmentation, and writing a PID controller.

By the end of this book, you’ll be equipped with the skills you need to write code for a self-driving car running in a driverless car simulator, and be able to tackle various challenges faced by autonomous car engineers.

Publication date:
October 2020


Chapter 1: OpenCV Basics and Camera Calibration

This chapter is an introduction to OpenCV and how to use it in the initial phases of a self-driving car pipeline: ingesting a video stream and preparing it for the next phases. We will discuss the characteristics of a camera from the point of view of a self-driving car and how to improve the quality of what we get out of it. We will also study how to manipulate videos, and we will try one of the most famous features of OpenCV, object detection, which we will use to detect pedestrians.

In this chapter, you will build a solid foundation in OpenCV and NumPy, which will be very useful later.

In this chapter, we will cover the following topics:

  • OpenCV and NumPy basics
  • Reading, manipulating, and saving images
  • Reading, manipulating, and saving videos
  • Manipulating images
  • How to detect pedestrians with HOG
  • Characteristics of a camera
  • How to perform the camera calibration

Technical requirements

For the instructions and code in this chapter, you need the following:

  • Python 3.7
  • The opencv-python module
  • The NumPy module

The code for the chapter can be found here:


The Code in Action videos for this chapter can be found here:



Introduction to OpenCV and NumPy

OpenCV is a computer vision and machine learning library that has been developed for more than 20 years and provides an impressive number of functionalities. Despite some inconsistencies in the API, its simplicity and the remarkable number of algorithms implemented make it an extremely popular library and an excellent choice for many situations.

OpenCV is written in C++, but there are bindings for Python, Java, and Android.

In this book, we will focus on OpenCV for Python, with all the code tested using OpenCV 4.2.

OpenCV for Python is provided by the opencv-python package, which can be installed using the following command:

pip install opencv-python

OpenCV can take advantage of hardware acceleration, but to get the best performance, you might need to build it from the source code, with different flags than the default, to optimize it for your target hardware.

OpenCV and NumPy

The Python bindings use NumPy, which increases the flexibility and...


Working with image files

OpenCV provides a very simple way to load images, using imread():

import cv2
image = cv2.imread('test.jpg')

To show the image, you can use imshow(), which accepts two parameters:

  • The name to write on the caption of the window that will show the image
  • The image to be shown

Unfortunately, its behavior is counterintuitive, as it will not show an image unless it is followed by a call to waitKey():

cv2.imshow("Image", image)
cv2.waitKey(0)

The call to waitKey() after imshow() will have two effects:

  • It will actually allow OpenCV to show the image provided to imshow().
  • It will wait for a key press for up to the specified number of milliseconds; if the value passed is <= 0, it will wait indefinitely.

An image can be saved to disk using the imwrite() method, which accepts three parameters:

  • The name of the file
  • The image
  • An optional format-dependent parameter:
  • ...

Working with video files

Using videos in OpenCV is very simple; in fact, every frame is an image and can be manipulated with the methods that we have already analyzed.

To open a video in OpenCV, you need to call the VideoCapture() method:

cap = cv2.VideoCapture("video.mp4")

After that, you can call read(), typically in a loop, to retrieve a single frame. The method returns a tuple with two values:

  • A Boolean value that is False when the video is finished (or a frame cannot be read)
  • The next frame

ret, frame = cap.read()

To save a video, there is the VideoWriter object; its constructor accepts four parameters:

  • The filename
  • A FOURCC (four-character code) of the video codec
  • The number of frames per second
  • The resolution, as a (width, height) tuple

Take the following example:

mp4 = cv2.VideoWriter_fourcc(*'mp4v')
writer = cv2.VideoWriter('video-out.mp4', mp4, 15, (640, 480))

Once VideoWriter has been created, the write() method can be used to add a frame...


Manipulating images

As part of a computer vision pipeline for a self-driving car, with or without deep learning, you might need to process the video stream to make other algorithms work better as part of a preprocessing step.

This section will provide you with a solid foundation to preprocess any video stream.

Flipping an image

OpenCV provides the flip() method to flip an image, and it accepts two parameters:

  • The image
  • A number that can be 1 (horizontal flip), 0 (vertical flip), or -1 (both horizontal and vertical flip)

Let's see some sample code:

flipH = cv2.flip(img, 1)
flipV = cv2.flip(img, 0)
flip = cv2.flip(img, -1)

This will produce the following result:

Figure 1.4 – Original image, horizontally flipped, vertically flipped, and both

As you can see, the first image is the original one. It is followed by the image flipped horizontally, the image flipped vertically, and the image flipped both horizontally and vertically.

Blurring an image


Pedestrian detection using HOG

The Histogram of Oriented Gradients (HOG) is an object detection technique implemented by OpenCV. In simple cases, it can be used to see whether there is a certain object present in the image, where it is, and how big it is.

OpenCV includes a detector trained for pedestrians, and you are going to use it. It might not be enough for a real-life situation, but it is useful to learn how to use it. You could also train another one with more images to see whether it performs better. Later in the book, you will see how to use deep learning to detect not only pedestrians but also cars and traffic lights.

Sliding window

The HOG pedestrian detector in OpenCV is trained with a model that is 48x96 pixels, and therefore it is not able to detect objects smaller than that (or, more precisely, it could, but the box would still be 48x96).

At the core of the HOG detector, there is a mechanism able to tell whether a given 48x96 image is a pedestrian. As this is not terribly...
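Conceptually, the detector slides a 48x96 window across the image. To find pedestrians larger than the window, it repeats the scan on progressively smaller copies of the image (an image pyramid). The following plain-Python sketch only enumerates the window positions to show how quickly they add up. It is not OpenCV's implementation, and the stride of 8 pixels and the scale factor of 1.05 are assumptions borrowed from common detectMultiScale() settings. The helper names are hypothetical.

```python
def sliding_windows(img_w, img_h, win_w=48, win_h=96, stride=8):
    """Yield (x, y, w, h) for every window that fits entirely in the image."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)


def pyramid_scales(img_w, img_h, win_w=48, win_h=96, scale=1.05):
    """Yield shrink factors until the image becomes smaller than one window."""
    factor = 1.0
    while img_w / factor >= win_w and img_h / factor >= win_h:
        yield factor
        factor *= scale


def count_windows(img_w, img_h, stride=8, scale=1.05):
    """Total number of windows the classifier evaluates over the whole pyramid."""
    total = 0
    for factor in pyramid_scales(img_w, img_h, scale=scale):
        # Shrinking the image by `factor` makes a 48x96 window cover
        # a region of (48 * factor) x (96 * factor) pixels in the original
        w, h = int(img_w / factor), int(img_h / factor)
        total += sum(1 for _ in sliding_windows(w, h, stride=stride))
    return total


first_level = sum(1 for _ in sliding_windows(640, 480))
print(first_level)  # 3675
print(count_windows(640, 480))
```

For a 640x480 frame, the first pyramid level alone has 75 x 49 = 3,675 windows, and every additional level adds more. This is why the stride and the scale factor have a large impact on speed.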


Camera calibration with OpenCV

In this section, you will learn how to take pictures of objects with a known pattern and use them to correct lens distortion using OpenCV.

Remember the lens distortion we talked about in the previous section? You need to correct this to ensure you accurately locate where objects are relative to your vehicle. It does you no good to see an object if you don't know whether it is in front of you or next to you. Even good lenses can distort the image, and this is particularly true for wide-angle lenses. Luckily, OpenCV provides a mechanism to detect this distortion and correct it!

The idea is to take pictures of a chessboard, so OpenCV can use this high-contrast pattern to detect the position of the points and compute the distortion based on the difference between the expected image and the recorded one.

You need to provide several pictures at different orientations. It might take some experiments to find a good set of pictures, but 10 to 20 images should...


Summary

Well, you have had a great start to your computer vision journey toward making a real self-driving car.

You learned about a very useful toolset: OpenCV, with its Python bindings, and NumPy. With these tools, you are now able to load, display, and combine images using methods such as imread(), imshow(), hconcat(), and vconcat(). You learned how to read and create video files, as well as how to capture video from a webcam, with methods such as VideoCapture() and VideoWriter(). Watch out, Spielberg, there is a new movie-maker in town!

It was wonderful to be able to import images, but how do you start manipulating them to help your computer vision algorithms learn what features matter? You learned how to do this through methods such as flip(), blur(), GaussianBlur(), medianBlur(), bilateralFilter(), and convertScaleAbs(). Then, you learned how to annotate images for human consumption with methods such as rectangle() and putText().

Then came the real magic, where you learned how...


Questions

  1. Can OpenCV take advantage of hardware acceleration?
  2. What's the best blurring method if CPU power is not a problem?
  3. Which detector can be used to find pedestrians in an image?
  4. How can you read the video stream from a webcam?
  5. What is the trade-off between aperture and depth of field?
  6. When do you need a high ISO?
  7. Is it worth computing sub-pixel precision for camera calibration?
About the Authors
  • Luca Venturi

    Luca Venturi has extensive experience as a programmer with world-class companies, including Ferrari and Opera Software. He has also worked for some start-ups, including Activetainment (maker of the world's first smart bike), Futurehome (a provider of smart home solutions), and CompanyBook (whose offerings apply artificial intelligence to sales). He worked on the Data Platform team at Tapad (Telenor Group), making petabytes of data accessible to the rest of the company, and is now the lead engineer of Piano Software's analytical database.

    Krishtof Korda grew up in a mountainside home over which the US Navy's Blue Angels flew during the Reno Air Races each year. A graduate of the University of Southern California and the USMC Officer Candidate School, he set the Marine Corps obstacle course record of 51 seconds. He took his love of aviation to the USAF, flying aboard the C-5M Super Galaxy as a flight test engineer for 5 years, and engineered installations of airborne experiments for the USAF Test Pilot School for 4 years. Later, he transitioned to designing sensor integrations for autonomous cars at Lyft Level 5. Now he works as an applications engineer for Ouster, integrating LIDAR sensors in the fields of robotics, AVs, drones, and mining, and loves racing Enduro mountain bikes.
