Chapter 1: OpenCV Basics and Camera Calibration
This chapter introduces OpenCV and how to use it in the initial phases of a self-driving car pipeline: ingesting a video stream and preparing it for the next phases. We will discuss the characteristics of a camera from the point of view of a self-driving car and how to improve the quality of what we get out of it. We will also study how to manipulate videos, and we will try one of the most famous features of OpenCV, object detection, which we will use to detect pedestrians.
With this chapter, you will build a solid foundation on how to use OpenCV and NumPy, which will be very useful later.
In this chapter, we will cover the following topics:
- OpenCV and NumPy basics
- Reading, manipulating, and saving images
- Reading, manipulating, and saving videos
- Manipulating images
- How to detect pedestrians with HOG
- Characteristics of a camera
- How to perform the camera calibration
For the instructions and code in this chapter, you need the following:
- Python 3.7
- The opencv-python module
- The NumPy module
The code for the chapter can be found here:
The Code in Action videos for this chapter can be found here:
Introduction to OpenCV and NumPy
OpenCV is a computer vision and machine learning library that has been developed for more than 20 years and provides an impressive number of functionalities. Despite some inconsistencies in the API, its simplicity and the remarkable number of algorithms implemented make it an extremely popular library and an excellent choice for many situations.
OpenCV is written in C++, but there are bindings for Python, Java, and Android.
In this book, we will focus on OpenCV for Python, with all the code tested using OpenCV 4.2.
OpenCV in Python is provided by opencv-python, which can be installed using the following command:
pip install opencv-python
OpenCV can take advantage of hardware acceleration, but to get the best performance, you might need to build it from the source code, with different flags than the default, to optimize it for your target hardware.
OpenCV and NumPy
Working with image files
import cv2
image = cv2.imread('test.jpg')
To show the image, you can use imshow(), which accepts two parameters:
- The name to write on the caption of the window that will show the image
- The image to be shown
Unfortunately, its behavior is counterintuitive, as it will not show an image unless it is followed by a call to waitKey().
The call to waitKey() will have two effects:
- It will actually allow OpenCV to show the image provided to imshow()
- It will wait for the specified number of milliseconds, or until a key is pressed if the number of milliseconds passed is less than or equal to 0, in which case it will wait indefinitely
To save an image to disk, you can use imwrite(), which accepts three parameters:
- The name of the file
- The image
- An optional format-dependent parameter: ...
Working with video files
To open a video in OpenCV, you need to call the VideoCapture() method:
cap = cv2.VideoCapture("video.mp4")
After that, you can call
read(), typically in a loop, to retrieve a single frame. The method returns a tuple with two values:
- A Boolean value that is false when the video is finished
- The next frame:
ret, frame = cap.read()
To save a video, there is the VideoWriter object; its constructor accepts four parameters:
- The filename
- A FOURCC (four-character code) of the video codec
- The number of frames per second
- The resolution
Take the following example:
mp4 = cv2.VideoWriter_fourcc(*'MP4V')
writer = cv2.VideoWriter('video-out.mp4', mp4, 15, (640, 480))
Once the VideoWriter has been created, the write() method can be used to add a frame...
As part of a computer vision pipeline for a self-driving car, with or without deep learning, you might need to process the video stream to make other algorithms work better as part of a preprocessing step.
This section will provide you with a solid foundation to preprocess any video stream.
Flipping an image
The flip() method can be used to flip an image; it accepts two parameters:
- The image
- A number that can be 1 (horizontal flip), 0 (vertical flip), or -1 (both horizontal and vertical flip)
Let's see a sample code:
flipH = cv2.flip(img, 1)
flipV = cv2.flip(img, 0)
flip = cv2.flip(img, -1)
This will produce the horizontally flipped, vertically flipped, and fully flipped versions of the image.
Blurring an image...
Pedestrian detection using HOG
The Histogram of Oriented Gradients (HOG) is an object detection technique implemented by OpenCV. In simple cases, it can be used to see whether there is a certain object present in the image, where it is, and how big it is.
OpenCV includes a detector trained for pedestrians, and you are going to use it. It might not be enough for a real-life situation, but it is useful to learn how to use it. You could also train another one with more images to see whether it performs better. Later in the book, you will see how to use deep learning to detect not only pedestrians but also cars and traffic lights.
At the core of the HOG detector, there is a mechanism able to tell whether a given 48x96 image is a pedestrian. As this is not terribly...
Camera calibration with OpenCV
Remember the lens distortion we talked about in the previous section? You need to correct this to ensure you accurately locate where objects are relative to your vehicle. It does you no good to see an object if you don't know whether it is in front of you or next to you. Even good lenses can distort the image, and this is particularly true for wide-angle lenses. Luckily, OpenCV provides a mechanism to detect this distortion and correct it!
The idea is to take pictures of a chessboard, so OpenCV can use this high-contrast pattern to detect the position of the points and compute the distortion based on the difference between the expected image and the recorded one.
You need to provide several pictures at different orientations. It might take some experiments to find a good set of pictures, but 10 to 20 images should...
Well, you have had a great start to your computer vision journey toward making a real self-driving car.
You learned about a very useful toolset called OpenCV, with bindings for Python, together with NumPy. With these tools, you are now able to create, import, and save images using methods such as imread(), imwrite(), and vconcat(). You learned how to import and create video files, as well as how to capture video from a webcam, with methods such as VideoCapture() and VideoWriter(). Watch out Spielberg, there is a new movie-maker in town!
It was wonderful to be able to import images, but how do you start manipulating them to help your computer vision algorithms learn what features matter? You learned how to do this through methods such as convertScaleAbs(). Then, you learned how to annotate images for human consumption with methods such as putText().
Then came the real magic, where you learned how...
- Can OpenCV take advantage of hardware acceleration?
- What's the best blurring method if CPU power is not a problem?
- Which detector can be used to find pedestrians in an image?
- How can you read the video stream from a webcam?
- What is the trade-off between aperture and depth of field?
- When do you need a high ISO?
- Is it worth computing sub-pixel precision for camera calibration?