Chapter 4. 3D Scene Reconstruction Using Structure from Motion

The goal of this chapter is to study how to reconstruct a scene in 3D by inferring the geometrical features of the scene from camera motion. This technique is sometimes referred to as structure from motion. By looking at the same scene from different angles, we will be able to infer the real-world 3D coordinates of different features in the scene. This process is known as triangulation, which allows us to reconstruct the scene as a 3D point cloud.

In the previous chapter, you learned how to detect and track an object of interest in the video stream of a webcam, even if the object is viewed from different angles or distances, or under partial occlusion. Here, we will take the tracking of interesting features a step further and consider what we can learn about the entire visual scene by studying similarities between image frames. If we take two pictures of the same scene from different angles, we can use feature matching or...

Planning the app


The final app will extract and visualize structure from motion on a pair of images. We will assume that these two images have been taken with the same camera, whose internal camera parameters we know. If these parameters are not known, they need to be estimated first in a camera calibration process.

The final app will then consist of the following modules and scripts (a minimal sketch of the class interface follows the list):

  • chapter4.main: This is the main function routine for starting the application.

  • scene3D.SceneReconstruction3D: This is a class that contains a range of functionalities for calculating and visualizing structure from motion. It includes the following public methods:

    • __init__: This constructor will accept the intrinsic camera matrix and the distortion coefficients

    • load_image_pair: A method used to load, from file, two images that were taken with the camera described earlier

    • plot_optic_flow: A method used to visualize the optic flow between the two image frames

    • draw_epipolar_lines: A method used to draw the...
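
Putting the pieces together, a minimal skeleton of the class interface might look as follows. This is a sketch based only on the method names listed above; the undistortion step in load_image_pair is an assumption about how the calibration data would be used, not necessarily the chapter's exact code:

import cv2
import numpy as np


class SceneReconstruction3D:
    def __init__(self, K, dist):
        """Store the intrinsic camera matrix and the distortion coefficients."""
        self.K = K
        self.K_inv = np.linalg.inv(K)  # handy later for relating F and E
        self.d = dist

    def load_image_pair(self, img_path1, img_path2):
        """Load the two input images and undo the lens distortion."""
        self.img1 = cv2.undistort(cv2.imread(img_path1), self.K, self.d)
        self.img2 = cv2.undistort(cv2.imread(img_path2), self.K, self.d)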

Camera calibration


So far, we have worked with whatever image came straight out of our webcam, without questioning the way in which it was taken. However, every camera lens has unique parameters, such as focal length, principal point, and lens distortion. What happens behind the scenes when a camera takes a picture is that light passes through a lens and an aperture before falling on the surface of a light sensor. This process can be approximated with the pinhole camera model. The process of estimating the parameters of a real-world lens such that it fits the pinhole camera model is called camera calibration (or camera resectioning; it should not be confused with photometric camera calibration).
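
OpenCV makes this procedure straightforward. The following is a minimal sketch of the standard chessboard-based calibration; the grid size and the calib_*.png file names are illustrative placeholders, not files shipped with this chapter:

import glob

import cv2
import numpy as np

# Inner-corner grid of the printed chessboard target (illustrative size)
pattern_size = (9, 6)

# 3D corner coordinates in the chessboard's own frame (all on the z=0 plane)
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for fname in glob.glob("calib_*.png"):
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Recover the intrinsic matrix K and the distortion coefficients d
rms, K, d, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)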

The pinhole camera model

The pinhole camera model is a simplification of a real camera in which there is no lens and the camera aperture is approximated by a single point (the pinhole). When viewing a real-world 3D scene (such as a tree), light rays pass through the point-sized...
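
For reference, the model can be written compactly: a 3D world point in homogeneous coordinates, \tilde{X}, is projected to homogeneous pixel coordinates \tilde{x} via the intrinsic matrix K and the camera pose [R | t]:

\tilde{x} \sim K \, [R \mid t] \, \tilde{X},
\qquad
K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}

Here, f_x and f_y are the focal lengths in pixel units and (c_x, c_y) is the principal point; these are exactly the intrinsic parameters that calibration recovers.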

Setting up the app


Going forward, we will be using a famous open source dataset called fountain-P11. It depicts a Swiss fountain viewed from various angles. An example of this is shown in the following image:

The dataset consists of 11 high-resolution images and can be downloaded from http://cvlabwww.epfl.ch/data/multiview/denseMVS.html. Had we taken the pictures ourselves, we would have had to go through the entire camera calibration procedure to recover the intrinsic camera matrix and the distortion coefficients. Luckily, these parameters are known for the camera that took the fountain dataset, so we can go ahead and hardcode these values in our code.
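
In code, that might look like the following sketch. The numbers are the intrinsics commonly reported for the fountain-P11 camera, but treat them as placeholders and verify them against the camera files that accompany the dataset:

import numpy as np

# Intrinsic camera matrix (focal lengths and principal point, in pixels)
K = np.array([[2759.48, 0.0, 1520.69],
              [0.0, 2764.16, 1006.81],
              [0.0, 0.0, 1.0]])

# The dataset images are essentially distortion-free, so zero distortion
# coefficients are a reasonable assumption here
d = np.zeros(5)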

The main function routine

Our main function routine will consist of creating and interacting with an instance of the SceneReconstruction3D class. This code can be found in the chapter4.py file, which imports all the necessary modules and instantiates the class:

import numpy as np

from scene3D import SceneReconstruction3D


def main():
    #...
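
The routine is truncated above; completed, it might look something like the following sketch, assuming the class interface described earlier (the image paths are hypothetical):

import numpy as np

from scene3D import SceneReconstruction3D


def main():
    # Illustrative intrinsics and (assumed) zero distortion, as discussed earlier
    K = np.array([[2759.48, 0.0, 1520.69],
                  [0.0, 2764.16, 1006.81],
                  [0.0, 0.0, 1.0]])
    d = np.zeros(5)

    scene = SceneReconstruction3D(K, d)
    scene.load_image_pair("fountain/0004.png", "fountain/0005.png")
    scene.plot_optic_flow()
    scene.plot_point_cloud()


if __name__ == '__main__':
    main()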

Estimating the camera motion from a pair of images


Now that we have loaded two images (self.img1 and self.img2) of the same scene, such as two examples from the fountain dataset, we find ourselves in a similar situation as in the last chapter. We are given two images that supposedly show the same rigid object or static scene, but from different viewpoints. However, this time we want to go a step further; if the only thing that changes between taking the two pictures is the location of the camera, can we infer the relative camera motion by looking at the matching features?

Well, of course we can. Otherwise, this chapter would not make much sense, would it? We will take the location and orientation of the camera in the first image as a given and then find out how much we have to reorient and relocate the camera so that its viewpoint matches that from the second image.

In other words, we need to recover the essential matrix of the camera in the second image. An essential matrix is a 3 x 3 matrix...
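
One common route, sketched here with OpenCV calls, is to estimate the fundamental matrix F from the point matches, fold in the known intrinsics to obtain the essential matrix E, and then decompose it. This sketch assumes pts1 and pts2 are N x 2 arrays of matched image points and uses cv2.recoverPose, which is available in OpenCV 3 and later; it is not necessarily the exact route this chapter's code takes:

import cv2
import numpy as np


def relative_pose(pts1, pts2, K):
    """Estimate [R | t] of the second camera relative to the first."""
    # Fundamental matrix from pixel correspondences; RANSAC rejects outliers
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 0.5, 0.99)

    # The essential matrix folds in the known intrinsics: E = K^T F K
    E = K.T @ F @ K

    # Of the four possible decompositions of E, recoverPose returns the one
    # that places the triangulated points in front of both cameras
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
    return R, t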

Reconstructing the scene


Finally, we can reconstruct the 3D scene by making use of a process called triangulation. We are able to infer the 3D coordinates of a point because of the way epipolar geometry works. By calculating the essential matrix, we get to know more about the geometry of the visual scene than we might think. Because the two cameras depict the same real-world scene, we know that most of the 3D real-world points will be found in both images. Moreover, we know that the mapping from the 2D image points to the corresponding 3D real-world points will follow the rules of geometry. If we study a sufficiently large number of image points, we can construct, and solve, a (large) system of linear equations to recover the real-world coordinates.
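
With the relative pose [R | t] in hand, OpenCV can solve that linear system for us. A minimal sketch, assuming pts1 and pts2 are N x 2 arrays of matched points and K, R, t come from the previous steps:

import cv2
import numpy as np


def triangulate(pts1, pts2, K, R, t):
    """Triangulate matched image points into 3D world coordinates."""
    # Projection matrices: camera 1 sits at the origin, camera 2 at [R | t]
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t.reshape(3, 1)])

    # OpenCV expects 2 x N point arrays and returns homogeneous 4 x N output
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    return (pts4d[:3] / pts4d[3]).T  # de-homogenize to an N x 3 array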

Let's return to the Swiss fountain dataset. If we ask two photographers to take a picture of the fountain from different viewpoints at the same time, it is not hard to realize that the first photographer might show up in the...

3D point cloud visualization


The last step is visualizing the triangulated 3D real-world points. An easy way of creating 3D scatterplots is by using matplotlib. However, if you are looking for more professional visualization tools, you may be interested in Mayavi (http://docs.enthought.com/mayavi/mayavi), VisPy (http://vispy.org), or the Point Cloud Library (http://pointclouds.org). Although the latter does not have Python support for point cloud visualization yet, it is an excellent tool for point cloud segmentation, filtering, and sample consensus model fitting. For more information, head over to strawlab's GitHub repository at https://github.com/strawlab/python-pcl.

Before we can plot our 3D point cloud, we obviously have to extract the [R | t] matrix and perform the triangulation as explained earlier:

def plot_point_cloud(self, feat_mode="SURF"):
    self._extract_keypoints(feat_mode)
    self._find_fundamental_matrix()
    self._find_essential_matrix()
    self._find_camera_matrices_rt...
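
Once the triangulated points are available, say as an N x 3 array (the pts3D name below is assumed for illustration), the matplotlib scatterplot itself is only a few lines:

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, registers the '3d' projection


def plot_scatter(pts3D):
    """Render an N x 3 array of points as a simple 3D scatterplot."""
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.scatter(pts3D[:, 0], pts3D[:, 1], pts3D[:, 2], s=2)
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_zlabel('z')
    plt.show()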

Summary


In this chapter, we explored a way of reconstructing a scene in 3D—by inferring the geometrical features of 2D images taken by the same camera. We wrote a script to calibrate a camera, and you learned about fundamental and essential matrices. We used this knowledge to perform triangulation. We then went on to visualize the real-world geometry of the scene in a 3D point cloud. Using simple 3D scatterplots in matplotlib, we found a way to convince ourselves that our calculations were accurate and practical.

Going forward from here, it will be possible to store the triangulated 3D points in a file that can be parsed by the Point Cloud Library, or to repeat the procedure for different image pairs so that we can generate a denser and more accurate reconstruction. Although we have covered a lot in this chapter, there is a lot more left to do. Typically, when talking about a structure-from-motion pipeline, we include two additional steps that we have not talked about so far: bundle adjustment...
