
# Introduction

Images are generally produced by a digital camera, which captures a scene by projecting the light that passes through its lens onto an image sensor. The fact that an image is formed through the projection of a 3D scene onto a 2D plane establishes important relations between a scene and its image, and between different images of the same scene. Projective geometry is the tool used to describe and characterize, in mathematical terms, the process of image formation. In this article, you will learn some of the fundamental projective relations that exist in multi-view imagery and how these can be used in computer vision programming. But before we start the recipes, let's explore the basic concepts related to scene projection and image formation.

## Image formation

Fundamentally, the process used to produce images has not changed since the beginning of photography. The light coming from an observed scene is captured by a camera through a frontal **aperture** and the captured light rays hit an **image plane** (or **image sensor**) located on the back of the camera. Additionally, a lens is used to concentrate the rays coming from the different scene elements. This process is illustrated by the following figure:

Here, **do** is the distance from the lens to the observed object, **di** is the distance from the lens to the image plane, and **f** is the **focal length** of the lens. These quantities are related by the so-called **thin lens equation**:

*1/do + 1/di = 1/f*

In computer vision, this camera model can be simplified in a number of ways. First, we can neglect the effect of the lens by considering a camera with an infinitesimal aperture since, in theory, this does not change the image; only the central ray is then considered. Second, since most of the time we have *do>>di*, we can assume that the image plane is located at the focal distance. Finally, we can notice from the geometry of the system that the image on the plane is inverted. We can obtain an identical but upright image by simply positioning the image plane in front of the lens. Obviously, this is not physically feasible, but from a mathematical point of view, it is completely equivalent. This simplified model is often referred to as the **pin-hole** camera model and is represented as follows:

From this model, and using the law of similar triangles, we can easily derive the basic projective equation:

*hi = f ho / do*

The size (**hi**) of the image of an object (of height **ho**) is therefore inversely proportional to its distance (**do**) from the camera, as one would expect. This relation allows the position of the image of a 3D scene point on the image plane of a camera to be predicted.

# Calibrating a camera

From the introduction of this article, we learned that the essential parameters of a camera under the pin-hole model are its focal length and the size of the image plane (which defines the field of view of the camera). Also, since we are dealing with digital images, the number of pixels on the image plane is another important characteristic of a camera. Finally, in order to be able to compute the position of an image's scene point in pixel coordinates, we need one additional piece of information. Considering the line coming from the focal point that is orthogonal to the image plane, we need to know at which pixel position this line pierces the image plane. This point is called the **principal point**. It would be logical to assume that the principal point is at the center of the image plane, but in practice, it might be off by a few pixels depending on the precision with which the camera has been manufactured.

Camera calibration is the process by which the different camera parameters are obtained. One can obviously use the specifications provided by the camera manufacturer, but for some tasks, such as 3D reconstruction, these specifications are not accurate enough. Camera calibration will proceed by showing known patterns to the camera and analyzing the obtained images. An optimization process will then determine the optimal parameter values that explain the observations. This is a complex process but made easy by the availability of OpenCV calibration functions.

## How to do it...

To calibrate a camera, the idea is to show it a set of scene points whose 3D positions are known. You must then determine where these points project on the image. Obviously, for accurate results, we need to observe several such points. One way to achieve this would be to take one picture of a scene with many known 3D points. A more convenient way is to take several images, from different viewpoints, of a set of 3D points. This approach is simpler but requires computing the position of each camera view in addition to the internal camera parameters, which fortunately is feasible.

OpenCV proposes to use a chessboard pattern to generate the set of 3D scene points required for calibration. This pattern creates points at the corners of each square, and since this pattern is flat, we can freely assume that the board is located at Z=0 with the X and Y axes well aligned with the grid. In this case, the calibration process simply consists of showing the chessboard pattern to the camera from different viewpoints. Here is one example of a calibration pattern image:

The nice thing is that OpenCV has a function that automatically detects the corners of this chessboard pattern. You simply provide an image and the size of the chessboard used (number of vertical and horizontal inner corner points). The function will return the position of these chessboard corners on the image. If the function fails to find the pattern, then it simply returns false:

```cpp
// output vector of image points
std::vector<cv::Point2f> imageCorners;
// number of corners on the chessboard
cv::Size boardSize(6,4);
// Get the chessboard corners
bool found = cv::findChessboardCorners(image,
                                       boardSize, imageCorners);
```

Note that this function accepts additional parameters if one needs to tune the algorithm, which are not discussed here. There is also a function that draws the detected corners on the chessboard image with lines connecting them in sequence:

```cpp
// Draw the corners
cv::drawChessboardCorners(image,
                          boardSize, imageCorners,
                          found); // corners have been found
```

The image obtained is seen here:

The lines connecting the points show the order in which the points are listed in the vector of detected points. Now, to calibrate the camera, we need to input a set of such image points together with the coordinates of the corresponding 3D points. Let's encapsulate the calibration process in a *CameraCalibrator* class:

```cpp
class CameraCalibrator {

    // input points:
    // the points in world coordinates
    std::vector<std::vector<cv::Point3f>> objectPoints;
    // the point positions in pixels
    std::vector<std::vector<cv::Point2f>> imagePoints;
    // output Matrices
    cv::Mat cameraMatrix;
    cv::Mat distCoeffs;
    // flag to specify how calibration is done
    int flag;
    // used in image undistortion
    cv::Mat map1, map2;
    bool mustInitUndistort;

  public:
    CameraCalibrator() : flag(0), mustInitUndistort(true) {}
```

As mentioned previously, the 3D coordinates of the points on the chessboard pattern can be easily determined if we conveniently place the reference frame on the board. The method that accomplishes this takes a vector of chessboard image filenames as input:

```cpp
// Open chessboard images and extract corner points
int CameraCalibrator::addChessboardPoints(
        const std::vector<std::string>& filelist,
        cv::Size& boardSize) {

    // the points on the chessboard
    std::vector<cv::Point2f> imageCorners;
    std::vector<cv::Point3f> objectCorners;

    // 3D Scene Points:
    // Initialize the chessboard corners
    // in the chessboard reference frame
    // The corners are at 3D location (X,Y,Z) = (i,j,0)
    for (int i=0; i<boardSize.height; i++) {
        for (int j=0; j<boardSize.width; j++) {
            objectCorners.push_back(cv::Point3f(i, j, 0.0f));
        }
    }

    // 2D Image points:
    cv::Mat image; // to contain chessboard image
    int successes = 0;
    // for all viewpoints
    for (std::size_t i=0; i<filelist.size(); i++) {

        // Open the image as grayscale
        image = cv::imread(filelist[i], 0);

        // Get the chessboard corners
        bool found = cv::findChessboardCorners(
                         image, boardSize, imageCorners);
        if (!found) continue; // pattern not detected in this view

        // Get subpixel accuracy on the corners
        cv::cornerSubPix(image, imageCorners,
            cv::Size(5,5),
            cv::Size(-1,-1),
            cv::TermCriteria(cv::TermCriteria::MAX_ITER +
                             cv::TermCriteria::EPS,
                30,     // max number of iterations
                0.1));  // min accuracy

        // If we have a good board, add it to our data
        if (imageCorners.size() == boardSize.area()) {
            // Add image and scene points from one view
            addPoints(imageCorners, objectCorners);
            successes++;
        }
    }

    return successes;
}
```

The first loop inputs the 3D coordinates of the chessboard, which are specified in an arbitrary square size unit here. The corresponding image points are the ones provided by the *cv::findChessboardCorners* function. This is done for all available viewpoints. Moreover, in order to obtain a more accurate image point location, the function *cv::cornerSubPix* can be used; as the name suggests, the image points will then be localized with sub-pixel accuracy. The termination criterion that is specified by the *cv::TermCriteria* object defines a maximum number of iterations and a minimum accuracy in sub-pixel coordinates. The first of these two conditions to be reached stops the corner refinement process.

When a set of chessboard corners has been successfully detected, these points are added to our vector of image and scene points:

```cpp
// Add scene points and corresponding image points
void CameraCalibrator::addPoints(
        const std::vector<cv::Point2f>& imageCorners,
        const std::vector<cv::Point3f>& objectCorners) {

    // 2D image points from one view
    imagePoints.push_back(imageCorners);
    // corresponding 3D scene points
    objectPoints.push_back(objectCorners);
}
```

These vectors contain *std::vector* instances: each element is the vector of points from one view.

Once a sufficient number of chessboard images have been processed (and consequently a large number of 3D scene point/2D image point correspondences are available), we can initiate the computation of the calibration parameters:

```cpp
// Calibrate the camera
// returns the re-projection error
double CameraCalibrator::calibrate(cv::Size& imageSize)
{
    // undistorter must be reinitialized
    mustInitUndistort = true;

    // Output rotations and translations
    std::vector<cv::Mat> rvecs, tvecs;

    // start calibration
    return cv::calibrateCamera(
               objectPoints, // the 3D points
               imagePoints,  // the image points
               imageSize,    // image size
               cameraMatrix, // output camera matrix
               distCoeffs,   // output distortion matrix
               rvecs, tvecs, // Rs, Ts
               flag);        // set options
}
```

In practice, 10 to 20 chessboard images are sufficient, but these must be taken from different viewpoints and at different depths. The two important outputs of this function are the camera matrix and the distortion parameters. The camera matrix will be described in the next section; for now, let's consider the distortion parameters. So far, we have mentioned that under the pin-hole camera model, we can neglect the effect of the lens. But this is only possible if the lens used to capture an image does not introduce significant optical distortions. Unfortunately, such distortions often occur with lenses of lower quality or with lenses having a very short focal length. You may have already noticed that in the image we used for our example, the chessboard pattern shown is clearly distorted: the edges of the rectangular board are curved in the image. It can also be noticed that this distortion becomes more important as we move away from the center of the image. This is a typical distortion observed with fish-eye lenses and it is called **radial distortion**. The lenses used in common digital cameras do not exhibit such a high degree of distortion, but in the case of the lens used here, these distortions certainly cannot be ignored.

It is possible to compensate for these deformations by introducing an appropriate model. The idea is to represent the distortions induced by a lens by a set of mathematical equations. Once established, these equations can then be reverted in order to undo the distortions visible on the image. Fortunately, the exact parameters of the transformation that will correct the distortions can be obtained together with the other camera parameters during the calibration phase. Once this is done, any image from the newly calibrated camera can be undistorted:

```cpp
// remove distortion in an image (after calibration)
cv::Mat CameraCalibrator::remap(const cv::Mat& image) {

    cv::Mat undistorted;

    if (mustInitUndistort) { // called once per calibration
        cv::initUndistortRectifyMap(
            cameraMatrix, // computed camera matrix
            distCoeffs,   // computed distortion matrix
            cv::Mat(),    // optional rectification (none)
            cv::Mat(),    // camera matrix to generate undistorted
            image.size(), // size of undistorted
            CV_32FC1,     // type of output map
            map1, map2);  // the x and y mapping functions

        mustInitUndistort = false;
    }

    // Apply mapping functions
    cv::remap(image, undistorted, map1, map2,
              cv::INTER_LINEAR); // interpolation type

    return undistorted;
}
```

Which results in the following image:

As you can see, once the image is undistorted, we obtain a regular perspective image.

## How it works...

In order to explain the result of the calibration, we need to go back to the figure in the introduction which describes the pin-hole camera model. More specifically, we want to demonstrate the relation between a point in 3D at position (X,Y,Z) and its image (x,y) on a camera specified in pixel coordinates. Let's redraw this figure by adding a reference frame that we position at the center of the projection as seen here:

Note that the Y-axis is pointing downward to get a coordinate system compatible with the usual convention, which places the image origin at the upper-left corner. We learned previously that the point (X,Y,Z) will be projected onto the image plane at (fX/Z,fY/Z). Now, if we want to translate this coordinate into pixels, we need to divide the 2D image position by, respectively, the pixel width (px) and height (py). We notice that by dividing the focal length f given in world units (most often meters or millimeters) by px, we obtain the focal length expressed in (horizontal) pixels. Let's then define this term as fx. Similarly, fy=f/py is defined as the focal length expressed in vertical pixel units. The complete projective equation is therefore:

*x = fx X/Z + u0*
*y = fy Y/Z + v0*

Recall that (u0,v0) is the principal point that is added to the result in order to move the origin to the upper-left corner of the image. These equations can be rewritten in matrix form through the introduction of **homogeneous coordinates**, in which 2D points are represented by 3-vectors and 3D points by 4-vectors (the extra coordinate is simply an arbitrary scale factor that needs to be removed when a 2D coordinate is extracted from a homogeneous 3-vector). Here is the projective equation rewritten:

    s [x]   [fx  0  u0] [1 0 0 0] [X]
      [y] = [ 0 fy  v0] [0 1 0 0] [Y]
      [1]   [ 0  0   1] [0 0 1 0] [Z]
                                  [1]

The second matrix is a simple projection matrix. The first matrix includes all of the camera parameters which are called the **intrinsic parameters** of the camera. This 3x3 matrix is one of the output matrices returned by the *cv::calibrateCamera* function. There is also a function called *cv::calibrationMatrixValues* that returns the value of the intrinsic parameters given a calibration matrix.

More generally, when the reference frame is not at the projection center of the camera, we will need to add a rotation (a 3x3 matrix) and a translation vector (a 3x1 matrix). These two matrices describe the rigid transformation that must be applied to the 3D points in order to bring them back into the camera reference frame. Therefore, we can rewrite the projection equation in its most general form:

    s [x]   [fx  0  u0] [r11 r12 r13 t1] [X]
      [y] = [ 0 fy  v0] [r21 r22 r23 t2] [Y]
      [1]   [ 0  0   1] [r31 r32 r33 t3] [Z]
                                         [1]

Remember that in our calibration example, the reference frame was placed on the chessboard. Therefore, there is a rigid transformation (rotation and translation) that must be computed for each view. These are in the output parameter list of the *cv::calibrateCamera* function. The rotation and translation components are often called the **extrinsic parameters** of the calibration and they are different for each view. The intrinsic parameters remain constant for a given camera/lens system. The intrinsic parameters of our test camera, obtained from a calibration based on 20 chessboard images, are fx=167, fy=178, u0=156, v0=119. These results are obtained by *cv::calibrateCamera* through an optimization process aimed at finding the intrinsic and extrinsic parameters that will minimize the difference between the predicted image point position, as computed from the projection of the 3D scene points, and the actual image point position, as observed on the image. The sum of these differences over all the points specified during the calibration is called the **re-projection error**.

To correct the distortion, OpenCV uses a polynomial function that is applied to the image points in order to move them to their undistorted positions. By default, five coefficients are used; a model made of eight coefficients is also available. Once these coefficients are obtained, it is possible to compute two mapping functions (one for the x coordinate and one for the y) that will give the new undistorted position of an image point on a distorted image. This is computed by the function *cv::initUndistortRectifyMap*, and the function *cv::remap* remaps all of the points of an input image to a new image. Note that, because of the non-linear transformation, some pixels of the input image now fall outside the boundary of the output image. You can expand the size of the output image to compensate for this loss of pixels, but you will then obtain output pixels that have no values in the input image (they will be displayed as black pixels).

## There's more...

When a good estimate of the camera's intrinsic parameters is known, it can be advantageous to input them to the *cv::calibrateCamera* function. They will then be used as initial values in the optimization process. To do so, you just need to add the flag *CV_CALIB_USE_INTRINSIC_GUESS* and input these values in the calibration matrix parameter. It is also possible to impose a fixed value for the principal point (*CV_CALIB_FIX_PRINCIPAL_POINT*), which can often be assumed to be the central pixel. You can also impose a fixed ratio for the focal lengths fx and fy (*CV_CALIB_FIX_ASPECT_RATIO*), in which case you assume pixels of square shape.


# Computing the fundamental matrix of an image pair

The previous recipe showed you how to recover the projective equation of a single camera. In this recipe, we will explore the projective relation that exists between two images viewing the same scene. These two images could have been obtained by moving a camera to two different locations, taking pictures from two viewpoints, or by using two cameras, each of them taking a different picture of the scene. When these two cameras are separated by a rigid baseline, we use the term **stereovision**.

## Getting ready

Let's now consider two cameras observing a given scene point as seen here:

We learned that we can find the image x of a 3D point X by tracing a line joining this 3D point with the camera's center. Conversely, the scene point producing the image point we observe at position x can be located anywhere on this line in 3D space. This implies that if we want to find the corresponding point of a given image point in another image, we need to search along the projection of this line onto the second image plane. This imaginary line is called the **epipolar line** of point x. It defines a fundamental constraint that two corresponding points must satisfy: the match of a given point must lie on the epipolar line of this point in the other view. The exact orientation of this epipolar line depends on the respective positions of the two cameras. In fact, the position of the epipolar lines characterizes the geometry of a two-view system.

Another observation that can be made from the geometry of this two-view system is that all of the epipolar lines pass through the same point. This point corresponds to the projection of one camera center onto the other camera. This special point is called an **epipole**.

Mathematically, it can be shown that the relation between an image point and its corresponding epipolar line can be expressed using a 3x3 matrix F as follows:

*l' = F x*

In projective geometry, a 2D line is also represented by a 3-vector. It corresponds to the set of 2D points (x',y') satisfying the equation l1'x'+ l2'y'+ l3'=0 (the prime superscript denotes that this line belongs to the second image). Consequently, the matrix F, called the **fundamental matrix**, maps a 2D image point in one view to an epipolar line in the other view.

## How to do it...

Estimating the fundamental matrix of an image pair can be done by solving a set of equations that involve a certain number of known matched points between the two images. The minimum number of such matches is seven. Using an image pair, we can manually select seven good matches (seen in the following screenshot). These will be used to compute the fundamental matrix using the *cv::findFundamentalMat* OpenCV function.


If we have the image points in each image as *cv::KeyPoint* instances, they first need to be converted into *cv::Point2f* in order to be used with *cv::findFundamentalMat*. An OpenCV function can be used to this end:

```cpp
// Convert keypoints into Point2f
std::vector<cv::Point2f> selPoints1, selPoints2;
cv::KeyPoint::convert(keypoints1, selPoints1, pointIndexes1);
cv::KeyPoint::convert(keypoints2, selPoints2, pointIndexes2);
```

The two vectors *selPoints1* and *selPoints2* contain the corresponding points in the two images; *keypoints1* and *keypoints2* are the selected *cv::KeyPoint* instances. The call to the *cv::findFundamentalMat* function is then as follows:

```cpp
// Compute F matrix from 7 matches
cv::Mat fundemental = cv::findFundamentalMat(
        cv::Mat(selPoints1), // points in first image
        cv::Mat(selPoints2), // points in second image
        CV_FM_7POINT);       // 7-point method
```

One way to visually verify the validity of the fundamental matrix is to draw the epipolar lines of some selected points. Another OpenCV function allows the epipolar lines of a given set of points to be computed. Once these are computed, they can be drawn using the *cv::line* function. The following lines of code accomplish these two steps (that is, computing and drawing epipolar lines in the right image from the points in the left):

```cpp
// draw the left points' corresponding epipolar
// lines in the right image
std::vector<cv::Vec3f> lines1;
cv::computeCorrespondEpilines(
        cv::Mat(selPoints1), // image points
        1,                   // in image 1 (can also be 2)
        fundemental,         // F matrix
        lines1);             // vector of epipolar lines

// for all epipolar lines
for (std::vector<cv::Vec3f>::const_iterator it = lines1.begin();
     it != lines1.end(); ++it) {

    // draw the line between first and last column
    cv::line(image2,
             cv::Point(0, -(*it)[2]/(*it)[1]),
             cv::Point(image2.cols,
                       -((*it)[2] + (*it)[0]*image2.cols)/(*it)[1]),
             cv::Scalar(255,255,255));
}
```

The result is then seen in the following screenshot:

Remember that the epipole is at the intersection point of all epipolar lines, and it is the projection of the other camera center. This epipole is visible on the preceding image. Often, the epipolar lines intersect outside the image boundaries. It is at the location where the first camera would be visible if the two images were taken at the same instant. Observe the image pair and take the time to convince yourself that this indeed makes sense.

## How it works...

We explained previously that the fundamental matrix gives, for a point in one image, the equation of the line on which its corresponding point in the other view should be found. If the corresponding point of a point p (expressed in homogeneous coordinates) is p', and if F is the fundamental matrix between the two views, then since p' lies on the epipolar line Fp, we have:

*p'ᵀ F p = 0*

This equation expresses the relation between two corresponding points and is known as the **epipolar constraint**. Using this equation, it becomes possible to estimate the entries of the matrix from known matches. Since the entries of the F matrix are given up to a scale factor, there are only eight entries to estimate (the ninth can be arbitrarily set to 1). Each match contributes one equation. Therefore, with eight known matches, the matrix can be fully estimated by solving the resulting set of linear equations. This is what is done when you use the *CV_FM_8POINT* flag with the *cv::findFundamentalMat* function. Note that, in this case, it is possible (and preferable) to input more than eight matches. The obtained over-determined system of linear equations can then be solved in a least-squares sense.

To estimate the fundamental matrix, an additional constraint can also be exploited. Mathematically, the F matrix maps a 2D point into a 1D pencil of lines (that is, lines intersecting at a common point). The fact that all of these epipolar lines pass through this unique point (the epipole) imposes a constraint on the matrix. This constraint reduces the number of matches required to estimate the fundamental matrix to seven. Unfortunately, in this case, the set of equations becomes non-linear, with up to three possible solutions. The seven-match solution of the F matrix estimation can be invoked in OpenCV by using the *CV_FM_7POINT* flag. This is what we did in the example of the preceding section.

Lastly, we should mention that the choice of an appropriate set of matches in the image is important to obtain an accurate estimation of the fundamental matrix. In general, the matches should be well distributed across the image and include points at different depths in the scene. Otherwise, the solution will become unstable, or degenerate configurations can result.

# Matching images using random sample consensus

When two cameras observe the same scene, they see the same elements but under different viewpoints. In this recipe, we take a look at the image matching problem, and we will learn how to exploit the epipolar constraint between two views to match image features more reliably.

The principle we will follow is simple: when we match feature points between two images, we only accept those matches that fall onto the corresponding epipolar lines. However, to be able to check this condition, the fundamental matrix must be known, and we need good matches to estimate this matrix. This seems to be a chicken-and-egg problem. We propose in this recipe a solution in which the fundamental matrix and a set of good matches will be jointly computed.

## How to do it...

The objective is to be able to obtain a set of good matches between two views. Therefore, all found feature point correspondences will be validated using the epipolar constraint introduced in the previous recipe. We first define a class which will encapsulate the different elements of the solution that will be proposed:

```cpp
class RobustMatcher {

  private:
    // pointer to the feature point detector object
    cv::Ptr<cv::FeatureDetector> detector;
    // pointer to the feature descriptor extractor object
    cv::Ptr<cv::DescriptorExtractor> extractor;
    float ratio;       // max ratio between 1st and 2nd NN
    bool refineF;      // if true will refine the F matrix
    double distance;   // min distance to epipolar
    double confidence; // confidence level (probability)

  public:
    RobustMatcher() : ratio(0.65f), refineF(true),
                      confidence(0.99), distance(3.0) {
        // SURF is the default feature
        detector = new cv::SurfFeatureDetector();
        extractor = new cv::SurfDescriptorExtractor();
    }
```

Note how we used the generic *cv::FeatureDetector* and *cv::DescriptorExtractor* interfaces so that a user can provide any specific implementation. The *SURF* features and descriptors are used here by default, but others can be specified using the appropriate setter methods:

```cpp
    // Set the feature detector
    void setFeatureDetector(
            cv::Ptr<cv::FeatureDetector>& detect) {
        detector = detect;
    }

    // Set the descriptor extractor
    void setDescriptorExtractor(
            cv::Ptr<cv::DescriptorExtractor>& desc) {
        extractor = desc;
    }
```

The main method is our *match* method that returns matches, detected keypoints, and the estimated fundamental matrix. The method proceeds in five distinct steps (explicitly identified in the comments of the following code) that we will now explore:

```cpp
    // Match feature points using symmetry test and RANSAC
    // returns fundemental matrix
    cv::Mat match(cv::Mat& image1,
                  cv::Mat& image2, // input images
                  // output matches and keypoints
                  std::vector<cv::DMatch>& matches,
                  std::vector<cv::KeyPoint>& keypoints1,
                  std::vector<cv::KeyPoint>& keypoints2) {

        // 1a. Detection of the SURF features
        detector->detect(image1, keypoints1);
        detector->detect(image2, keypoints2);

        // 1b. Extraction of the SURF descriptors
        cv::Mat descriptors1, descriptors2;
        extractor->compute(image1, keypoints1, descriptors1);
        extractor->compute(image2, keypoints2, descriptors2);

        // 2. Match the two image descriptors

        // Construction of the matcher
        cv::BruteForceMatcher<cv::L2<float>> matcher;

        // from image 1 to image 2
        // based on k nearest neighbours (with k=2)
        std::vector<std::vector<cv::DMatch>> matches1;
        matcher.knnMatch(descriptors1, descriptors2,
                 matches1, // vector of matches (up to 2 per entry)
                 2);       // return 2 nearest neighbours

        // from image 2 to image 1
        // based on k nearest neighbours (with k=2)
        std::vector<std::vector<cv::DMatch>> matches2;
        matcher.knnMatch(descriptors2, descriptors1,
                 matches2, // vector of matches (up to 2 per entry)
                 2);       // return 2 nearest neighbours

        // 3. Remove matches for which NN ratio is
        //    greater than threshold

        // clean image 1 -> image 2 matches
        int removed = ratioTest(matches1);
        // clean image 2 -> image 1 matches
        removed = ratioTest(matches2);

        // 4. Remove non-symmetrical matches
        std::vector<cv::DMatch> symMatches;
        symmetryTest(matches1, matches2, symMatches);

        // 5. Validate matches using RANSAC
        cv::Mat fundemental = ransacTest(symMatches,
                                 keypoints1, keypoints2, matches);

        // return the found fundemental matrix
        return fundemental;
    }
```

The first step is simply detecting the feature points and computing their descriptors. Next, we proceed to feature matching using the *cv::BruteForceMatcher* class. However, this time we find the two best matching points for each feature (and not only the best one as we did in the previous recipe). This is accomplished by the *cv::BruteForceMatcher::knnMatch* method (with k=2). Moreover, we perform this matching in two directions, that is, for each point in the first image we find the two best matches in the second image, and then we do the same thing for the feature points of the second image, finding their two best matches in the first image.

Therefore, for each feature point, we have two candidate matches in the other view. These are the two best ones based on the distance between their descriptors. If this measured distance is very low for the best match, and much larger for the second best match, we can safely accept the first match as a good one since it is unambiguously the best choice. Conversely, if the two best matches are relatively close in distance, then there exists a possibility that we make an error if we select one or the other. In this case, we should reject both matches. Here, we perform this test in step 3 by verifying that the ratio of the distance of the best match over the distance of the second best match is not greater than a given threshold:

```cpp
    // Clear matches for which NN ratio is > than threshold
    // return the number of removed points
    // (corresponding entries being cleared,
    //  i.e. size will be 0)
    int ratioTest(std::vector<std::vector<cv::DMatch>>& matches) {

        int removed = 0;

        // for all matches
        for (std::vector<std::vector<cv::DMatch>>::iterator
                 matchIterator = matches.begin();
             matchIterator != matches.end(); ++matchIterator) {

            // if 2 NN have been identified
            if (matchIterator->size() > 1) {

                // check distance ratio
                if ((*matchIterator)[0].distance /
                    (*matchIterator)[1].distance > ratio) {
                    matchIterator->clear(); // remove match
                    removed++;
                }

            } else { // does not have 2 neighbours
                matchIterator->clear(); // remove match
                removed++;
            }
        }

        return removed;
    }
```

A large number of ambiguous matches are eliminated by this procedure, as can be seen from the following example. Here, with a low *SURF* threshold (*=10*), we initially detected 1,600 feature points (black circles), out of which only 55 survive the ratio test (white circles):


The white lines linking the matched points show that, even though we have a large number of good matches, a significant number of false matches have survived. Therefore, a second test will be performed in order to filter out more false matches. Note that the ratio test is also applied to the second match set.


We now have two relatively good match sets, one from the first image to the second and the other from the second image to the first. From these sets, we will now extract the matches that are in agreement with both. This is the **symmetrical matching scheme**, which imposes that, for a match pair to be accepted, both points must be the best matching feature of each other:

```cpp
// Insert symmetrical matches in symMatches vector
void symmetryTest(
   const std::vector<std::vector<cv::DMatch>>& matches1,
   const std::vector<std::vector<cv::DMatch>>& matches2,
   std::vector<cv::DMatch>& symMatches) {

   // for all matches image 1 -> image 2
   for (std::vector<std::vector<cv::DMatch>>::
            const_iterator matchIterator1= matches1.begin();
        matchIterator1!= matches1.end(); ++matchIterator1) {

      // ignore deleted matches
      if (matchIterator1->size() < 2)
         continue;

      // for all matches image 2 -> image 1
      for (std::vector<std::vector<cv::DMatch>>::
               const_iterator matchIterator2= matches2.begin();
           matchIterator2!= matches2.end(); ++matchIterator2) {

         // ignore deleted matches
         if (matchIterator2->size() < 2)
            continue;

         // Match symmetry test
         if ((*matchIterator1)[0].queryIdx ==
             (*matchIterator2)[0].trainIdx &&
             (*matchIterator2)[0].queryIdx ==
             (*matchIterator1)[0].trainIdx) {

            // add symmetrical match
            symMatches.push_back(
               cv::DMatch((*matchIterator1)[0].queryIdx,
                          (*matchIterator1)[0].trainIdx,
                          (*matchIterator1)[0].distance));
            break; // next match in image 1 -> image 2
         }
      }
   }
}
```

In our test pair, 31 matches survived this symmetry test. The last test consists of an additional filtering step that uses the fundamental matrix in order to reject matches that do not obey the epipolar constraint. This test is based on the *RANSAC* method, which can compute the fundamental matrix even when outliers are still present in the match set (the method is explained in the following section):

```cpp
// Identify good matches using RANSAC
// Return fundamental matrix
cv::Mat ransacTest(
   const std::vector<cv::DMatch>& matches,
   const std::vector<cv::KeyPoint>& keypoints1,
   const std::vector<cv::KeyPoint>& keypoints2,
   std::vector<cv::DMatch>& outMatches) {

   // Convert keypoints into Point2f
   std::vector<cv::Point2f> points1, points2;
   for (std::vector<cv::DMatch>::
           const_iterator it= matches.begin();
        it!= matches.end(); ++it) {

      // Get the position of left keypoints
      float x= keypoints1[it->queryIdx].pt.x;
      float y= keypoints1[it->queryIdx].pt.y;
      points1.push_back(cv::Point2f(x,y));
      // Get the position of right keypoints
      x= keypoints2[it->trainIdx].pt.x;
      y= keypoints2[it->trainIdx].pt.y;
      points2.push_back(cv::Point2f(x,y));
   }

   // Compute F matrix using RANSAC
   std::vector<uchar> inliers(points1.size(),0);
   cv::Mat fundamental= cv::findFundamentalMat(
      cv::Mat(points1),cv::Mat(points2), // matching points
      inliers,      // match status (inlier or outlier)
      CV_FM_RANSAC, // RANSAC method
      distance,     // distance to epipolar line
      confidence);  // confidence probability

   // extract the surviving (inlier) matches
   std::vector<uchar>::const_iterator itIn= inliers.begin();
   std::vector<cv::DMatch>::const_iterator itM= matches.begin();
   // for all matches
   for ( ; itIn!= inliers.end(); ++itIn, ++itM) {

      if (*itIn) { // it is a valid match

         outMatches.push_back(*itM);
      }
   }

   if (refineF) {
      // The F matrix will be recomputed with
      // all accepted matches

      // Convert keypoints into Point2f
      // for final F computation
      points1.clear();
      points2.clear();
      for (std::vector<cv::DMatch>::
              const_iterator it= outMatches.begin();
           it!= outMatches.end(); ++it) {

         // Get the position of left keypoints
         float x= keypoints1[it->queryIdx].pt.x;
         float y= keypoints1[it->queryIdx].pt.y;
         points1.push_back(cv::Point2f(x,y));
         // Get the position of right keypoints
         x= keypoints2[it->trainIdx].pt.x;
         y= keypoints2[it->trainIdx].pt.y;
         points2.push_back(cv::Point2f(x,y));
      }

      // Compute 8-point F from all accepted matches
      fundamental= cv::findFundamentalMat(
         cv::Mat(points1),cv::Mat(points2), // matches
         CV_FM_8POINT); // 8-point method
   }

   return fundamental;
}
```

This code is a bit long because the keypoints need to be converted into *cv::Point2f* before the F matrix computation.

The complete matching process using our *RobustMatcher* class is initiated by the following calls:

```cpp
// Prepare the matcher
RobustMatcher rmatcher;
rmatcher.setConfidenceLevel(0.98);
rmatcher.setMinDistanceToEpipolar(1.0);
rmatcher.setRatio(0.65f);
cv::Ptr<cv::FeatureDetector> pfd=
   new cv::SurfFeatureDetector(10);
rmatcher.setFeatureDetector(pfd);

// Match the two images
std::vector<cv::DMatch> matches;
std::vector<cv::KeyPoint> keypoints1, keypoints2;
cv::Mat fundamental= rmatcher.match(image1,image2,
                        matches, keypoints1, keypoints2);
```

It results in 23 matches that are shown in the following screenshot with their corresponding epipolar lines:

## How it works...

In the preceding recipe, we learned that it is possible to estimate the fundamental matrix associated with an image pair from a number of feature point matches. Obviously, to be exact, this match set must be made of only good matches. However, in a real context, it is not possible to guarantee that a match set obtained by comparing the descriptors of detected feature points will be perfectly exact. This is why a fundamental matrix estimation method based on the **RANSAC** (**RANdom SAmple Consensus**) strategy has been introduced.

The RANSAC algorithm aims at estimating a given mathematical entity from a data set that may contain a number of outliers. The idea is to randomly select some data points from the set and perform the estimation only with these. The number of selected points should be the minimum number of points required to estimate the mathematical entity. In the case of the fundamental matrix, eight matched pairs constitute this minimum (in fact, seven matches suffice, but the 8-point linear algorithm is faster to compute). Once the fundamental matrix is estimated from these eight random matches, all of the other matches in the match set are tested against the epipolar constraint that derives from this matrix. All of the matches that fulfill this constraint (that is, matches for which the corresponding feature is at a short distance from its epipolar line) are identified. These matches form the support set of the computed fundamental matrix.

The central idea behind the RANSAC algorithm is that the larger the support set is, the higher the probability that the computed matrix is the right one. Obviously, if one (or more) of the randomly selected matches is a wrong match, then the computed fundamental matrix will also be wrong, and its support set is expected to be small. This process is repeated a number of times, and at the end, the matrix with the largest support will be retained as the most probable one.

Therefore, our objective is to pick eight random matches several times so that eventually we select eight good ones, which should give us a large support set. Depending on the number of wrong matches in the entire data set, the probability of selecting a set of eight correct matches will differ. We however know that the more selections we make, the higher the confidence that at least one of them contains only good matches. More precisely, if we assume that a fraction *n* of the match set are inliers (good matches), then the probability that one selection picks eight good matches is *n^8*. Consequently, the probability that a selection contains at least one wrong match is *(1 - n^8)*. If we make *k* selections, the probability of having at least one random set containing only good matches is *1 - (1 - n^8)^k*. This is the confidence probability *c*, and we want this probability to be as high as possible since we need at least one good set of matches in order to obtain the correct fundamental matrix. Therefore, when running the RANSAC algorithm, one needs to determine the number of selections *k* that must be made in order to obtain a given confidence level.

When using the *cv::findFundamentalMat* function with the *CV_FM_RANSAC* flag, two extra parameters are provided. The first one is the confidence level, which determines the number of iterations to be made. The second one is the maximum distance to the epipolar line for a point to be considered an inlier. All matched pairs in which a point is farther from its epipolar line than the specified distance will be reported as outliers. Therefore, the function also returns a *std::vector* of *uchar* values indicating whether the corresponding match has been identified as an outlier (0) or an inlier (1).

The more good matches you have in your initial match set, the higher the probability that RANSAC will give you the correct fundamental matrix. This is why we applied several filters to the match set before calling the *cv::findFundamentalMat* function. Obviously, you can decide to skip one or the other of the steps that were proposed in this recipe. It is just a question of balancing the computational complexity, the final number of matches, and the required level of confidence that the obtained match set will contain only exact matches.

# Computing a homography between two images

The second recipe of this article showed you how to compute the fundamental matrix of an image pair from a set of matches. Another mathematical entity exists that can be computed from match pairs: a homography. Like the fundamental matrix, the homography is a *3x3* matrix with special properties and, as we will see in this recipe, it applies to two-view images in specific situations.

## Getting ready

Let's consider again the projective relation between a 3D point and its image on a camera that we introduced in the first recipe of this article. Basically, we learned that this relation is expressed by a *3x4* matrix. Now, if we consider the special case where two views of a scene are separated by a pure rotation, then it can be observed that the fourth column of the extrinsic matrix will be made of all 0s (that is, translation is null). As a result, the projective relation in this special case becomes a *3x3* matrix. This matrix is called a **homography** and it implies that, under special circumstances (here, a pure rotation), the image of a point in one view is related to the image of the same point in another by a linear relation:

In homogeneous coordinates, this relation holds up to a scale factor represented here by the scalar value s. Once this matrix is estimated, all points in one view can be transferred to the second view using this relation. Note that as a side effect of the homography relation for pure rotation, the fundamental matrix becomes undefined in this case.

## How to do it...

Suppose we have two images separated by a pure rotation. These two images can be matched using our *RobustMatcher* class, except that we skip the RANSAC validation step (identified as step 5 in our *match* method) since it involves fundamental matrix estimation. Instead, we will apply a RANSAC step that estimates a homography from the match set (which obviously contains a good number of outliers). This is done using the *cv::findHomography* function, which is very similar to the *cv::findFundamentalMat* function:

```cpp
// Find the homography between image 1 and image 2
std::vector<uchar> inliers(points1.size(),0);
cv::Mat homography= cv::findHomography(
   cv::Mat(points1), // corresponding
   cv::Mat(points2), // points
   inliers,    // outputted inlier matches
   CV_RANSAC,  // RANSAC method
   1.);        // max distance to reprojection point
```

Recall that a homography will only exist if the two images are separated by a pure rotation, which is the case for the following two images:

The resulting inliers that comply with the found homography have been drawn on those images by the following loop:

```cpp
// Draw the inlier points
std::vector<cv::Point2f>::const_iterator itPts=
   points1.begin();
std::vector<uchar>::const_iterator itIn= inliers.begin();
while (itPts!=points1.end()) {

   // draw a circle at each inlier location
   if (*itIn)
      cv::circle(image1,*itPts,3,
                 cv::Scalar(255,255,255),2);

   ++itPts;
   ++itIn;
}

itPts= points2.begin();
itIn= inliers.begin();
while (itPts!=points2.end()) {

   // draw a circle at each inlier location
   if (*itIn)
      cv::circle(image2,*itPts,3,
                 cv::Scalar(255,255,255),2);

   ++itPts;
   ++itIn;
}
```

As explained in the preceding section, once the homography is computed, you can transfer image points from one image to the other. In fact, you can do this for all pixels of an image and the result will be to transform this image to the other view. There is an OpenCV function that does exactly this:

```cpp
// Warp image 1 to image 2
cv::Mat result;
cv::warpPerspective(image1,     // input image
                    result,     // output image
                    homography, // homography
                    cv::Size(2*image1.cols,
                             image1.rows)); // size of output image
```

Once this new image is obtained, it can be appended to the other image in order to expand the view (since the two images are now from the same point of view):

```cpp
// Copy image 2 onto the first half of the full image
cv::Mat half(result,cv::Rect(0,0,image2.cols,image2.rows));
image2.copyTo(half); // copy image2 into the ROI of result
```

The result is the following image:


## How it works...

When two views are related by a homography, it becomes possible to determine where a given scene point on one image is found on the other image. This property becomes particularly interesting for points that fall outside the image boundaries. Indeed, since the second view shows a portion of the scene that is not visible in the first image, you can use the homography in order to expand the image by reading in the other image the color value of additional pixels. That is how we were able to create a new image that is an expansion of our second image in which extra columns were added to the right.

The homography computed by *cv::findHomography* is the one that maps points in the first image to points in the second image. What we need in order to transfer the points of image 1 to image 2 is in fact the inverse homography. This is exactly what the function *cv::warpPerspective* does by default, that is, it uses the inverse of the homography provided as input to get the color value of each point of the output image. When an output pixel is traced back to a point outside the input image, a black value (0) is simply assigned to this pixel. Note that the flag *cv::WARP_INVERSE_MAP* can be specified as an optional fifth argument of *cv::warpPerspective* if one wants to use the direct homography instead of the inverted one during the pixel transfer process.

## There's more...

A homography also exists between two views of a plane. This can be demonstrated by looking again at the camera projection equation as we did for the pure rotation case. When a plane is observed, we can, without loss of generality, set the reference frame of the plane such that all of its points have a Z coordinate equal to 0. This will also cancel one of the columns of the *3x4* projection matrix, resulting in a *3x3* matrix: a homography. This means that if, for example, you have several pictures from different points of view of the flat façade of a building, you can compute the homography between these images and build a large mosaic of the façade by warping the images and assembling them together as we did in this recipe.
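Written out explicitly, the column cancellation works as follows (using the standard notation **K** for the intrinsic matrix and **r1**, **r2**, **r3** for the columns of the rotation matrix, symbols which do not appear in the excerpt above):

```latex
% A point on the plane Z = 0: the third rotation column multiplies
% Z = 0 and drops out, leaving a 3x3 matrix, the homography H.
s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
= \mathbf{K}\,[\,\mathbf{r}_1 \;\; \mathbf{r}_2 \;\; \mathbf{r}_3 \;\; \mathbf{t}\,]
  \begin{bmatrix} X \\ Y \\ 0 \\ 1 \end{bmatrix}
= \underbrace{\mathbf{K}\,[\,\mathbf{r}_1 \;\; \mathbf{r}_2 \;\; \mathbf{t}\,]}_{\mathbf{H}}
  \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}
```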

A minimum of four matched points between two views is required to compute a homography. The function *cv::getPerspectiveTransform* computes such a transformation from four corresponding points.

# Summary

This article analyzed the different relations involved in image formation. It also explored the projective relations that exist between two images of the same scene.
