OpenCV: Estimating Projective Relations in Images

If you have at least some basic knowledge of C++, this book will teach you how to write computer vision applications for the modern world. The recipe-based approach comes with explanations and screenshots for easy learning.


OpenCV 2 Computer Vision Application Programming Cookbook

OpenCV 2 Computer Vision Application Programming Cookbook

Over 50 recipes to master this library of programming functions for real-time computer vision

        Read more about this book      

(For more resources related to the article, see here.)


Images are generally produced using a digital camera that captures a scene by projecting light onto an image sensor going through its lens. The fact that an image is formed through the projection of a 3D scene onto a 2D plane imposes the existence of important relations between a scene and its image, and between different images of the same scene. Projective geometry is the tool that is used to describe and characterize, in mathematical terms, the process of image formation. In this article, you will learn some of the fundamental projective relations that exist in multi-view imagery and how these can be used in computer vision programming. But before we start the recipes, let's explore the basic concepts related to scene projection and image formation.

Image formation

Fundamentally, the process used to produce images has not changed since the beginning of photography. The light coming from an observed scene is captured by a camera through a frontal aperture and the captured light rays hit an image plane (or image sensor) located on the back of the camera. Additionally, a lens is used to concentrate the rays coming from the different scene elements. This process is illustrated by the following figure:

OpenCV: Estimating Projective Relations in Images

Here, do is the distance from the lens to the observed object, di is the distance from the lens to the image plane, and f is the focal length of the lens. These quantities are related by the so-called thin lens equation:

OpenCV: Estimating Projective Relations in Images

In computer vision, this camera model can be simplified in a number of ways. First, we can neglect the effect of the lens by considering a camera with an infinitesimal aperture since, in theory, this does not change the image. Only the central ray is therefore considered. Second, since most of the time we have do>>di, we can assume that the image plane is located at the focal distance. Finally, we can notice from the geometry of the system, that the image on the plane is inverted. We can obtain an identical but upright image by simply positioning the image plane in front of the lens. Obviously, this is not physically feasible, but from a mathematical point of view, this is completely equivalent. This simplified model is often referred to as the pin-hole camera model and it is represented as follows:

OpenCV: Estimating Projective Relations in Images

From this model, and using the law of similar triangles, we can easily derive the basic projective equation:

OpenCV: Estimating Projective Relations in Images

The size (hi) of the image of an object (of height ho) is therefore inversely proportional to its distance (do) from the camera which is naturally true. This relation allows the position of the image of a 3D scene point to be predicted onto the image plane of a camera.

Calibrating a camera

From the introduction of this article, we learned that the essential parameters of a camera under the pin-hole model are its focal length and the size of the image plane (which defines the field of view of the camera). Also, since we are dealing with digital images, the number of pixels on the image plane is another important characteristic of a camera. Finally, in order to be able to compute the position of an image's scene point in pixel coordinates, we need one additional piece of information. Considering the line coming from the focal point that is orthogonal to the image plane, we need to know at which pixel position this line pierces the image plane. This point is called the principal point. It could be logical to assume that this principal point is at the center of the image plane, but in practice, this one might be off by few pixels depending at which precision the camera has been manufactured.

Camera calibration is the process by which the different camera parameters are obtained. One can obviously use the specifications provided by the camera manufacturer, but for some tasks, such as 3D reconstruction, these specifications are not accurate enough. Camera calibration will proceed by showing known patterns to the camera and analyzing the obtained images. An optimization process will then determine the optimal parameter values that explain the observations. This is a complex process but made easy by the availability of OpenCV calibration functions.

How to do it...

To calibrate a camera, the idea is show to this camera a set of scene points for which their 3D position is known. You must then determine where on the image these points project. Obviously, for accurate results, we need to observe several of these points. One way to achieve this would be to take one picture of a scene with many known 3D points. A more convenient way would be to take several images from different viewpoints of a set of some 3D points. This approach is simpler but requires computing the position of each camera view, in addition to the computation of the internal camera parameters which fortunately is feasible.

OpenCV proposes to use a chessboard pattern to generate the set of 3D scene points required for calibration. This pattern creates points at the corners of each square, and since this pattern is flat, we can freely assume that the board is located at Z=0 with the X and Y axes well aligned with the grid. In this case, the calibration process simply consists of showing the chessboard pattern to the camera from different viewpoints. Here is one example of a calibration pattern image:

OpenCV: Estimating Projective Relations in Images

The nice thing is that OpenCV has a function that automatically detects the corners of this chessboard pattern. You simply provide an image and the size of the chessboard used (number of vertical and horizontal inner corner points). The function will return the position of these chessboard corners on the image. If the function fails to find the pattern, then it simply returns false:

// output vectors of image points
std::vector<cv::Point2f> imageCorners;
// number of corners on the chessboard
cv::Size boardSize(6,4);
// Get the chessboard corners
bool found = cv::findChessboardCorners(image,
boardSize, imageCorners);

Note that this function accepts additional parameters if one needs to tune the algorithm, which are not discussed here. There is also a function that draws the detected corners on the chessboard image with lines connecting them in sequence:

//Draw the corners
boardSize, imageCorners,
found); // corners have been found

The image obtained is seen here:

OpenCV: Estimating Projective Relations in Images

The lines connecting the points shows the order in which the points are listed in the vector of detected points. Now to calibrate the camera, we need to input a set of such image points together with the coordinate of the corresponding 3D points. Let's encapsulate the calibration process in a CameraCalibrator class:

class CameraCalibrator {
// input points:
// the points in world coordinates
std::vector<std::vector<cv::Point3f>> objectPoints;
// the point positions in pixels
std::vector<std::vector<cv::Point2f>> imagePoints;
// output Matrices
cv::Mat cameraMatrix;
cv::Mat distCoeffs;
// flag to specify how calibration is done
int flag;
// used in image undistortion
cv::Mat map1,map2;
bool mustInitUndistort;
CameraCalibrator() : flag(0), mustInitUndistort(true) {};

As mentioned previously, the 3D coordinates of the points on the chessboard pattern can be easily determined if we conveniently place the reference frame on the board. The method that accomplishes this takes a vector of the chessboard image filename as input:

// Open chessboard images and extract corner points
int CameraCalibrator::addChessboardPoints(
const std::vector<std::string>& filelist,
cv::Size & boardSize) {
// the points on the chessboard
std::vector<cv::Point2f> imageCorners;
std::vector<cv::Point3f> objectCorners;
// 3D Scene Points:
// Initialize the chessboard corners
// in the chessboard reference frame
// The corners are at 3D location (X,Y,Z)= (i,j,0)
for (int i=0; i<boardSize.height; i++) {
for (int j=0; j<boardSize.width; j++) {
objectCorners.push_back(cv::Point3f(i, j, 0.0f));
// 2D Image points:
cv::Mat image; // to contain chessboard image
int successes = 0;
// for all viewpoints
for (int i=0; i<filelist.size(); i++) {
// Open the image
image = cv::imread(filelist[i],0);
// Get the chessboard corners
bool found = cv::findChessboardCorners(
image, boardSize, imageCorners);
// Get subpixel accuracy on the corners
cv::cornerSubPix(image, imageCorners,
cv::TermCriteria(cv::TermCriteria::MAX_ITER +
30, // max number of iterations
0.1)); // min accuracy
//If we have a good board, add it to our data
if (imageCorners.size() == boardSize.area()) {
// Add image and scene points from one view
addPoints(imageCorners, objectCorners);
return successes;

The first loop inputs the 3D coordinates of the chessboard, which are specified in an arbitrary square size unit here. The corresponding image points are the ones provided by the cv::findChessboardCorners function. This is done for all available viewpoints. Moreover, in order to obtain a more accurate image point location, the function cv::cornerSubPix can be used and as the name suggests, the image points will then be localized at sub-pixel accuracy. The termination criterion that is specified by the cv::TermCriteria object defines a maximum number of iterations and a minimum accuracy in sub-pixel coordinates. The first of these two conditions that is reached will stop the corner refinement process.

When a set of chessboard corners has been successfully detected, these points are added to our vector of image and scene points:

// Add scene points and corresponding image points
void CameraCalibrator::addPoints(const std::vector<cv::Point2f>&
imageCorners, const std::vector<cv::Point3f>& objectCorners) {
// 2D image points from one view
// corresponding 3D scene points

The vectors contains std::vector instances. Indeed, each vector element being a vector of points from one view.

Once a sufficient number of chessboard images have been processed (and consequently a large number of 3D scene point/2D image point correspondences are available), we can initiate the computation of the calibration parameters:

// Calibrate the camera
// returns the re-projection error
double CameraCalibrator::calibrate(cv::Size &imageSize)
// undistorter must be reinitialized
mustInitUndistort= true;
//Output rotations and translations
std::vector<cv::Mat> rvecs, tvecs;
// start calibration
calibrateCamera(objectPoints, // the 3D points
imagePoints, // the image points
imageSize, // image size
cameraMatrix,// output camera matrix
distCoeffs, // output distortion matrix
rvecs, tvecs,// Rs, Ts
flag); // set options

In practice, 10 to 20 chessboard images are sufficient, but these must be taken from different viewpoints at different depths. The two important outputs of this function are the camera matrix and the distortion parameters. The camera matrix will be described in the next section. For now, let's consider the distortion parameters. So far, we have mentioned that with the pin-hole camera model, we can neglect the effect of the lens. But this is only possible if the lens used to capture an image does not introduce too important optical distortions. Unfortunately, this is often the case with lenses of lower quality or with lenses having a very short focal length. You may have already noticed that in the image we used for our example, the chessboard pattern shown is clearly distorted. The edges of the rectangular board being curved in the image. It can also be noticed that this distortion becomes more important as we move far from the center of the image. This is a typical distortion observed with fish-eye lens and it is called radial distortion. The lenses that are used in common digital cameras do not exhibit such a high degree of distortion, but in the case of the lens used here, these distortions cannot certainly be ignored.

It is possible to compensate for these deformations by introducing an appropriate model. The idea is to represent the distortions induced by a lens by a set of mathematical equations. Once established, these equations can then be reverted in order to undo the distortions visible on the image. Fortunately, the exact parameters of the transformation that will correct the distortions can be obtained together with the other camera parameter during the calibration phase. Once this is done, any image from the newly calibrated camera can be undistorted:

// remove distortion in an image (after calibration)
cv::Mat CameraCalibrator::remap(const cv::Mat &image) {
cv::Mat undistorted;
if (mustInitUndistort) { // called once per calibration
cameraMatrix, // computed camera matrix
distCoeffs, // computed distortion matrix
cv::Mat(), // optional rectification (none)
cv::Mat(), // camera matrix to generate undistorted
image.size(), // size of undistorted
CV_32FC1, // type of output map
map1, map2); // the x and y mapping functions
mustInitUndistort= false;
// Apply mapping functions
cv::remap(image, undistorted, map1, map2,
cv::INTER_LINEAR); // interpolation type
return undistorted;

Which results in the following image:

OpenCV: Estimating Projective Relations in Images

As you can see, once the image is undistorted, we obtain a regular perspective image.

How it works...

In order to explain the result of the calibration, we need to go back to the figure in the introduction which describes the pin-hole camera model. More specifically, we want to demonstrate the relation between a point in 3D at position (X,Y,Z) and its image (x,y) on a camera specified in pixel coordinates. Let's redraw this figure by adding a reference frame that we position at the center of the projection as seen here:

OpenCV: Estimating Projective Relations in Images

Note that the Y-axis is pointing downward to get a coordinate system compatible with the usual convention that places the image origin at the upper-left corner. We learned previously that the point (X,Y,Z) will be projected onto the image plane at (fX/Z,fY/Z). Now, if we want to translate this coordinate into pixels, we need to divide the 2D image position by, respectively, the pixel width (px) and height (py). We notice that by dividing the focal length f given in world units (most often meters or millimeters) by px, then we obtain the focal length expressed in (horizontal) pixels. Let's then define this term as fx. Similarly, fy =f/py is defined as the focal length expressed in vertical pixel unit. The complete projective equation is therefore:

OpenCV: Estimating Projective Relations in Images

Recall that (u0,v0) is the principal point that is added to the result in order to move the origin to the upper-left corner of the image. These equations can be rewritten in matrix form through the introduction of homogeneous coordinates in which 2D points are represented by 3-vectors, and 3D points represented by 4-vectors (the extra coordinate is simply an arbitrary scale factor that need to be removed when a 2D coordinate needs to be extracted from a homogeneous 3-vector). Here is the projective equation rewritten:

OpenCV: Estimating Projective Relations in Images

The second matrix is a simple projection matrix. The first matrix includes all of the camera parameters which are called the intrinsic parameters of the camera. This 3x3 matrix is one of the output matrices returned by the cv::calibrateCamera function. There is also a function called cv::calibrationMatrixValues that returns the value of the intrinsic parameters given a calibration matrix.

More generally, when the reference frame is not at the projection center of the camera, we will need to add a rotation (a 3x3 matrix) and a translation vector (3x1 matrix). These two matrices describe the rigid transformation that must be applied to the 3D points in order to bring them back to the camera reference frame. Therefore, we can rewrite the projection equation in its most general form:

OpenCV: Estimating Projective Relations in Images

Remember that in our calibration example, the reference frame was placed on the chessboard. Therefore, there is a rigid transformation (rotation and translation) that must be computed for each view. These are in the output parameter list of the cv::calibrateCamera function. The rotation and translation components are often called the extrinsic parameters of the calibration and they are different for each view. The intrinsic parameters remain constant for a given camera/lens system. The intrinsic parameters of our test camera obtained from a calibration based on 20 chessboard images are fx=167, fy=178, u0=156, v0=119. These results are obtained by cv::calibrateCamera through an optimization process aimed at finding the intrinsic and extrinsic parameters that will minimize the difference between the predicted image point position, as computed from the projection of the 3D scene points, and the actual image point position, as observed on the image. The sum of this difference for all points specified during the calibration is called the re-projection error.

To correct the distortion, OpenCV uses a polynomial function that is applied to the image point in order to move them at their undistorted position. By default, 5 coefficients are used; a model made of 8 coefficients is also available. Once these coefficients are obtained, it is possible to compute 2 mapping functions (one for the x coordinate and one for the y) that will give the new undistorted position of an image point on a distorted image. This is computed by the function cv::initUndistortRectifyMap and the function cv::remap remaps all of the points of an input image to a new image. Note that because of the non-linear transformation, some pixels of the input image now fall outside the boundary of the output image. You can expand the size of the output image to compensate for this loss of pixels, but you will now obtain output pixels that have no values in the input image (they will then be displayed as black pixels).

There's more...

When a good estimate of the camera intrinsic parameters are known, it could be advantageous to input them to the cv::calibrateCamera function. They will then be used as initial values in the optimization process. To do so, you just need to add the flag CV_CALIB_USE_INTRINSIC_GUESS and input these values in the calibration matrix parameter. It is also possible to impose a fixed value for the principal point (CV_CALIB_ FIX_PRINCIPAL_POINT), which can often be assumed to be the central pixel. You can also impose a fixed ratio for the focal lengths fx and fy (CV_CALIB_FIX_RATIO) in which case you assume pixels of square shape.

        Read more about this book      

(For more resources related to the article, see here.)

Computing the fundamental matrix of an image pair

The previous recipe showed you how to recover the projective equation of a single camera. In this recipe, we will explore the projective relation that exists between two images viewing the same scene. These two images could have been obtained by moving a camera at two different locations taking pictures from two viewpoints, or by using two cameras, each of them taking a different picture of the scene. When these two cameras are separated by a rigid baseline, we use the term stereovision.

Getting ready

Let's now consider two cameras observing a given scene point as seen here:

OpenCV: Estimating Projective Relations in Images

We learned that we can find the image x of a 3D point X by tracing a line joining this 3D point with the camera's center. Conversely, the image point that we observe at position x can be located anywhere on this line in 3D space. This implies that if we want to find the corresponding point of a given image point in another image, we need to search along the projection of this line onto the second image plane. This imaginary line is called the epipolar line of point x. It defines a fundamental constraint that must satisfy two corresponding points, that is, the match of a given point must lie on the epipolar line of this point in the other view. The exact orientation of this epipolar line depends on the respective position of the two cameras. In fact, the position of the epipolar lines characterizes the geometry of a two-view system.

Another observation that can be made from the geometry of this two-view system is that all of the epipolar lines pass through the same point. This point corresponds to the projection of one camera center onto the other camera. This special point is called an epipole.

Mathematically, it can be shown that the relation between an image point and its corresponding epipolar line can be expressed using a 3x3 matrix such as the following:

OpenCV: Estimating Projective Relations in Images

In projective geometry, a 2D line is also represented by a 3-vector. It corresponds to the set of 2D points (x',y') satisfying the equation l1'x'+ l2'y'+ l3'=0 (the prime superscript denotes that this line belongs to the second image). Consequently, the matrix F, called the fundamental matrix, maps a 2D image point in one view to an epipolar line in the other view.

How to do it...

Estimating the fundamental matrix of an image pair can be done by solving a set of equations which involve a certain number of known matched points between the two images. The minimum number of such matches is seven. Using an image pair, we can manually select seven good matches (seen in the following screenshot). These will be used to compute the fundamental matrix using the cv::findFundementalMat OpenCV function.

(Move the mouse over the image to enlarge it.)

If we have the image points in each image as cv::keypoint instance, they first need to be converted into cv::Point2f in order to be used with cv::findFundementalMat. An OpenCV function can be used to this end:

// Convert keypoints into Point2f
std::vector<cv::Point2f> selPoints1, selPoints2;

The two vectors selPoints1 and selPoints2 contain the corresponding points in the two images. keypoints1 and keypoints2 are the selected Keypoint instances. The call to the cv::findFundementalMat function is then as follows:

// Compute F matrix from 7 matches
cv::Mat fundemental= cv::findFundamentalMat(
cv::Mat(selPoints1), // points in first image
cv::Mat(selPoints2), // points in second image
CV_FM_7POINT); // 7-point method

One way to visually verify the validity of the fundamental matrix is to draw the epipolar lines of some selected points. Another OpenCV function allows the epipolar lines of a given set of points to be computed. Once these are computed, they can be drawn using the cv::line function. The following lines of code accomplish these two steps (that is, computing and drawing epipolar lines in the right image from the points in the left):

// draw the left points corresponding epipolar
// lines in right image
std::vector<cv::Vec3f> lines1;
cv::Mat(selPoints1), // image points
1, // in image 1 (can also be 2)
fundemental, // F matrix
lines1); // vector of epipolar lines
// for all epipolar lines
for (vector<cv::Vec3f>::const_iterator it= lines1.begin();
it!=lines1.end(); ++it) {
// draw the line between first and last column

The result is then seen in the following screenshot:

OpenCV: Estimating Projective Relations in Images

Remember that the epipole is at the intersection point of all epipolar lines, and it is the projection of the other camera center. This epipole is visible on the preceding image. Often, the epipolar lines intersect outside the image boundaries. It is at the location where the first camera would be visible if the two images were taken at the same instant. Observe the image pair and take the time to convince yourself that this indeed makes sense.

How it works...

We explained previously that the fundamental matrix gives, for a point in one image, the equation of the line on which its corresponding point in the other view should be found. If the corresponding point of a point p (expressed in homogenous coordinates) is p', and if F is the fundamental matrix between the two views, then since p' lies on the epipolar line Fp, we have:

OpenCV: Estimating Projective Relations in Images

This equation expresses the relation between two corresponding points and is known as the epipolar constraint. Using this equation it becomes possible to estimate the entries of the matrix using known matches. Since the entries of the F matrix are given up to a scale factor, there are only eight entries to estimate (the ninth can be arbitrarily set to 1). Each match contributes one equation. Therefore, with eight known matches, the matrix can be fully estimated by solving the resulting set of linear equations. This is what is done when you use the CV_FM_8POINT flag with the cv::findFundamentalMat function. Note that, in this case, it is possible (and preferable) to input more than eight matches. The obtained over-determined system of linear equations can then be solved in a mean-square sense.

To estimate the fundamental matrix, an additional constraint can also be exploited. Mathematically, the F matrix maps a 2D point into a 1D pencil of lines (that is, lines intersecting at a common point). The fact that all of these epipolar lines pass by this unique point (the epipole) imposes a constraint on the matrix. This constraint reduces, the number of matches required to estimate the fundamental matrix to seven. Unfortunately, in this case, the set of equations becomes non-linear with up to three possible solutions. The seven-match solution of the F matrix estimation can be invoked in OpenCV by using the CV_FM_7POINT flag. This is what we did in the example of the preceding section.

Lastly, we should mention that the choice of an appropriate set of matches in the image is important to obtain an accurate estimation of the fundamental matrix. In general, the matches should be well distributed across the image and include points at different depth in the scene. Otherwise, the solution will become unstable or degenerate configurations can result.

Matching images using random sample consensus

When two cameras observe the same scene, they see the same elements but under different viewpoints. In this recipe, we take a look at the of image matching problem and we will learn how to exploit the epipolar constraint between two views to match image features more reliably.

The principle we will follow is simple: when we match feature points between two images, we only accept those matches that fall onto the corresponding epipolar lines. However, to be able to check this condition, the fundamental matrix must be known, and we need good matches to estimate this matrix. This seems to be a chicken-and-egg problem. We propose in this recipe a solution in which the fundamental matrix and a set of good matches will be jointly computed.

How to do it...

The objective is to be able to obtain a set of good matches between two views. Therefore, all found feature point correspondences will be validated using the epipolar constraint introduced in the previous recipe. We first define a class which will encapsulate the different elements of the solution that will be proposed:

class RobustMatcher {
// pointer to the feature point detector object
cv::Ptr<cv::FeatureDetector> detector;
// pointer to the feature descriptor extractor object
cv::Ptr<cv::DescriptorExtractor> extractor;
float ratio; // max ratio between 1st and 2nd NN
bool refineF; // if true will refine the F matrix
double distance; // min distance to epipolar
double confidence; // confidence level (probability)
RobustMatcher() : ratio(0.65f), refineF(true),
confidence(0.99), distance(3.0) {
// SURF is the default feature
detector= new cv::SurfFeatureDetector();
extractor= new cv::SurfDescriptorExtractor();

Note how we used the generic cv::FeatureDetector and cv::DescriptorExtractor interfaces so that a user can provide any specific implementation. The SURF features and descriptors are used here by default, but others can be specified using the appropriate setter methods:

// Set the feature detector
void setFeatureDetector(
cv::Ptr<cv::FeatureDetector>& detect) {
detector= detect;
// Set the descriptor extractor
void setDescriptorExtractor(
cv::Ptr<cv::DescriptorExtractor>& desc) {
extractor= desc;

The main method is our match method that returns matches, detected keypoints, and the estimated fundamental matrix. The method proceeds in five distinct steps (explicitly identified in the comments of the following code) that we will now explore:

// Match feature points using symmetry test and RANSAC
// returns fundemental matrix
cv::Mat match(cv::Mat& image1,
cv::Mat& image2, // input images
// output matches and keypoints
std::vector<cv::DMatch>& matches,
std::vector<cv::KeyPoint>& keypoints1,
std::vector<cv::KeyPoint>& keypoints2) {
// 1a. Detection of the SURF features
// 1b. Extraction of the SURF descriptors
cv::Mat descriptors1, descriptors2;
// 2. Match the two image descriptors
// Construction of the matcher
cv::BruteForceMatcher<cv::L2<float>> matcher;
// from image 1 to image 2
// based on k nearest neighbours (with k=2)
std::vector<std::vector<cv::DMatch>> matches1;
matches1, // vector of matches (up to 2 per entry)
2); // return 2 nearest neighbours
// from image 2 to image 1
// based on k nearest neighbours (with k=2)
std::vector<std::vector<cv::DMatch>> matches2;
matches2, // vector of matches (up to 2 per entry)
2); // return 2 nearest neighbours
// 3. Remove matches for which NN ratio is
// > than threshold
// clean image 1 -> image 2 matches
int removed= ratioTest(matches1);
// clean image 2 -> image 1 matches
removed= ratioTest(matches2);
// 4. Remove non-symmetrical matches
std::vector<cv::DMatch> symMatches;
// 5. Validate matches using RANSAC
cv::Mat fundemental= ransacTest(symMatches,
keypoints1, keypoints2, matches);
// return the found fundemental matrix
return fundemental;

The first step is simply detecting the feature point and computing their descriptors. Next, we proceed to feature matching using the cv::BruteForceMatcher class. However, we find the two best matching points for each feature (and not only the best one as we did in the previous recipe). This is accomplished by the cv::BruteForceMatcher::knnMatch method (with k=2). Moreover, we perform this matching in two directions, that is, for each point in the first image we find the two best matches in the second image, and then we do the same thing for the feature points of the second image, finding their two best matches in the first image.

Therefore, for each feature point, we have two candidate matches in the other view. These are the two best ones based on the distance between their descriptors. If this measured distance is very low for the best match, and much larger for the second best match, we can safely accept the first match as a good one since it is unambiguously the best choice. Reciprocally, if the two best matches are relatively close in distance, then there exists a possibility that we make an error if we select one or the other. In this case, we should reject both matches. Here, we perform this test in step 3 by verifying that the ratio of the distance of the best match over the distance of the second best match is not greater than a given threshold:

// Clear matches for which NN ratio is > than threshold
// return the number of removed points
// (corresponding entries being cleared,
// i.e. size will be 0)
int ratioTest(std::vector<std::vector<cv::DMatch>>
&matches) {
int removed=0;
// for all matches
for (std::vector<std::vector<cv::DMatch>>::iterator
matchIterator= matches.begin();
matchIterator!= matches.end(); ++matchIterator) {
// if 2 NN has been identified
if (matchIterator->size() > 1) {
// check distance ratio
if ((*matchIterator)[0].distance/
(*matchIterator)[1].distance > ratio) {
matchIterator->clear(); // remove match
} else { // does not have 2 neighbours
matchIterator->clear(); // remove match
return removed;

A large number of ambiguous matches will be eliminated by this procedure as it can be seen from the following example. Here, with a low SURF threshold (=10), we initially detected 1,600 feature points (black circles) out of which only 55 survive the ratio test (white circles):

(Move the mouse over the image to enlarge it.)

The white lines linking the matched points show that even if we have a large number of good matches, a significant number of false matches have survived. Therefore, a second test will be performed in order to filter our more false matches. Note that the ratio test is also applied to the second match set.

        Read more about this book      

(For more resources related to the article, see here.)

We now have two relatively good match sets, one from the first image to second image and the other one from second image to the first one. From these sets, we will now extract the matches that are in agreement with both sets. This is the symmetrical matching scheme imposing that, for a match pair to be accepted, both points must be the best matching feature of the other:

// Insert symmetrical matches in symMatches vector
void symmetryTest(
const std::vector<std::vector<cv::DMatch>>& matches1,
const std::vector<std::vector<cv::DMatch>>& matches2,
std::vector<cv::DMatch>& symMatches) {
// for all matches image 1 -> image 2
for (std::vector<std::vector<cv::DMatch>>::
const_iterator matchIterator1= matches1.begin();
matchIterator1!= matches1.end(); ++matchIterator1) {
// ignore deleted matches
if (matchIterator1->size() < 2)
// for all matches image 2 -> image 1
for (std::vector<std::vector<cv::DMatch>>::
const_iterator matchIterator2= matches2.begin();
matchIterator2!= matches2.end();
++matchIterator2) {
// ignore deleted matches
if (matchIterator2->size() < 2)
// Match symmetry test
if ((*matchIterator1)[0].queryIdx ==
(*matchIterator2)[0].trainIdx &&
(*matchIterator2)[0].queryIdx ==
(*matchIterator1)[0].trainIdx) {
// add symmetrical match
break; // next match in image 1 -> image 2

In our test pair, 31 matches survived this symmetry test. The last test now consists of an additional filtering test that will this time use the fundamental matrix in order to reject matches that do not obey the epipolar constraint. This test is based on the RANSAC method that can compute the fundamental matrix even when outliers are still present in the match set (this method will be explained in the following section):

// Identify good matches using RANSAC
// Return fundemental matrix
cv::Mat ransacTest(
const std::vector<cv::DMatch>& matches,
const std::vector<cv::KeyPoint>& keypoints1,
const std::vector<cv::KeyPoint>& keypoints2,
std::vector<cv::DMatch>& outMatches) {
// Convert keypoints into Point2f
std::vector<cv::Point2f> points1, points2;
for (std::vector<cv::DMatch>::
const_iterator it= matches.begin();
it!= matches.end(); ++it) {
// Get the position of left keypoints
float x= keypoints1[it->queryIdx].pt.x;
float y= keypoints1[it->queryIdx].pt.y;
// Get the position of right keypoints
x= keypoints2[it->trainIdx].pt.x;
y= keypoints2[it->trainIdx].pt.y;
// Compute F matrix using RANSAC
std::vector<uchar> inliers(points1.size(),0);
cv::Mat fundemental= cv::findFundamentalMat(
cv::Mat(points1),cv::Mat(points2), // matching points
inliers, // match status (inlier or outlier)
distance, // distance to epipolar line
confidence); // confidence probability
// extract the surviving (inliers) matches
itIn= inliers.begin();
itM= matches.begin();
// for all matches
for ( ;itIn!= inliers.end(); ++itIn, ++itM) {
if (*itIn) { // it is a valid match
if (refineF) {
// The F matrix will be recomputed with
// all accepted matches
// Convert keypoints into Point2f
// for final F computation
for (std::vector<cv::DMatch>::
const_iterator it= outMatches.begin();
it!= outMatches.end(); ++it) {
// Get the position of left keypoints
float x= keypoints1[it->queryIdx].pt.x;
float y= keypoints1[it->queryIdx].pt.y;
// Get the position of right keypoints
x= keypoints2[it->trainIdx].pt.x;
y= keypoints2[it->trainIdx].pt.y;
// Compute 8-point F from all accepted matches
fundemental= cv::findFundamentalMat(
cv::Mat(points1),cv::Mat(points2), // matches
CV_FM_8POINT); // 8-point method
return fundemental;

This code is a bit long because the keypoints need to be converted into cv::Point2f before the F matrix computation.

The complete matching process using our RobustMatcher class is initiated by the following calls:

// Prepare the matcher
RobustMatcher rmatcher;
cv::Ptr<cv::FeatureDetector> pfd=
new cv::SurfFeatureDetector(10);
// Match the two images
std::vector<cv::DMatch> matches;
std::vector<cv::KeyPoint> keypoints1, keypoints2;
cv::Mat fundemental= rmatcher.match(image1,image2,
matches, keypoints1, keypoints2);

It results in 23 matches that are shown in the following screenshot with their corresponding epipolar lines:

OpenCV: Estimating Projective Relations in Images

OpenCV: Estimating Projective Relations in Images

How it works...

In the preceding recipe, we learned that it is possible to estimate the fundamental matrix associated with an image pair from a number of feature point matches. Obviously, to be exact, this match set must be made of only good matches. However, in a real context, it is not possible to guarantee that a match set obtained by comparing the descriptors of detected feature points will be perfectly exact. This is why a fundamental matrix estimation method based on the RANSAC (RANdom SAmpling Consensus) strategy has been introduced.

The RANSAC algorithm aims at estimating a given mathematical entity from a data set that may contain a number of outliers. The idea is to randomly select some data points from the set and perform the estimation only with these. The number of selected points should be the minimum number of points required to estimate the mathematical entity. In the case of the fundamental matrix, eight matched pairs is this minimum number (in fact, it could be seven matches, but the 8-point linear algorithm is faster to compute). Once the fundamental matrix is estimated from these random 8 matches, all of the other matches in the match set are tested against the epipolar constraint that derives from this matrix. All of the matches that fulfill this constraint (that is, matches for which the corresponding feature is at a short distance from its epipolar line) are identified. These matches form the support set of the computed fundamental matrix.

The central idea behind the RANSAC algorithm is that the larger the support set is, the higher the probability that the computed matrix is the right one. Obviously, if one (or more) of the randomly selected matches is a wrong match, then the computed fundamental matrix will also be wrong, and its support set is expected to be small. This process is repeated a number of times, and at the end, the matrix with the largest support will be retained as the most probable one.

Therefore, our objective is to pick eight random matches several times so that eventually we select eight good ones which should give us a large support set. Depending on the number of wrong matches in the entire data set, the probability of selecting a set of eight correct matches will differ. We however know that the more selections we make, the higher the confidence will be that we have, among those selections, at least one good match set. More precisely, if we assume that the match set is made of n% inliers (good matches), then the probability that we select eight good matches is 8n. Consequently, the probability that a selection contains at least one wrong match is (1-n8). If we make k selections, the probability of having one random set containing only good matches is 1-(1-8n)k. This is the confidence probability c, and we want this probability to be as high as possible since we need at least one good set of matches in order to obtain the correct fundamental matrix. Therefore, when running the RANSAC algorithm, one needs to determine the number of selection k that needs to be made in order to obtain a given confidence level.

When using the cv::findFundamentalMat function with Ransacs, two extra parameters are provided. The first one is the confidence level that determines the number of iterations to be made. The second one is the maximum distance to the epipolar line for a point to be considered as an inlier. All matched pairs in which a point is at a distance from its epipolar line larger than the one specified will be reported as an outlier. Therefore, the function also returns a std::vector of char value indicating that the corresponding match has been identified as an outlier (0) or as an inlier (1).

The more good matches you have in your initial match set, the higher the probability that RANSAC will give you the correct fundamental matrix. This is why we applied several filters to the match set before calling the cv::findFundamentalMat function. Obviously, you can decide to skip one or the other of the steps that were proposed in this recipe. It is just a question of balancing the computational complexity, the final number of matches, and the required level of confidence that the obtained match set will contain only exact matches.

Computing a homography between two images

The second recipe of this article showed you how to compute the fundamental matrix of an image pair from a set of matches. Another mathematical entity exists that can be computed from match pairs: a homography. Like the fundamental matrix, the homography is a 3x3 matrix with special properties and, as we will see in this recipe, it applies to two-view images in specific situations.

Getting ready

Let's consider again the projective relation between a 3D point and its image on a camera that we introduced in the first recipe of this article. Basically, we learned that this relation is expressed by a 3x4 matrix. Now, if we consider the special case where two views of a scene are separated by a pure rotation, then it can be observed that the fourth column of the extrinsic matrix will be made of all 0s (that is, translation is null). As a result, the projective relation in this special case becomes a 3x3 matrix. This matrix is called a homography and it implies that, under special circumstances (here, a pure rotation), the image of a point in one view is related to the image of the same point in another by a linear relation:

OpenCV: Estimating Projective Relations in Images

In homogeneous coordinates, this relation holds up to a scale factor represented here by the scalar value s. Once this matrix is estimated, all points in one view can be transferred to the second view using this relation. Note that as a side effect of the homography relation for pure rotation, the fundamental matrix becomes undefined in this case.

How to do it...

Suppose we have two images separated by a pure rotation. These two images can be matched using our RobustMatcher class, except that we skip the RANSAC validation step (identified as step 5 in our match method) since this one involves fundamental matrix estimation. Instead, we will apply a RANSAC step which will involve the estimation of a homography based on a match set (that obviously contains a good number of outliers). This is done by using the cv::findHomography function that is very similar to the cv::findFundementalMat function:

// Find the homography between image 1 and image 2
std::vector<uchar> inliers(points1.size(),0);
cv::Mat homography= cv::findHomography(
cv::Mat(points1), // corresponding
cv::Mat(points2), // points
inliers, // outputted inliers matches
1.); // max distance to reprojection point

Recall that a homography will only exist if the two images are separated by a pure rotation, which is the case of the following two images:

OpenCV: Estimating Projective Relations in Images

OpenCV: Estimating Projective Relations in Images

The resulting inliers that comply with the found homography have been drawn on those images by the following loop:

// Draw the inlier points
std::vector<cv::Point2f>::const_iterator itPts=
std::vector<uchar>::const_iterator itIn= inliers.begin();
while (itPts!=points1.end()) {
// draw a circle at each inlier location
if (*itIn)
itPts= points2.begin();
itIn= inliers.begin();
while (itPts!=points2.end()) {
// draw a circle at each inlier location
if (*itIn)

As explained in the preceding section, once the homography is computed, you can transfer image points from one image to the other. In fact, you can do this for all pixels of an image and the result will be to transform this image to the other view. There is an OpenCV function that does exactly this:

// Warp image 1 to image 2
cv::Mat result;
cv::warpPerspective(image1, // input image
result, // output image
homography, // homography
image1.rows)); // size of output image

Once this new image is obtained, it can be appended to the other image in order to expand the view (since the two images are now from the same point of view):

// Copy image 1 on the first half of full image
cv::Mat half(result,cv::Rect(0,0,image2.cols,image2.rows));
image2.copyTo(half); // copy image2 to image1 roi

The result is the following image:

(Move the mouse over the image to enlarge it.)

How it works...

When two views are related by a homography, it becomes possible to determine where a given scene point on one image is found on the other image. This property becomes particularly interesting for points that fall outside the image boundaries. Indeed, since the second view shows a portion of the scene that is not visible in the first image, you can use the homography in order to expand the image by reading in the other image the color value of additional pixels. That is how we were able to create a new image that is an expansion of our second image in which extra columns were added to the right.

The homography computed by cv::findHomography is the one that maps points in the first image to points in the second image. What we need in order to transfer the points of image 1 to image 2 is in fact the inverse homography. This is exactly what the function cv::warpPerspective is doing by default, that is, it uses the inverse of the homography provided as input to get the color value of each point of the output image. When an output pixel is transferred to a point outside the input image, a black value (0) is simply assigned to this pixel. Note that an optional flag cv::WARP_INVERSE_MAP can be specified as an optional fifth argument in cv::warpPerspective if one wants to use the direct homography instead of the inverted one during the pixel transfer process.

There's more...

A homography also exists between two views of a plane. This can be demonstrated by looking again at the camera projection equation as we did for the pure rotation case. When a plane is observed, we can, without loss of generality, set the reference frame of the plane such that all of its points have a Z coordinate which equals to 0. This will also cancel one of the columns of the 3x4 projection matrix resulting in a 3x3 matrix: a homography. This means that if, for example, you have several pictures from different points of view of the flat façade of a building, you can compute the homography between these images and build a large mosaic of the façade by wrapping the images and assembling them together as we did in this recipe.

A minimum of four matched points between two views are required to compute a homography. The function cv::getPerspectiveTransform allows such a transformation from four corresponding points to be computed.


This article analyzed the different relations involved in image formation. It also explored the projective relations that exist between two images of a same scene.

Further resources related to this subject:

Books to Consider

comments powered by Disqus

An Introduction to 3D Printing

Explore the future of manufacturing and design  - read our guide to 3d printing for free