Segmenting images in OpenCV


OpenCV 2 Computer Vision Application Programming Cookbook

Over 50 recipes to master this library of programming functions for real-time computer vision

OpenCV (Open Source Computer Vision) is an open source library containing more than 500 optimized algorithms for image and video analysis. Since its introduction in 1999, it has been widely adopted as the primary development tool by the community of researchers and developers in computer vision. OpenCV was originally developed at Intel by a team led by Gary Bradski as an initiative to advance research in vision and promote the development of rich, vision-based, CPU-intensive applications.

In the previous article by Robert Laganière, author of OpenCV 2 Computer Vision Application Programming Cookbook, we took a look at image processing using morphological filters.

In this article we will see how to segment images using the watershed and GrabCut algorithms.

Segmenting images using watersheds

The watershed transformation is a popular image processing algorithm that is used to quickly segment an image into homogeneous regions. It relies on the idea that when the image is seen as a topographic relief, homogeneous regions correspond to relatively flat basins delimited by steep edges. Because of its simplicity, the original version of this algorithm tends to over-segment the image, producing many small regions. This is why OpenCV provides a variant of this algorithm that uses a set of predefined markers to guide the definition of the image segments.

How to do it...

The watershed segmentation is obtained with the cv::watershed function. In addition to the image to segment, this function takes a 32-bit signed integer marker image in which each non-zero pixel represents a label. The idea is to mark some pixels of the image that are known with certainty to belong to a given region. From this initial labeling, the watershed algorithm determines the regions to which the other pixels belong. In this recipe, we first create the marker image as a gray-level image and then convert it into an image of integers. We conveniently encapsulate this step in a WatershedSegmenter class:

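class WatershedSegmenter {
    cv::Mat markers; // 32-bit label image
  public:
    void setMarkers(const cv::Mat& markerImage) {
        // Convert to image of ints
        markerImage.convertTo(markers,CV_32S);
    }
    cv::Mat process(const cv::Mat &image) {
        // Apply watershed
        cv::watershed(image,markers);
        return markers;
    }
    // getSegmentation() and getWatersheds() are described in How it works...
};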

The way these markers are obtained depends on the application. For example, some preprocessing steps might have identified some pixels belonging to an object of interest. The watershed would then be used to delimit the complete object from that initial detection. In this recipe, we will simply use the binary image from the previous article (OpenCV: Image Processing using Morphological Filters) in order to identify the animals in the corresponding original image.

Therefore, from our binary image, we need to identify pixels that certainly belong to the foreground (the animals) and pixels that certainly belong to the background (mainly the grass). Here, we will mark foreground pixels with label 255 and background pixels with label 128 (this choice is arbitrary; any non-zero label other than 255 would work). The other pixels, that is, the ones for which the labeling is unknown, are assigned value 0. As it is now, the binary image includes too many white pixels belonging to various parts of the image. We will therefore severely erode this image in order to retain only pixels belonging to the important objects:

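// Eliminate noise and smaller objects
cv::Mat fg;
// Erode repeatedly (the iteration count, 6 here, is an illustrative choice)
cv::erode(binary,fg,cv::Mat(),cv::Point(-1,-1),6);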

The result is the following image:

[Image: eroded binary image containing the foreground markers]

Note that a few pixels belonging to the background forest are still present. We simply keep them, so they will be treated as belonging to an object of interest. Similarly, we select a few background pixels by applying a large dilation to the original binary image:

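// Identify image pixels without objects
cv::Mat bg;
cv::dilate(binary,bg,cv::Mat(),cv::Point(-1,-1),6); // iteration count is illustrative
cv::threshold(bg,bg,1,128,cv::THRESH_BINARY_INV); // black pixels become 128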

The resulting black pixels correspond to background pixels. This is why the thresholding operation immediately after the dilation assigns to these pixels the value 128. The following image is then obtained:

[Image: background markers at value 128 after dilation and thresholding]
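The thresholding step described above can be illustrated without OpenCV. The following is only a sketch: thresholdBinaryInv is a hypothetical helper, not a library function, and it assumes the usual inverted binary threshold rule, where pixels above the threshold become 0 and all others receive the chosen maximum value.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper mimicking an inverted binary threshold:
// pixels above `thresh` become 0, all others become `maxValue`.
std::vector<std::uint8_t> thresholdBinaryInv(const std::vector<std::uint8_t>& src,
                                             std::uint8_t thresh,
                                             std::uint8_t maxValue) {
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = (src[i] > thresh) ? 0 : maxValue;
    }
    return dst;
}
```

Applied to a dilated binary image with a threshold of 1 and a maximum value of 128, the black pixels (value 0) end up at 128 and the white pixels (value 255) end up at 0.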

These images are combined to form the marker image:

// Create markers image
cv::Mat markers(binary.size(),CV_8U,cv::Scalar(0));
markers= fg+bg;

Note how we used the overloaded operator+ here in order to combine the images. This is the image that will be used as input to the watershed algorithm:

[Image: marker image combining foreground (255), background (128), and unknown (0) pixels]
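Adding two 8-bit images in OpenCV is a per-pixel saturating addition, which is why the foreground and background markers can simply be summed. The sketch below reproduces that arithmetic on plain vectors; addSaturated is a hypothetical helper introduced here for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: per-pixel saturating addition of two 8-bit images,
// the same arithmetic as adding two 8-bit single-channel matrices.
std::vector<std::uint8_t> addSaturated(const std::vector<std::uint8_t>& a,
                                       const std::vector<std::uint8_t>& b) {
    assert(a.size() == b.size());
    std::vector<std::uint8_t> sum(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        const int s = static_cast<int>(a[i]) + static_cast<int>(b[i]);
        sum[i] = static_cast<std::uint8_t>(std::min(s, 255)); // clamp at 255
    }
    return sum;
}
```

Because the foreground markers (255) come from an erosion and the background markers (128) from the complement of a dilation, the two sets should not overlap, so each pixel of the sum is 255, 128, or 0. If they did overlap, saturation would leave the pixel at 255.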

The segmentation is then obtained as follows:

// Create watershed segmentation object
WatershedSegmenter segmenter;
// Set markers and process
segmenter.setMarkers(markers);
segmenter.process(image);

The marker image is then updated such that each zero pixel is assigned one of the input labels, while the pixels belonging to the found boundaries have value -1. The resulting image of labels is then:

[Image: label image produced by the watershed segmentation]

The boundary image is:

[Image: watershed boundary image]

How it works...

As in the preceding recipe, we use the topographic map analogy to describe the watershed algorithm. To create a watershed segmentation, the idea is to progressively flood the image starting at level 0. As the level of "water" rises (to levels 1, 2, 3, and so on), catchment basins form. These basins gradually grow and, consequently, the water of two different basins eventually meets. When this happens, a watershed is created to keep the two basins separated. Once the water has reached its maximum level, the set of basins and watersheds created forms the watershed segmentation.

As one can expect, the flooding process initially creates many small individual basins. When all of these are merged, many watershed lines are created which results in an over-segmented image. To overcome this problem, a modification to this algorithm has been proposed in which the flooding process starts from a predefined set of marked pixels. The basins created from these markers are labeled in accordance with the values assigned to the initial marks. When two basins having the same label merge, no watersheds are created, thus preventing the oversegmentation.
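To make the marker-based flooding concrete, here is a toy sketch on a one-dimensional height profile. It is not OpenCV's implementation: floodFromMarkers is a hypothetical helper, and the priority-queue flooding shown is a simplified variant chosen for illustration. Pixels are flooded from the lowest level upwards; a pixel reached from two different labels becomes a watershed (-1), while basins carrying the same label merge without creating one.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Toy marker-controlled flooding on a 1D profile (hypothetical helper).
// labels: >0 = marker, 0 = unknown. On return, every reachable unknown
// pixel holds a marker label, or -1 if it separates two different labels.
std::vector<int> floodFromMarkers(const std::vector<int>& height,
                                  std::vector<int> labels) {
    assert(height.size() == labels.size());
    const int n = static_cast<int>(height.size());
    // Min-heap on (height, insertion order, index): lowest pixels flood first.
    using Item = std::tuple<int, int, int>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pending;
    std::vector<bool> queued(n, false);
    int order = 0;

    auto pushUnknownNeighbours = [&](int i) {
        const int neighbours[2] = {i - 1, i + 1};
        for (int j : neighbours) {
            if (j >= 0 && j < n && labels[j] == 0 && !queued[j]) {
                queued[j] = true;
                pending.emplace(height[j], order++, j);
            }
        }
    };

    // Flooding starts from the pixels adjacent to the markers.
    for (int i = 0; i < n; ++i) {
        if (labels[i] > 0) pushUnknownNeighbours(i);
    }

    while (!pending.empty()) {
        const int i = std::get<2>(pending.top());
        pending.pop();
        int found = 0;          // label seen among flooded neighbours
        bool conflict = false;  // true if two different labels meet here
        const int neighbours[2] = {i - 1, i + 1};
        for (int j : neighbours) {
            if (j < 0 || j >= n || labels[j] <= 0) continue;
            if (found == 0) found = labels[j];
            else if (labels[j] != found) conflict = true;
        }
        labels[i] = conflict ? -1 : found;
        if (!conflict) pushUnknownNeighbours(i);
    }
    return labels;
}
```

With heights {0,1,2,5,2,1,0} and markers 1 and 2 at the two ends, the peak in the middle becomes the watershed. With the same label at both ends, the two basins merge and no watershed is produced.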

This is what happens when the cv::watershed function is called. The input marker image is updated to produce the final watershed segmentation. Users can input a marker image with any number of labels, with pixels of unknown labeling left at value 0. The marker image is an image of 32-bit signed integers so that more than 255 labels can be defined. This type also allows the special value -1 to be assigned to pixels associated with a watershed. This is what is returned by the cv::watershed function. To facilitate the display of the result, we have introduced two special methods. The first one returns an image of the labels (with watersheds at value 0). This is easily done by converting the labels to 8 bits, which clips (thresholds) out-of-range values:

// Return result in the form of an image
cv::Mat getSegmentation() {
    cv::Mat tmp;
    // all segments with label higher than 255
    // will be assigned value 255
    markers.convertTo(tmp,CV_8U);
    return tmp;
}

Similarly, the second method returns an image in which the watershed lines are assigned value 0, and the rest of the image is at 255. This time, the scale and offset arguments of the cv::Mat::convertTo method are used to achieve this result:

// Return watershed in the form of an image
cv::Mat getWatersheds() {
    cv::Mat tmp;
    // Each pixel p is transformed into
    // 255p+255 before conversion
    markers.convertTo(tmp,CV_8U,255,255);
    return tmp;
}

The linear transformation that is applied before the conversion allows -1 pixels to be converted into 0 (since -1*255+255=0).

Pixels with a value greater than 255 are assigned the value 255. This is due to the saturation operation that is applied when signed integers are converted into unsigned chars.
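The arithmetic behind both display methods can be checked with a small sketch. toSaturatedByte is a hypothetical helper that applies the linear transformation p*alpha+beta and then clamps the result to the 8-bit range, which is assumed here to be a faithful stand-in for the saturating conversion from 32-bit labels to unsigned chars (integer scale factors are used for simplicity).

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: compute p*alpha + beta, then clamp to [0, 255],
// as a saturating conversion to unsigned char would.
std::uint8_t toSaturatedByte(std::int32_t p, int alpha = 1, int beta = 0) {
    const long long v = static_cast<long long>(p) * alpha + beta;
    return static_cast<std::uint8_t>(std::max(0LL, std::min(255LL, v)));
}
```

With the default arguments, a watershed pixel (-1) maps to 0, label 128 stays at 128, and any label above 255 maps to 255. With alpha and beta both set to 255, a watershed pixel gives -1*255+255=0, while every label of 1 or more exceeds 255 and saturates to 255.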

See also

The article The viscous watershed transform by C. Vachier, F. Meyer, Journal of Mathematical Imaging and Vision, volume 22, issue 2-3, May 2005, for more information on the watershed transform.

The next recipe, which presents another image segmentation algorithm that can also segment an image into background and foreground objects.

Extracting foreground objects with the GrabCut algorithm

OpenCV provides an implementation of another popular algorithm for image segmentation: the GrabCut algorithm. This algorithm is not based on mathematical morphology, but we present it here because its use is similar to that of the watershed segmentation algorithm presented in the preceding recipe. GrabCut is computationally more expensive than watershed, but it generally produces a more accurate result. It is a good choice when one wants to extract a foreground object from a still image (for example, to cut and paste an object from one picture to another).

How to do it...

The cv::grabCut function is easy to use. You just need to input an image and label some of its pixels as belonging to the background or to the foreground. Based on this partial labeling, the algorithm will then determine a foreground/background segmentation for the complete image.

One way of specifying a partial foreground/background labeling for an input image is by defining a rectangle inside which the foreground object is included:

// Open image
cv::Mat image= cv::imread("../group.jpg");
// define bounding rectangle
// the pixels outside this rectangle
// will be labeled as background
cv::Rect rectangle(10,100,380,180);

All pixels outside of this rectangle will then be marked as background. In addition to the input image and its segmentation image, calling the cv::grabCut function requires the definition of two matrices which will contain the models built by the algorithm:

cv::Mat result; // segmentation (4 possible values)
cv::Mat bgModel,fgModel; // the models (internally used)
// GrabCut segmentation
cv::grabCut(image, // input image
result, // segmentation result
rectangle, // rectangle containing foreground
bgModel,fgModel, // models
5, // number of iterations
cv::GC_INIT_WITH_RECT); // use rectangle

Note how we specified that we are using the bounding rectangle mode using the cv::GC_INIT_WITH_RECT flag as the last argument of the function (the next section will discuss the other available mode). Each pixel of the input/output segmentation image can have one of four values:

  • cv::GC_BGD, for pixels certainly belonging to the background (for example, pixels outside the rectangle in our example)
  • cv::GC_FGD, for pixels certainly belonging to the foreground (none in our example)
  • cv::GC_PR_BGD, for pixels probably belonging to the background
  • cv::GC_PR_FGD for pixels probably belonging to the foreground (that is the initial value for the pixels inside the rectangle in our example).
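As a rough illustration of the rectangle mode, the following sketch builds the initial label image by hand. Everything in it (Label, Rect, initMaskFromRect) is hypothetical and only stands in for the labeling described in the list above; it is not the library's code. The numeric values assume the constants are defined as 0 (background), 1 (foreground), 2 (probable background), and 3 (probable foreground).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed numeric values of the four segmentation labels.
enum Label : std::uint8_t { BGD = 0, FGD = 1, PR_BGD = 2, PR_FGD = 3 };

// Hypothetical rectangle type: top-left corner plus size.
struct Rect { int x, y, width, height; };

// Hypothetical helper: build a row-major label mask for rectangle mode.
// Pixels outside the rectangle are certain background; pixels inside
// start as probable foreground.
std::vector<std::uint8_t> initMaskFromRect(int rows, int cols, const Rect& r) {
    assert(rows >= 0 && cols >= 0);
    std::vector<std::uint8_t> mask(static_cast<std::size_t>(rows) * cols, BGD);
    const int yEnd = std::min(rows, r.y + r.height);
    const int xEnd = std::min(cols, r.x + r.width);
    for (int y = std::max(0, r.y); y < yEnd; ++y) {
        for (int x = std::max(0, r.x); x < xEnd; ++x) {
            mask[static_cast<std::size_t>(y) * cols + x] = PR_FGD;
        }
    }
    return mask;
}
```

No pixel starts as certain foreground or probable background in this mode; those values only appear once the algorithm has run or when the user supplies them explicitly.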

We get a binary image of the segmentation by extracting the pixels having a value equal to cv::GC_PR_FGD:

// Get the pixels marked as likely foreground
cv::compare(result,cv::GC_PR_FGD,result,cv::CMP_EQ);
// Generate output image
cv::Mat foreground(image.size(),CV_8UC3,
cv::Scalar(255,255,255));
image.copyTo(foreground,result); // bg pixels are not copied

To extract all foreground pixels, that is, those with values equal to cv::GC_PR_FGD or cv::GC_FGD, it is possible to simply check the value of the first (least significant) bit:

// checking first bit with bitwise-and
result= result&1; // will be 1 if FG

This is possible because these constants are defined as values 1 and 3, while the other two are defined as 0 and 2. In our example, the same result is obtained because the segmentation image does not contain cv::GC_FGD pixels (only cv::GC_BGD pixels were provided as input).
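The bit test can be verified on the four label values directly. In this sketch, the constants kBgd, kFgd, kPrBgd, and kPrFgd and the function isForeground are hypothetical names that mirror the values stated above (0, 1, 2, and 3); they are not OpenCV identifiers.

```cpp
#include <cstdint>

// Label values as stated in the text: background labels are even,
// foreground labels are odd.
constexpr std::uint8_t kBgd = 0, kFgd = 1, kPrBgd = 2, kPrFgd = 3;

// True when the label is certain or probable foreground,
// that is, when its least significant bit is set.
constexpr bool isForeground(std::uint8_t label) {
    return (label & 1) != 0;
}

static_assert(isForeground(kFgd) && isForeground(kPrFgd),
              "odd labels are foreground");
static_assert(!isForeground(kBgd) && !isForeground(kPrBgd),
              "even labels are background");
```

Applying the same mask to every pixel of the segmentation image yields 1 for foreground pixels and 0 for background pixels, which is exactly the kind of mask a masked copy expects.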

Finally, we obtain an image of the foreground objects (over a white background) by the following copy operation with mask:

// Generate output image
cv::Mat foreground(image.size(),CV_8UC3,
cv::Scalar(255,255,255)); // all white image
image.copyTo(foreground,result); // bg pixels not copied

The resulting image is then:

[Image: extracted foreground objects over a white background]

How it works...

In the preceding example, the GrabCut algorithm was able to extract the foreground objects from nothing more than a rectangle inside which these objects (the four animals) were contained. Alternatively, one could also assign values cv::GC_BGD and cv::GC_FGD to some specific pixels of the segmentation image provided as the second argument of the cv::grabCut function. You would then specify cv::GC_INIT_WITH_MASK as the input mode flag. These input labels could be obtained, for example, by asking a user to interactively mark a few elements of the image. It is also possible to combine these two input modes.

Using this input information, GrabCut creates the background/foreground segmentation by proceeding as follows. Initially, a foreground label (cv::GC_PR_FGD) is tentatively assigned to all unmarked pixels. Based on the current classification, the algorithm groups the pixels into clusters of similar colors (that is, K clusters for the background and K clusters for the foreground). The next step is to determine a background/foreground segmentation by introducing boundaries between foreground and background pixels. This is done through an optimization process that tries to connect pixels with similar labels, and that imposes a penalty for placing a boundary in regions of relatively uniform intensity. This optimization problem is efficiently solved using the Graph Cuts algorithm, a method that can find the optimal solution of a problem by representing it as a connected graph on which cuts are applied in order to compose an optimal configuration. The obtained segmentation produces new labels for the pixels. The clustering process can then be repeated and a new optimal segmentation found, and so on. GrabCut is therefore an iterative procedure that gradually improves the segmentation result. Depending on the complexity of the scene, a good solution can be found in more or fewer iterations (in easy cases, one iteration can be enough).

This explains the second-to-last argument of the function, with which the user specifies the number of iterations to apply. The two internal models maintained by the algorithm are passed as arguments to the function (and returned), so that the function can be called again with the models of the last run if one wishes to improve the segmentation result by performing additional iterations.

See also

The article by C. Rother, V. Kolmogorov and A. Blake, GrabCut: Interactive Foreground Extraction using Iterated Graph Cuts in ACM Transactions on Graphics (SIGGRAPH) volume 23, issue 3, August 2004, that describes in detail the GrabCut algorithm.


In this article we saw how to segment images using the watershed and GrabCut algorithms.
