Designing a Simple, Robust Object Detector and Classifier

In this article, Joseph Howse, author of the book iOS Application Development with OpenCV 3, illustrates a scale-invariant, rotation-invariant approach to object detection and classification, using OpenCV 3 and just 250 lines of custom C++ code. The technique relies on blob detection, histogram analysis, and SURF (or ORB if SURF is unavailable). The classifier is sensitive to colors as well as keypoints, and it can work with a small number of training images.

For background information, sample images, and a complete tutorial on how to integrate this detector and classifier into an iOS application, refer to Chapter 5, Classifying Coins and Commodities, in the book iOS Application Development with OpenCV 3 (Packt Publishing, 2016). You can also use this article's C++ code on platforms other than iOS.


Defining blobs and a blob detector

For our purposes, a blob simply has an image and a label. The image is a cv::Mat and the label is an unsigned integer. The label's default value is 0, which shall signify that the blob has not yet been classified. Create a new header file, Blob.h, and fill it with the following declaration of a Blob class:

#ifndef BLOB_H
#define BLOB_H

#include <cstdint>

#include <opencv2/core.hpp>

class Blob
{
public:
  /**
   * Construct a blob from an image and a label.
   */
  Blob(const cv::Mat &mat, uint32_t label = 0ul);

  /**
   * Construct an empty blob.
   */
  Blob();

  /**
   * Construct a blob by copying another blob.
   */
  Blob(const Blob &other);

  bool isEmpty() const;

  uint32_t getLabel() const;
  void setLabel(uint32_t value);

  const cv::Mat &getMat() const;
  int getWidth() const;
  int getHeight() const;

private:
  uint32_t label;

  cv::Mat mat;
};

#endif // BLOB_H

A Blob's image does not change after construction, but the label may change as a result of our classification process. Note that most of Blob's methods have the const modifier, but of course, setLabel does not because it changes the label.

Now, let's declare a BlobDetector class in another new header file, BlobDetector.h. This class provides a public detect method that analyzes a given image and populates a std::vector<Blob> with the objects it detects in the image. Another public method, getMask, returns a thresholded version of the most recent image that the detect method received. Internally, BlobDetector uses several more matrices and vectors to hold intermediate results, including the mask, the detected edges, the detected contours, and the hierarchy that describes the contours' relationships to each other. Here is the detector's declaration:

#ifndef BLOB_DETECTOR_H
#define BLOB_DETECTOR_H

#include <vector>

#include <opencv2/core.hpp>

#include "Blob.h"

class BlobDetector
{
public:
  void detect(cv::Mat &image, std::vector<Blob> &blobs,
    double resizeFactor = 1.0, bool draw = false);

  const cv::Mat &getMask() const;

private:
  void createMask(const cv::Mat &image);

  cv::Mat resizedImage;
  cv::Mat mask;
  cv::Mat edges;
  std::vector<std::vector<cv::Point>> contours;
  std::vector<cv::Vec4i> hierarchy;
};

#endif // !BLOB_DETECTOR_H

Later, in the Detecting blobs against a plain background section, we will define the methods' bodies in new files called Blob.cpp and BlobDetector.cpp.

Defining blob descriptors and a blob classifier

If you are familiar with keypoint matching, you know that a keypoint has a descriptor or set of descriptive statistics. Similarly, we can define a custom descriptor for a blob. As our classifier relies on histogram comparison and keypoint matching, let's say that a blob's descriptor consists of a normalized histogram and a matrix of keypoint descriptors. The descriptor object is also a convenient place to put the label. Create a new header file, BlobDescriptor.h, and put the following declaration of a BlobDescriptor class in it:




#ifndef BLOB_DESCRIPTOR_H
#define BLOB_DESCRIPTOR_H

#include <cstdint>

#include <opencv2/core.hpp>

class BlobDescriptor
{
public:
  BlobDescriptor(const cv::Mat &normalizedHistogram,
    const cv::Mat &keypointDescriptors, uint32_t label);

  const cv::Mat &getNormalizedHistogram() const;
  const cv::Mat &getKeypointDescriptors() const;
  uint32_t getLabel() const;

private:
  cv::Mat normalizedHistogram;
  cv::Mat keypointDescriptors;
  uint32_t label;
};

#endif // !BLOB_DESCRIPTOR_H
Note that BlobDescriptor is designed as an immutable class. All its methods, except the constructor, have the const modifier.

Now, let's declare a BlobClassifier class in another new header file, BlobClassifier.h. Publicly, this class receives Blob objects via an update method (for reference blobs) and a classify method (for blobs that the detector found in the scene). Privately, BlobClassifier creates, owns, and compares BlobDescriptor objects that pertain to the Blob objects. Thus, BlobClassifier is the only part of our program that needs to deal with BlobDescriptor. BlobClassifier also owns instances of OpenCV classes that are responsible for keypoint detection, description, and matching. Here is our classifier's declaration:




#ifndef BLOB_CLASSIFIER_H
#define BLOB_CLASSIFIER_H

#include <vector>

#include "Blob.h"
#include "BlobDescriptor.h"

#include <opencv2/features2d.hpp>

class BlobClassifier
{
public:
  BlobClassifier();

  /**
   * Add a reference blob to the classification model.
   */
  void update(const Blob &referenceBlob);

  /**
   * Clear the classification model.
   */
  void clear();

  /**
   * Classify a blob that was detected in a scene.
   */
  void classify(Blob &detectedBlob) const;

private:
  BlobDescriptor createBlobDescriptor(const Blob &blob) const;
  float findDistance(const BlobDescriptor &detectedBlobDescriptor,
    const BlobDescriptor &referenceBlobDescriptor) const;

  /**
   * A feature detector and descriptor extractor.
   * It finds features in images.
   * Then, it creates descriptors of the features.
   */
  cv::Ptr<cv::Feature2D> featureDetectorAndDescriptorExtractor;

  /**
   * A descriptor matcher.
   * It matches features based on their descriptors.
   */
  cv::Ptr<cv::DescriptorMatcher> descriptorMatcher;

  /**
   * Descriptors of the reference blobs.
   */
  std::vector<BlobDescriptor> referenceBlobDescriptors;
};

#endif // !BLOB_CLASSIFIER_H
Later, in the Classifying blobs by color and keypoints section, we will write the methods' bodies in new files called BlobDescriptor.cpp and BlobClassifier.cpp.

Detecting blobs against a plain background

Let's assume that the background has a distinctive color range, such as "cream to snow white". Our blob detector will calculate the image's dominant color range and search for large regions whose colors differ from this range. These anomalous regions will constitute the detected blobs.

For small objects such as a bean or coin, a user can easily find a plain background such as a blank sheet of paper, plain table-top, plain piece of clothing, or even the palm of a hand. As our blob detector dynamically estimates the background color range, it can cope with various backgrounds and lighting conditions; it is not limited to a lab environment.
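To make the idea of a "dominant color range" concrete, here is a minimal sketch in plain C++ (no OpenCV) of the thresholding step, assuming a single 8-bit channel for brevity. The helper name backgroundMask is hypothetical; cv::meanStdDev and cv::inRange apply the same logic to every channel of a color image at once. Values near the mean become white (background) and outliers become black (foreground):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: a single-channel analogue of cv::meanStdDev followed
// by cv::inRange. Values within numStdDevs standard deviations of the mean
// are marked as background (255); all other values are foreground (0).
std::vector<std::uint8_t> backgroundMask(const std::vector<std::uint8_t> &pixels,
                                         double numStdDevs) {
  assert(numStdDevs >= 0.0);
  std::vector<std::uint8_t> mask(pixels.size(), 0);
  if (pixels.empty()) {
    return mask;
  }

  // Mean value; on a plain background, the background dominates it.
  double sum = 0.0;
  for (std::uint8_t p : pixels) {
    sum += p;
  }
  const double mean = sum / static_cast<double>(pixels.size());

  // Population standard deviation (divides by N, like cv::meanStdDev).
  double squaredSum = 0.0;
  for (std::uint8_t p : pixels) {
    squaredSum += (p - mean) * (p - mean);
  }
  const double stdDev =
      std::sqrt(squaredSum / static_cast<double>(pixels.size()));

  // Inclusive range check around the mean, in the spirit of cv::inRange.
  const double lower = mean - numStdDevs * stdDev;
  const double upper = mean + numStdDevs * stdDev;
  for (std::size_t i = 0; i < pixels.size(); ++i) {
    mask[i] = (pixels[i] >= lower && pixels[i] <= upper) ? 255 : 0;
  }
  return mask;
}
```

With eight pixels of value 200 and two of value 20, the mean is 164 and the standard deviation is 72, so one standard deviation gives the range [92, 236]: the 200s are classified as background and the 20s as foreground.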

Create a new file, BlobDetector.cpp, for the implementation of our BlobDetector class. (To review the header, refer back to the Defining blobs and a blob detector section.) At the top of BlobDetector.cpp, we will define several constants that pertain to the breadth of the background color range, the size and smoothing of the blobs, and the color of the blobs' rectangles in the preview image. Here is the relevant code:

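#include <opencv2/imgproc.hpp>

#include "BlobDetector.h"

const double MASK_STD_DEVS_FROM_MEAN = 1.0;
const double MASK_EROSION_KERNEL_RELATIVE_SIZE_IN_IMAGE = 0.005;
const int MASK_NUM_EROSION_ITERATIONS = 8;

const double BLOB_RELATIVE_MIN_SIZE_IN_IMAGE = 0.05;

const cv::Scalar DRAW_RECT_COLOR(0, 255, 0); // Green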

Of course, the heart of BlobDetector is its detect method. Optionally, the method creates a downsized version of the image for faster processing. Then, we call a helper method, createMask, to perform thresholding and erosion on the (resized) image. We pass the resulting mask to the cv::Canny function to perform Canny edge detection. We pass the edge mask to the cv::findContours function, which populates a vector of contours, in the vector<vector<cv::Point>> format. That is to say, each contour is a vector of points. For each contour, we find the points' bounding rectangle. If we are working with a resized image, we restore the bounding rectangle to the original scale. We reject rectangles that are very small. Finally, for each accepted rectangle, we put a new Blob object in the output vector and optionally draw the rectangle in the original image. Here is the detect method's implementation:

void BlobDetector::detect(cv::Mat &image,
  std::vector<Blob> &blobs, double resizeFactor, bool draw)
{
  blobs.clear();

  if (resizeFactor == 1.0) {
    createMask(image);
  } else {
    cv::resize(image, resizedImage, cv::Size(), resizeFactor,
      resizeFactor, cv::INTER_AREA);
    createMask(resizedImage);
  }

  // Find the edges in the mask.
  cv::Canny(mask, edges, 191, 255);

  // Find the contours of the edges.
  cv::findContours(edges, contours, hierarchy, cv::RETR_TREE,
    cv::CHAIN_APPROX_SIMPLE);

  std::vector<cv::Rect> rects;
  int blobMinSize = (int)(MIN(image.rows, image.cols) *
    BLOB_RELATIVE_MIN_SIZE_IN_IMAGE);
  for (const std::vector<cv::Point> &contour : contours) {

    // Find the contour's bounding rectangle.
    cv::Rect rect = cv::boundingRect(contour);

    // Restore the bounding rectangle to the original scale.
    rect.x = (int)(rect.x / resizeFactor);
    rect.y = (int)(rect.y / resizeFactor);
    rect.width = (int)(rect.width / resizeFactor);
    rect.height = (int)(rect.height / resizeFactor);

    // Keep the rectangle inside the image, in case rounding during
    // the resize pushed it slightly past the border.
    rect &= cv::Rect(0, 0, image.cols, image.rows);

    if (rect.width < blobMinSize || rect.height < blobMinSize) {
      continue;
    }

    // Create the blob from the sub-image inside the bounding
    // rectangle.
    blobs.push_back(Blob(cv::Mat(image, rect)));

    // Remember the bounding rectangle in order to draw it later.
    rects.push_back(rect);
  }

  if (draw) {
    // Draw the bounding rectangles.
    for (const cv::Rect &rect : rects) {
      cv::rectangle(image, rect.tl(), rect.br(), DRAW_RECT_COLOR);
    }
  }
}
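The bounding-rectangle bookkeeping in detect can be sketched without OpenCV. This is a minimal illustration under stated assumptions: Pt and Box are hypothetical stand-ins for cv::Point and cv::Rect, and boundingBox and restoreAndFilter are hypothetical helpers. boundingBox follows the inclusive convention that cv::boundingRect uses for point sets (width and height count both end points); restoreAndFilter divides by the resize factor, clamps the box to the image as a precaution, and applies the minimum-size test:

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical stand-ins for cv::Point and cv::Rect.
struct Pt { int x, y; };
struct Box { int x, y, width, height; };

// Smallest upright box containing every point. Width and height count
// pixels inclusively (max - min + 1), as cv::boundingRect does for points.
Box boundingBox(const std::vector<Pt> &contour) {
  if (contour.empty()) {
    return Box{0, 0, 0, 0};
  }
  int minX = INT_MAX, minY = INT_MAX, maxX = INT_MIN, maxY = INT_MIN;
  for (const Pt &p : contour) {
    minX = std::min(minX, p.x);
    minY = std::min(minY, p.y);
    maxX = std::max(maxX, p.x);
    maxY = std::max(maxY, p.y);
  }
  return Box{minX, minY, maxX - minX + 1, maxY - minY + 1};
}

// Map a box from the resized image back to the original scale (truncating
// toward zero, as rect.x /= resizeFactor does), clamp it to the image
// bounds, and report whether it passes the minimum-size test.
bool restoreAndFilter(Box &box, double resizeFactor, int imageCols,
                      int imageRows, int minSize) {
  assert(resizeFactor > 0.0);
  box.x = static_cast<int>(box.x / resizeFactor);
  box.y = static_cast<int>(box.y / resizeFactor);
  box.width = static_cast<int>(box.width / resizeFactor);
  box.height = static_cast<int>(box.height / resizeFactor);

  // Clamp to [0, imageCols) x [0, imageRows).
  const int right = std::min(box.x + box.width, imageCols);
  const int bottom = std::min(box.y + box.height, imageRows);
  box.x = std::max(box.x, 0);
  box.y = std::max(box.y, 0);
  box.width = right - box.x;
  box.height = bottom - box.y;

  return box.width >= minSize && box.height >= minSize;
}
```

For example, a 10x20 box at (5, 10) in an image resized by 0.5 maps back to a 20x40 box at (10, 20), while a 3x3 box grows only to 6x6 and fails a minimum size of 10.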




The getMask method simply returns the mask that we previously computed in the detect method:

const cv::Mat &BlobDetector::getMask() const {
  return mask;
}


The createMask helper method begins by finding the image's mean color and standard deviation using the cv::meanStdDev function. We calculate a range of background colors based on a certain number of standard deviations from the mean, as defined by the MASK_STD_DEVS_FROM_MEAN constant near the top of BlobDetector.cpp. We deem values outside this range to be foreground colors. Using the cv::inRange function, we map the background colors (in the image) to white (in the mask) and the foreground colors (in the image) to black (in the mask). Then, we create a square kernel using the cv::getStructuringElement function. Finally, we use the kernel in the cv::erode function to apply the erosion morphological operation to the mask. This has the effect of smoothing the black (foreground) regions such that they swallow up little gaps that are probably just noise. Here is the relevant code:

void BlobDetector::createMask(const cv::Mat &image) {

  // Find the image's mean color.
  // Presumably, this is the background color.
  // Also find the standard deviation.
  cv::Scalar meanColor;
  cv::Scalar stdDevColor;
  cv::meanStdDev(image, meanColor, stdDevColor);

  // Create a mask based on a range around the mean color.
  cv::Scalar halfRange = MASK_STD_DEVS_FROM_MEAN * stdDevColor;
  cv::Scalar lowerBound = meanColor - halfRange;
  cv::Scalar upperBound = meanColor + halfRange;
  cv::inRange(image, lowerBound, upperBound, mask);

  // Erode the mask to merge neighboring blobs.
  int kernelWidth = (int)(MIN(image.cols, image.rows) *
    MASK_EROSION_KERNEL_RELATIVE_SIZE_IN_IMAGE);
  if (kernelWidth > 0) {
    cv::Size kernelSize(kernelWidth, kernelWidth);
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT,
      kernelSize);
    cv::erode(mask, mask, kernel, cv::Point(-1, -1),
      MASK_NUM_EROSION_ITERATIONS);
  }
}
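Erosion is a minimum filter: each output pixel takes the smallest value under the kernel. The following minimal sketch (plain C++, with a hypothetical erodeMask helper operating on a row-major 8-bit mask) assumes a square kernel anchored at its center, as cv::Point(-1, -1) requests, and ignores kernel cells that fall outside the image, which is how OpenCV's default border behaves for erosion. Because black (0) wins every minimum, black foreground regions grow and small white gaps between them close up:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: erode a row-major 8-bit mask with a square
// kernelWidth x kernelWidth kernel anchored at its center, repeating the
// pass `iterations` times. Each output pixel is the minimum of the input
// pixels under the kernel; kernel cells outside the image are ignored.
std::vector<std::uint8_t> erodeMask(std::vector<std::uint8_t> mask, int cols,
                                    int rows, int kernelWidth,
                                    int iterations) {
  assert(mask.size() == static_cast<std::size_t>(cols) * rows);
  if (kernelWidth <= 0) {
    return mask;
  }
  const int before = kernelWidth / 2;          // cells left of/above the anchor
  const int after = kernelWidth - 1 - before;  // cells right of/below the anchor
  for (int it = 0; it < iterations; ++it) {
    std::vector<std::uint8_t> out(mask.size());
    for (int y = 0; y < rows; ++y) {
      for (int x = 0; x < cols; ++x) {
        std::uint8_t value = 255;
        for (int dy = -before; dy <= after; ++dy) {
          for (int dx = -before; dx <= after; ++dx) {
            const int ny = y + dy;
            const int nx = x + dx;
            if (ny < 0 || ny >= rows || nx < 0 || nx >= cols) {
              continue;
            }
            value = std::min(value, mask[ny * cols + nx]);
          }
        }
        out[y * cols + x] = value;
      }
    }
    mask.swap(out);
  }
  return mask;
}
```

For example, eroding a 5x5 white mask that has one black pixel at its center turns the central 3x3 block black after one pass with a 3x3 kernel, and the entire mask black after a second pass.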




That is the end of the blob detector's code. As you can see, it uses a general-purpose and rather linear approach, without any special cases for different kinds of objects. Moreover, we are using a separate blob detector and blob classifier, and this separation of responsibilities enables us to keep each class's implementation relatively simple.

For completeness, note that the Blob class's constructors have straightforward implementations that copy the arguments. For the blob's image, we make a deep copy because the original may change. (For example, the original may be a subimage in a frame of video, and after detection we may draw rectangles atop the frame of video.) Similarly, Blob's getter and setter methods are self-explanatory. Create a new file, Blob.cpp, and fill it with the following implementation:

#include "Blob.h"

Blob::Blob(const cv::Mat &mat, uint32_t label)
: label(label)
{
  // Deep-copy the image because the source may change later.
  mat.copyTo(this->mat);
}

Blob::Blob()
: label(0ul)
{
}

Blob::Blob(const Blob &other)
: label(other.label)
{
  other.mat.copyTo(mat);
}

bool Blob::isEmpty() const {
  return mat.empty();
}

uint32_t Blob::getLabel() const {
  return label;
}

void Blob::setLabel(uint32_t value) {
  label = value;
}

const cv::Mat &Blob::getMat() const {
  return mat;
}

int Blob::getWidth() const {
  return mat.cols;
}

int Blob::getHeight() const {
  return mat.rows;
}
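The deep copy matters because cv::Mat's copy constructor and assignment operator share the underlying pixel buffer through reference counting, and a region of interest such as cv::Mat(image, rect) also points into its parent's pixels; copyTo and clone allocate separate storage. The following sketch mimics that behavior in plain C++ with a hypothetical SharedImage type built on std::shared_ptr, to show why a shallow copy would pick up the rectangles that detect later draws on the frame:

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical analogue of cv::Mat's reference-counted pixel storage.
// Copying a SharedImage copies only the handle, so both copies see the same
// pixels; deepCopy() allocates a separate buffer, like cv::Mat::clone.
struct SharedImage {
  std::shared_ptr<std::vector<std::uint8_t>> pixels;

  explicit SharedImage(std::size_t size, std::uint8_t value = 0)
      : pixels(std::make_shared<std::vector<std::uint8_t>>(size, value)) {}

  SharedImage deepCopy() const {
    SharedImage copy(*this);  // shares the buffer at first...
    copy.pixels = std::make_shared<std::vector<std::uint8_t>>(*pixels);
    return copy;              // ...but now owns an independent one
  }
};
```

Writing into the original after copying (standing in for drawing a rectangle on the video frame) changes what the shallow copy sees, but leaves the deep copy untouched.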


Classifying blobs by color and keypoints

Our classifier operates on the assumption that a blob contains distinctive colors, distinctive keypoints, or both. To conserve memory and precompute as much relevant information as possible, we do not store images of the reference blobs, but instead we store histograms and keypoint descriptors.

Create a new file, BlobClassifier.cpp, for the implementation of our BlobClassifier class. (To review the header, refer back to the Defining blob descriptors and a blob classifier section.) At the top of BlobClassifier.cpp, we will define several constants that pertain to the number of histogram bins, the histogram comparison method, and the relative importance of the histogram comparison versus the keypoint comparison. Here is the relevant code:

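#include <cfloat>

#include <opencv2/imgproc.hpp>

#include "BlobClassifier.h"

#ifdef WITH_OPENCV_CONTRIB
#include <opencv2/xfeatures2d.hpp>
#endif

const int HISTOGRAM_NUM_BINS_PER_CHANNEL = 32;
const int HISTOGRAM_COMPARISON_METHOD = cv::HISTCMP_CHISQR_ALT;

const float HISTOGRAM_DISTANCE_WEIGHT = 0.98f;
const float KEYPOINT_MATCHING_DISTANCE_WEIGHT =
  1.0f - HISTOGRAM_DISTANCE_WEIGHT;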

Beware that the HISTOGRAM_NUM_BINS_PER_CHANNEL constant has a cubic relationship to memory usage. For each blob descriptor, we store a three-dimensional (BGR) histogram with HISTOGRAM_NUM_BINS_PER_CHANNEL^3 elements, and each element is a 32-bit (4-byte) floating-point number. If the constant is 32, each histogram's size is (32^3)*4/(10^6), or about 0.13 megabytes. This is fine for a small set of reference descriptors. If the constant is 256 (the maximum number of bins for an 8-bit color channel), the histogram's size goes up to a whopping (256^3)*4/(10^6), or about 67.1 megabytes, per descriptor! For an iOS application, this is unacceptable, given the platform's memory constraints.

At best, on a high-end iOS device, one gigabyte of RAM might be available to each application. Conservatively, you should worry if your app's memory usage approaches 100 megabytes.
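The memory figures above are easy to check at compile time. This minimal sketch uses a hypothetical histogramBytes function and assumes one 4-byte float per bin, which matches the CV_32F histograms that cv::calcHist produces:

```cpp
#include <cstdint>

// Bytes for a dense 3-D histogram with binsPerChannel bins per channel,
// assuming one 4-byte float per bin (cv::calcHist outputs CV_32F).
constexpr std::uint64_t histogramBytes(std::uint64_t binsPerChannel) {
  return binsPerChannel * binsPerChannel * binsPerChannel * 4u;
}

static_assert(histogramBytes(32) == 131072u, "32 bins: about 0.13 MB");
static_assert(histogramBytes(256) == 67108864u, "256 bins: about 67.1 MB");
```

Doubling the bins per channel multiplies the memory by eight, so even 64 bins per channel costs about 1.05 megabytes per descriptor.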

Remember that OpenCV's SURF implementation is in the xfeatures2d module, which is part of opencv_contrib. If opencv_contrib is available, let's define the WITH_OPENCV_CONTRIB preprocessor flag. Then, our code includes the <opencv2/xfeatures2d.hpp> header, and we use SURF. Otherwise, we use ORB. This selection also affects the implementation of BlobClassifier's constructor. OpenCV provides factory methods for various feature detectors, descriptors, and matchers, so we simply have to use the right combination of factory methods: SURF with FLANN-based matching, or ORB with brute-force matching based on the Hamming distance. Here is the constructor's implementation:

BlobClassifier::BlobClassifier() {
#ifdef WITH_OPENCV_CONTRIB
  featureDetectorAndDescriptorExtractor =
    cv::xfeatures2d::SURF::create();
  descriptorMatcher = cv::DescriptorMatcher::create("FlannBased");
#else
  featureDetectorAndDescriptorExtractor = cv::ORB::create();
  descriptorMatcher = cv::DescriptorMatcher::create(
    "BruteForce-Hamming");
#endif
}
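For intuition about the ORB branch, here is a plain-C++ sketch of brute-force matching under the Hamming distance. It assumes ORB's default 32-byte (256-bit) binary descriptors; BinaryDescriptor, Match, hammingDistance, and bruteForceMatch are hypothetical names, with Match standing in for cv::DMatch. For each query descriptor, it scans every train descriptor and keeps the nearest one, which is the kind of result that DescriptorMatcher's match method reports:

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// ORB's default descriptor: 32 bytes, i.e. 256 binary tests.
using BinaryDescriptor = std::array<std::uint8_t, 32>;

// Hypothetical stand-in for cv::DMatch.
struct Match {
  int queryIdx;
  int trainIdx;
  int distance;
};

// Hamming distance: the number of bit positions where a and b differ.
int hammingDistance(const BinaryDescriptor &a, const BinaryDescriptor &b) {
  int distance = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    distance += static_cast<int>(std::bitset<8>(a[i] ^ b[i]).count());
  }
  return distance;
}

// Exhaustive (brute-force) matching: pair each query descriptor with the
// train descriptor at the smallest Hamming distance.
std::vector<Match> bruteForceMatch(const std::vector<BinaryDescriptor> &query,
                                   const std::vector<BinaryDescriptor> &train) {
  std::vector<Match> matches;
  if (train.empty()) {
    return matches;
  }
  for (std::size_t q = 0; q < query.size(); ++q) {
    Match best{static_cast<int>(q), -1, std::numeric_limits<int>::max()};
    for (std::size_t t = 0; t < train.size(); ++t) {
      const int d = hammingDistance(query[q], train[t]);
      if (d < best.distance) {
        best.trainIdx = static_cast<int>(t);
        best.distance = d;
      }
    }
    matches.push_back(best);
  }
  return matches;
}
```

The SURF branch works differently: SURF descriptors are floating-point vectors compared by Euclidean distance, and the FLANN-based matcher searches for nearest neighbors approximately rather than exhaustively.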




The update method's implementation calls a helper method, createBlobDescriptor, and adds the resulting BlobDescriptor to a vector of reference descriptors:

void BlobClassifier::update(const Blob &referenceBlob) {
  referenceBlobDescriptors.push_back(
    createBlobDescriptor(referenceBlob));
}




The clear method's implementation discards all the reference descriptors such that the BlobClassifier reverts to its initial, untrained state:

void BlobClassifier::clear() {
  referenceBlobDescriptors.clear();
}



The implementation of the classify method relies on another helper method, findDistance. For each reference descriptor, classify calls findDistance to obtain a measure of dissimilarity between the query blob's descriptor and the reference descriptor. We find the reference descriptor with the least distance (best similarity) and assign its label to the query blob as the classification result. If there are no reference descriptors, the blob receives the label 0, meaning "unknown". Here is classify's implementation:

void BlobClassifier::classify(Blob &detectedBlob) const {
  BlobDescriptor detectedBlobDescriptor =
    createBlobDescriptor(detectedBlob);
  float bestDistance = FLT_MAX;
  uint32_t bestLabel = 0;
  for (const BlobDescriptor &referenceBlobDescriptor :
      referenceBlobDescriptors) {
    float distance = findDistance(detectedBlobDescriptor,
      referenceBlobDescriptor);
    if (distance < bestDistance) {
      bestDistance = distance;
      bestLabel = referenceBlobDescriptor.getLabel();
    }
  }
  detectedBlob.setLabel(bestLabel);
}
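Stripped of OpenCV types, classify is a nearest-neighbor rule over labeled references. The following sketch is generic over the descriptor type and the distance function; nearestLabel is a hypothetical name, and labels are passed in a parallel vector for brevity rather than stored inside each descriptor:

```cpp
#include <cassert>
#include <cfloat>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: return the label of the reference closest to the
// query under `distance`, or 0 ("unknown") when there are no references.
// labels[i] is the label of references[i].
template <typename Descriptor, typename DistanceFn>
std::uint32_t nearestLabel(const Descriptor &query,
                           const std::vector<Descriptor> &references,
                           const std::vector<std::uint32_t> &labels,
                           DistanceFn distance) {
  assert(references.size() == labels.size());
  float bestDistance = FLT_MAX;
  std::uint32_t bestLabel = 0;
  for (std::size_t i = 0; i < references.size(); ++i) {
    const float d = distance(query, references[i]);
    if (d < bestDistance) {  // strict: ties keep the earlier reference
      bestDistance = d;
      bestLabel = labels[i];
    }
  }
  return bestLabel;
}
```

With one-dimensional "descriptors" 1, 5, and 9 labeled 10, 20, and 30 and absolute difference as the distance, a query of 4 receives label 20, and an empty reference set yields 0.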





The createBlobDescriptor helper method is responsible for calculating a normalized histogram of the blob's image, along with keypoint descriptors, and using them to build a new BlobDescriptor. To calculate the (non-normalized) histogram, we use the cv::calcHist function. Among its arguments, it requires three arrays to specify the channels we want to use, the number of bins per channel, and the range of each channel's values. To normalize the resulting histogram, we divide by the number of pixels in the blob's image. The following code, pertaining to the histogram, is the first half of the implementation of createBlobDescriptor:

BlobDescriptor BlobClassifier::createBlobDescriptor(
  const Blob &blob) const
{
  const cv::Mat &mat = blob.getMat();
  int numChannels = mat.channels();

  // Calculate the histogram of the blob's image.
  cv::Mat histogram;
  int channels[] = { 0, 1, 2 };
  int numBins[] = { HISTOGRAM_NUM_BINS_PER_CHANNEL,
    HISTOGRAM_NUM_BINS_PER_CHANNEL,
    HISTOGRAM_NUM_BINS_PER_CHANNEL };
  float range[] = { 0.0f, 256.0f };
  const float *ranges[] = { range, range, range };
  cv::calcHist(&mat, 1, channels, cv::Mat(), histogram, 3,
    numBins, ranges);

  // Normalize the histogram.
  histogram *= (1.0f / (mat.rows * mat.cols));
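To see what cv::calcHist and the normalization step compute, here is a plain-C++ sketch of a normalized, uniform 3-D BGR histogram. It assumes 8-bit pixels and equal-width bins over [0, 256), flattened into one vector with the first channel varying slowest; Bgr and normalizedHistogram are hypothetical names:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pixel type in OpenCV's B, G, R channel order.
using Bgr = std::array<std::uint8_t, 3>;

// Uniform 3-D histogram over [0, 256) with binsPerChannel bins per channel,
// flattened so that index = (b * n + g) * n + r, then normalized by the
// pixel count so that the bins sum to 1.
std::vector<float> normalizedHistogram(const std::vector<Bgr> &pixels,
                                       int binsPerChannel) {
  assert(binsPerChannel > 0 && binsPerChannel <= 256);
  const std::size_t n = static_cast<std::size_t>(binsPerChannel);
  std::vector<float> histogram(n * n * n, 0.0f);
  if (pixels.empty()) {
    return histogram;
  }
  for (const Bgr &p : pixels) {
    // Equal-width bins: value v lands in bin floor(v * n / 256).
    const std::size_t b = p[0] * n / 256;
    const std::size_t g = p[1] * n / 256;
    const std::size_t r = p[2] * n / 256;
    histogram[(b * n + g) * n + r] += 1.0f;
  }
  const float scale = 1.0f / static_cast<float>(pixels.size());
  for (float &bin : histogram) {
    bin *= scale;
  }
  return histogram;
}
```

Because each bin holds a fraction of the pixels rather than a count, blobs of different sizes yield comparable histograms.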

Now, we must convert the blob's image to grayscale and obtain keypoints and keypoint descriptors using the detect and compute methods of cv::Feature2D. With the normalized histogram and keypoint descriptors, we have everything that we need to construct and return a new BlobDescriptor. Here is the remainder of the implementation of createBlobDescriptor:

  // Convert the blob's image to grayscale.
  cv::Mat grayMat;
  switch (numChannels) {
    case 4:
      cv::cvtColor(mat, grayMat, cv::COLOR_BGRA2GRAY);
      break;
    default:
      cv::cvtColor(mat, grayMat, cv::COLOR_BGR2GRAY);
      break;
  }

  // Detect features in the grayscale image.
  std::vector<cv::KeyPoint> keypoints;
  featureDetectorAndDescriptorExtractor->detect(grayMat, keypoints);

  // Extract descriptors of the features.
  cv::Mat keypointDescriptors;
  featureDetectorAndDescriptorExtractor->compute(grayMat,
    keypoints, keypointDescriptors);

  return BlobDescriptor(histogram, keypointDescriptors,
    blob.getLabel());
}
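For reference, cv::COLOR_BGR2GRAY and cv::COLOR_BGRA2GRAY use the standard luma weights Y = 0.299 R + 0.587 G + 0.114 B, and the BGRA variant simply ignores alpha. A floating-point sketch of the per-pixel formula follows; bgrToGray is a hypothetical name, and OpenCV's fixed-point implementation may round some values slightly differently:

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: convert one B, G, R pixel to gray with the luma
// weights Y = 0.299 R + 0.587 G + 0.114 B, rounding to the nearest level.
std::uint8_t bgrToGray(std::uint8_t b, std::uint8_t g, std::uint8_t r) {
  const double y = 0.299 * r + 0.587 * g + 0.114 * b;
  return static_cast<std::uint8_t>(std::lround(y));
}
```

Note the argument order: OpenCV stores color images as B, G, R, so pure blue (255, 0, 0) maps to a dark gray of 29, while pure green (0, 255, 0) maps to 150.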



The findDistance helper method performs histogram comparison using the cv::compareHist function and keypoint matching using the match method of cv::DescriptorMatcher. Each of the resulting keypoint matches has a distance, and we sum these distances. Then, as an overall measure of distance between the two blob descriptors, we return a weighted average of the histogram distance and the total keypoint matching distance. Here is the relevant code:

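float BlobClassifier::findDistance(
  const BlobDescriptor &detectedBlobDescriptor,
  const BlobDescriptor &referenceBlobDescriptor) const
{
  // Calculate the histogram distance.
  float histogramDistance = (float)cv::compareHist(
    detectedBlobDescriptor.getNormalizedHistogram(),
    referenceBlobDescriptor.getNormalizedHistogram(),
    HISTOGRAM_COMPARISON_METHOD);

  // Calculate the keypoint matching distance.
  float keypointMatchingDistance = 0.0f;
  std::vector<cv::DMatch> keypointMatches;
  const cv::Mat &detectedKeypointDescriptors =
    detectedBlobDescriptor.getKeypointDescriptors();
  const cv::Mat &referenceKeypointDescriptors =
    referenceBlobDescriptor.getKeypointDescriptors();
  // Skip matching if either blob has no keypoints;
  // an empty descriptor set has nothing to match.
  if (!detectedKeypointDescriptors.empty() &&
      !referenceKeypointDescriptors.empty()) {
    descriptorMatcher->match(detectedKeypointDescriptors,
      referenceKeypointDescriptors, keypointMatches);
  }
  for (const cv::DMatch &keypointMatch : keypointMatches) {
    keypointMatchingDistance += keypointMatch.distance;
  }

  return histogramDistance * HISTOGRAM_DISTANCE_WEIGHT +
    keypointMatchingDistance * KEYPOINT_MATCHING_DISTANCE_WEIGHT;
}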
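The following sketch shows, in plain C++, the two pieces of arithmetic in findDistance, assuming a chi-square-style comparison such as cv::HISTCMP_CHISQR_ALT, whose documented formula is d(H1, H2) = 2 * sum over bins I of (H1(I) - H2(I))^2 / (H1(I) + H2(I)). Bins that are empty in both histograms are skipped. chiSquareAlt and combinedDistance are hypothetical names, and combinedDistance assumes that the two weights sum to 1:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Alternative chi-square distance (the formula documented for
// cv::HISTCMP_CHISQR_ALT): 2 * sum of (h1 - h2)^2 / (h1 + h2) over bins.
// Bins that are empty in both histograms are skipped to avoid 0 / 0.
double chiSquareAlt(const std::vector<float> &h1,
                    const std::vector<float> &h2) {
  assert(h1.size() == h2.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < h1.size(); ++i) {
    const double a = h1[i];
    const double b = h2[i];
    if (a + b > 0.0) {
      sum += (a - b) * (a - b) / (a + b);
    }
  }
  return 2.0 * sum;
}

// Weighted average of the two distances, with weights w and 1 - w.
float combinedDistance(float histogramDistance,
                       float keypointMatchingDistance,
                       float histogramWeight) {
  return histogramDistance * histogramWeight +
         keypointMatchingDistance * (1.0f - histogramWeight);
}
```

For two normalized histograms, this chi-square variant can never exceed 4, whereas the keypoint term is an unnormalized sum over all matches and grows with the number of keypoints; this may help explain why the histogram term carries the much larger weight.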


That is the end of the blob classifier's code. Again, we see that a single class can provide useful, general-purpose computer vision functionality without a terribly complicated implementation. Perhaps this is a Zen moment; our previous work and studies have been a path to (some kind of) simplicity! Of course, OpenCV hides a lot of complexity for us in its implementations of histogram-related functions and keypoint-related classes, and in this way, the library offers us a relatively gentle path.

For completeness, note that the BlobDescriptor class has a straightforward implementation. Create a new file, BlobDescriptor.cpp, and fill it with the following bodies for a constructor and getters:

#include "BlobDescriptor.h"

BlobDescriptor::BlobDescriptor(const cv::Mat &normalizedHistogram,
  const cv::Mat &keypointDescriptors, uint32_t label)
: normalizedHistogram(normalizedHistogram)
, keypointDescriptors(keypointDescriptors)
, label(label)
{
}

const cv::Mat &BlobDescriptor::getNormalizedHistogram() const {
  return normalizedHistogram;
}

const cv::Mat &BlobDescriptor::getKeypointDescriptors() const {
  return keypointDescriptors;
}

uint32_t BlobDescriptor::getLabel() const {
  return label;
}


Now, we have finished all the code for the detector, descriptor, and classifier! Again, for more information, refer to Chapter 5, Classifying Coins and Commodities, in the book iOS Application Development with OpenCV 3.


You've been reading an excerpt of iOS Application Development with OpenCV 3.
