Tracking Faces with Haar Cascades

Learn to capture videos, manipulate images, and track objects with Python using the OpenCV Library

Conceptualizing Haar cascades

When we talk about classifying objects and tracking their location, what exactly are we hoping to pinpoint? What constitutes a recognizable part of an object?

Photographic images, even from a webcam, may contain a lot of detail for our (human) viewing pleasure. However, image detail tends to be unstable with respect to variations in lighting, viewing angle, viewing distance, camera shake, and digital noise. Moreover, even real differences in physical detail might not interest us for the purpose of classification. I was taught in school that no two snowflakes look alike under a microscope. Fortunately, as a Canadian child, I had already learned how to recognize snowflakes without a microscope, as the similarities are more obvious in bulk.

Thus, some means of abstracting image detail is useful in producing stable classification and tracking results. The abstractions are called features, which are said to be extracted from the image data. There should be far fewer features than pixels, though any pixel might influence multiple features. The level of similarity between two images can be evaluated based on distances between the images' corresponding features. For example, distance might be defined in terms of spatial coordinates or color coordinates.

Haar-like features are one type of feature that is often applied to real-time face tracking. They were first used for this purpose by Paul Viola and Michael Jones in 2001. Each Haar-like feature describes the pattern of contrast among adjacent image regions. For example, edges, vertices, and thin lines each generate distinctive features.

For any given image, the features may vary depending on the regions' size, which may be called the window size. Two images that differ only in scale should be capable of yielding similar features, albeit for different window sizes. Thus, it is useful to generate features for multiple window sizes. Such a collection of features is called a cascade. We may say a Haar cascade is scale-invariant or, in other words, robust to changes in scale. OpenCV provides a classifier and tracker for scale-invariant Haar cascades, which it expects to be in a certain file format.

Haar cascades, as implemented in OpenCV, are not robust to changes in rotation. For example, an upside-down face is not considered similar to an upright face, and a face viewed in profile is not considered similar to a face viewed from the front. A more complex and more resource-intensive implementation could improve Haar cascades' robustness to rotation by considering multiple transformations of images as well as multiple window sizes. However, we will confine ourselves to the implementation in OpenCV.
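To make the idea of a Haar-like feature more concrete, here is a minimal sketch of a single two-rectangle feature that responds to a vertical edge. It is not OpenCV's implementation. It assumes a summed-area table (integral image) as the way to add up pixels quickly, and the names integralImage, rectSum, and verticalEdgeFeature are hypothetical helpers invented for this illustration.

```python
import numpy as np

def integralImage(gray):
    """Return a summed-area table with a leading row and column of zeros."""
    table = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    table[1:, 1:] = gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return table

def rectSum(table, rect):
    """Sum the pixels inside rect, given as (x, y, w, h)."""
    x, y, w, h = rect
    return (table[y+h, x+w] - table[y, x+w]
            - table[y+h, x] + table[y, x])

def verticalEdgeFeature(table, rect):
    """Hypothetical feature: sum of the right half minus the left half."""
    x, y, w, h = rect
    half = w // 2
    left = rectSum(table, (x, y, half, h))
    right = rectSum(table, (x + half, y, half, h))
    return right - left

# A 4x4 test image: dark on the left, bright on the right.
image = np.array([[0, 0, 255, 255]] * 4, dtype=np.uint8)
table = integralImage(image)

# A window that straddles the edge gives a strong response.
edgeResponse = verticalEdgeFeature(table, (0, 0, 4, 4))

# A window that lies entirely in the bright area gives no response.
flatResponse = verticalEdgeFeature(table, (2, 0, 2, 4))
```

The same feature can be evaluated for a larger or smaller window simply by passing a different rectangle, which is the sense in which features are generated for multiple window sizes.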

Getting Haar cascade data

As part of your OpenCV setup, you probably have a directory called haarcascades. It contains cascades that are trained for certain subjects using tools that come with OpenCV. The directory's full path depends on your system and method of setting up OpenCV, as follows:

  • Build from source archive: <unzip_destination>/data/haarcascades

  • Windows with self-extracting ZIP: <unzip_destination>/data/haarcascades

  • Mac with MacPorts: /opt/local/share/OpenCV/haarcascades

  • Mac with Homebrew: The haarcascades directory is not included; to get it, download the source archive

  • Ubuntu with apt or Software Center: The haarcascades directory is not included; to get it, download the source archive

If you cannot find haarcascades, then download the source archive from (or the Windows self-extracting ZIP from 2.4.3/OpenCV-2.4.3.exe/download), unzip it, and look for <unzip_destination>/data/haarcascades.

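Once you find haarcascades, create a directory called cascades in the same folder as Cameo's application script and copy the following files from haarcascades into cascades:

  • haarcascade_frontalface_alt.xml

  • haarcascade_eye.xml

  • haarcascade_mcs_nose.xml

  • haarcascade_mcs_mouth.xml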

As their names suggest, these cascades are for tracking faces, eyes, noses, and mouths. They require a frontal, upright view of the subject. With a lot of patience and a powerful computer, you can make your own cascades, trained for various types of objects.

Creating modules

We should continue to maintain good separation between application-specific code and reusable code. Let's make new modules for tracking classes and their helpers.

A file called trackers.py should be created in the same directory as Cameo's application script (and, equivalently, in the parent directory of cascades). Let's put the following import statements at the start of trackers.py:

import cv2
import rects
import utils

Alongside trackers.py and Cameo's application script, let's make another file called rects.py containing the following import statement:

import cv2

Our face tracker and a definition of a face will go in trackers.py, while various helpers will go in rects.py and our preexisting utils.py file.

Defining a face as a hierarchy of rectangles

Before we start implementing a high-level tracker, we should define the type of tracking result that we want to get. For many applications, it is important to estimate how objects are posed in real, 3D space. However, our application is about image manipulation, so we care more about 2D image space. An upright, frontal view of a face should occupy a roughly rectangular region in the image. Within such a region, eyes, a nose, and a mouth should occupy roughly rectangular subregions. Let's open trackers.py and add a class containing the relevant data:

class Face(object):
    """Data on facial features: face, eyes, nose, mouth."""

    def __init__(self):
        self.faceRect = None
        self.leftEyeRect = None
        self.rightEyeRect = None
        self.noseRect = None
        self.mouthRect = None

Whenever our code contains a rectangle as a property or a function argument, we will assume it is in the format (x, y, w, h) where the unit is pixels, the upper-left corner is at (x, y), and the lower-right corner at (x+w, y+h). OpenCV sometimes uses a compatible representation but not always. So we must be careful when sending/receiving rectangles to/from OpenCV. For example, sometimes OpenCV requires the upper-left and lower-right corners as coordinate pairs.
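As a small illustration of converting between the two representations, the following sketch maps an (x, y, w, h) rectangle to a pair of corner points and back. The helper names rectToCorners and cornersToRect are hypothetical and are not part of Cameo or OpenCV.

```python
def rectToCorners(rect):
    """Convert (x, y, w, h) to (upper-left, lower-right) corner pairs."""
    x, y, w, h = rect
    return (x, y), (x + w, y + h)

def cornersToRect(topLeft, bottomRight):
    """Convert (upper-left, lower-right) corner pairs to (x, y, w, h)."""
    x0, y0 = topLeft
    x1, y1 = bottomRight
    return (x0, y0, x1 - x0, y1 - y0)

rect = (10, 20, 30, 40)
topLeft, bottomRight = rectToCorners(rect)
roundTrip = cornersToRect(topLeft, bottomRight)
```

Keeping the conversion in one place makes it harder to mix up the formats by accident when passing rectangles to drawing functions.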

Tracing, cutting, and pasting rectangles

When I was in primary school, I was poor at crafts. I often had to take my unfinished craft projects home, where my mother volunteered to finish them for me so that I could spend more time on the computer instead. I shall never cut and paste a sheet of paper, nor an array of bytes, without thinking of those days.

Just as in crafts, mistakes in our graphics program are easier to see if we first draw outlines. For debugging purposes, Cameo will include an option to draw lines around any rectangles represented by a Face. OpenCV provides a rectangle() function for drawing. However, its arguments represent a rectangle differently than Face does. For convenience, let's add the following wrapper of rectangle() to rects.py:

def outlineRect(image, rect, color):
    if rect is None:
        # Nothing was tracked, so there is nothing to draw.
        return
    x, y, w, h = rect
    cv2.rectangle(image, (x, y), (x+w, y+h), color)

Here, color should normally be either a BGR triplet (of values ranging from 0 to 255) or a grayscale value (ranging from 0 to 255), depending on the image's format.

Next, Cameo must support copying one rectangle's contents into another rectangle. We can read or write a rectangle within an image by using Python's slice notation. Remembering that an image's first index is the y coordinate or row, we can specify a rectangle as image[y:y+h, x:x+w]. For copying, a complication arises if the source and destination rectangles are of different sizes. Certainly, we expect two faces to appear at different sizes, so we must address this case. OpenCV provides a resize() function that allows us to specify a destination size and an interpolation method. Combining slicing and resizing, we can add the following implementation of a copy function to rects.py:

def copyRect(src, dst, srcRect, dstRect,
             interpolation = cv2.INTER_LINEAR):
    """Copy part of the source to part of the destination."""

    x0, y0, w0, h0 = srcRect
    x1, y1, w1, h1 = dstRect

    # Resize the contents of the source sub-rectangle.
    # Put the result in the destination sub-rectangle.
    dst[y1:y1+h1, x1:x1+w1] = \
        cv2.resize(src[y0:y0+h0, x0:x0+w0], (w1, h1),
                   interpolation = interpolation)

OpenCV supports the following options for interpolation:

  • cv2.INTER_NEAREST: This is nearest-neighbor interpolation, which is cheap but produces blocky results

  • cv2.INTER_LINEAR: This is bilinear interpolation (the default), which offers a good compromise between cost and quality in real-time applications

  • cv2.INTER_AREA: This is pixel area relation, which may offer a better compromise between cost and quality when downscaling but produces blocky results when upscaling

  • cv2.INTER_CUBIC: This is bicubic interpolation over a 4 x 4 pixel neighborhood, a high-cost, high-quality approach

  • cv2.INTER_LANCZOS4: This is Lanczos interpolation over an 8 x 8 pixel neighborhood, the highest-cost, highest-quality approach
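To see why nearest-neighbor results look blocky, consider the following NumPy-only sketch. It is a stand-in written for illustration, not a reimplementation of cv2.resize(), and its exact sampling positions may differ from OpenCV's. The function name resizeNearest is hypothetical.

```python
import numpy as np

def resizeNearest(src, size):
    """Resize an image to size=(w, h) by repeating or skipping pixels."""
    dstW, dstH = size
    srcH, srcW = src.shape[:2]
    # For each destination row and column, pick one source row and column.
    rows = (np.arange(dstH) * srcH) // dstH
    cols = (np.arange(dstW) * srcW) // dstW
    return src[rows[:, None], cols]

small = np.array([[10, 20],
                  [30, 40]], dtype=np.uint8)

# Upscaling 2x2 to 4x4 turns every source pixel into a 2x2 block.
big = resizeNearest(small, (4, 4))
```

Because each output pixel is a copy of exactly one input pixel, no new intermediate values are created. The other interpolation methods blend neighboring pixels instead, at a higher cost.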

Copying becomes more complicated if we want to support swapping of two or more rectangles' contents. Consider the following approach, which is wrong:

copyRect(image, image, rect0, rect1) # overwrite rect1
copyRect(image, image, rect1, rect0) # copy from rect1
# Oops! rect1 was already overwritten by the time we copied from it!

Instead, we need to copy one of the rectangles to a temporary array before overwriting anything. Let's edit rects.py to add the following function, which swaps the contents of two or more rectangles in a single source image:

def swapRects(src, dst, rects,
              interpolation = cv2.INTER_LINEAR):
    """Copy the source with two or more sub-rectangles swapped."""

    if dst is not src:
        dst[:] = src

    numRects = len(rects)
    if numRects < 2:
        return

    # Copy the contents of the last rectangle into temporary storage.
    x, y, w, h = rects[numRects - 1]
    temp = src[y:y+h, x:x+w].copy()

    # Copy the contents of each rectangle into the next.
    i = numRects - 2
    while i >= 0:
        copyRect(src, dst, rects[i], rects[i+1], interpolation)
        i -= 1

    # Copy the temporarily stored content into the first rectangle.
    copyRect(temp, dst, (0, 0, w, h), rects[0], interpolation)

The swap is circular, such that it can support any number of rectangles. Each rectangle's content is destined for the next rectangle, except that the last rectangle's content is destined for the first rectangle.
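The circular behavior is easier to follow on a tiny example. The sketch below traces the same order of operations on a one-row array, with three rectangles of one pixel each. It assumes that all rectangles are the same size so that no resizing is needed, and the function name rotateRectContents is hypothetical.

```python
import numpy as np

def rotateRectContents(image, rects):
    """Circularly shift the contents of equal-sized rectangles, in place."""
    if len(rects) < 2:
        return
    # Save the last rectangle's contents before anything is overwritten.
    x, y, w, h = rects[-1]
    temp = image[y:y+h, x:x+w].copy()
    # Working backward, copy each rectangle's contents into the next one.
    for i in range(len(rects) - 2, -1, -1):
        sx, sy, sw, sh = rects[i]
        dx, dy, dw, dh = rects[i + 1]
        image[dy:dy+dh, dx:dx+dw] = image[sy:sy+sh, sx:sx+sw]
    # Put the saved contents into the first rectangle.
    x, y, w, h = rects[0]
    image[y:y+h, x:x+w] = temp

image = np.array([[1, 2, 3]], dtype=np.uint8)
rects = [(0, 0, 1, 1), (1, 0, 1, 1), (2, 0, 1, 1)]
rotateRectContents(image, rects)
```

After the call, the value that started in the last rectangle sits in the first, and every other value has moved one rectangle along.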

This approach should serve us well enough for Cameo, but it is still not entirely foolproof. Intuition might tell us that the following code should leave image unchanged:

swapRects(image, image, [rect0, rect1])
swapRects(image, image, [rect1, rect0])

However, if rect0 and rect1 overlap, our intuition may be incorrect. If you see strange-looking results, then investigate the possibility that you are swapping overlapping rectangles.
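One way to investigate is to test the rectangles for overlap before swapping them. The following is a minimal sketch of such a check. The function name rectsOverlap is hypothetical, and the sketch treats rectangles that merely touch along an edge as not overlapping.

```python
def rectsOverlap(rect0, rect1):
    """Return True if two (x, y, w, h) rectangles share any pixels."""
    x0, y0, w0, h0 = rect0
    x1, y1, w1, h1 = rect1
    return (x0 < x1 + w1 and x1 < x0 + w0 and
            y0 < y1 + h1 and y1 < y0 + h0)

overlapping = rectsOverlap((0, 0, 10, 10), (5, 5, 10, 10))
touching = rectsOverlap((0, 0, 10, 10), (10, 0, 10, 10))
farApart = rectsOverlap((0, 0, 10, 10), (50, 50, 10, 10))
```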

Adding more utility functions

First, it may be useful to know whether an image is in grayscale or color. We can tell based on the dimensionality of the image. Color images are 3D arrays, while grayscale images have fewer dimensions. Let's add the following function to utils.py to test whether an image is in grayscale:

def isGray(image):
    """Return True if the image has one channel per pixel."""
    return image.ndim < 3

Second, it may be useful to know an image's dimensions and to divide these dimensions by a given factor. An image's (or other array's) height and width, respectively, are the first two entries in its shape property. Let's add the following function to utils.py to get an image's dimensions, divided by a value:

def widthHeightDividedBy(image, divisor):
    """Return an image's dimensions, divided by a value."""
    h, w = image.shape[:2]
    # Floor division keeps the dimensions integral.
    return (w//divisor, h//divisor)
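As a quick check of both ideas, the following sketch applies the same tests to two synthetic NumPy arrays. The array sizes are arbitrary values chosen for illustration, and the sketch uses floor division on the assumption that integral pixel dimensions are wanted.

```python
import numpy as np

# A 640x480 color image has three dimensions: rows, columns, channels.
colorImage = np.zeros((480, 640, 3), dtype=np.uint8)
# A 640x480 grayscale image has only rows and columns.
grayImage = np.zeros((480, 640), dtype=np.uint8)

colorIsGray = colorImage.ndim < 3
grayIsGray = grayImage.ndim < 3

# shape lists height before width, so unpack in that order.
h, w = colorImage.shape[:2]
oneEighth = (w // 8, h // 8)
```

Note that the result is ordered as (width, height), which is the order OpenCV expects for sizes, even though shape is ordered as (height, width).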

Now, let's get back on track with this article's main subject, tracking.

Tracking faces

The challenge in using OpenCV's Haar cascade classifiers is not just getting a tracking result; it is getting a series of sensible tracking results at a high frame rate. One kind of common sense that we can enforce is that certain tracked objects should have a hierarchical relationship, one being located relative to the other. For example, a nose should be in the middle of a face. By attempting to track both a whole face and parts of a face, we can enable application code to do more detailed manipulations and to check how good a given tracking result is. A face with a nose is a better result than one without. At the same time, we can support some optimizations, such as only looking for faces of a certain size and noses in certain places.

We are going to implement an optimized, hierarchical tracker in a class called FaceTracker, which offers a simple interface. A FaceTracker may be initialized with certain optional configuration arguments that are relevant to the tradeoff between tracking accuracy and performance. At any given time, the latest tracking results of FaceTracker are stored in a property called faces, which is a list of Face instances. Initially, this list is empty. It is refreshed via an update() method that accepts an image for the tracker to analyze. Finally, for debugging purposes, the rectangles of faces may be drawn via a drawDebugRects() method, which accepts an image as a drawing surface. Every frame, a real-time face-tracking application would call update(), read faces, and perhaps call drawDebugRects().

Internally, FaceTracker uses an OpenCV class called CascadeClassifier. A CascadeClassifier is initialized with a cascade data file, such as the ones that we found and copied earlier. For our purposes, the important method of CascadeClassifier is detectMultiScale(), which performs tracking that may be robust to variations in scale. The possible arguments to detectMultiScale() are:

  • image: This is an image to be analyzed. It must have 8 bits per channel.

  • scaleFactor: This scaling factor separates the window sizes in two successive passes. A higher value improves performance but diminishes robustness with respect to variations in scale.

  • minNeighbors: This value is one less than the minimum number of regions that are required in a match. (A match may merge multiple neighboring regions.)

  • flags: There are several flags but not all combinations are valid. The valid standalone flags and valid combinations include:

    • cv2.cv.CV_HAAR_SCALE_IMAGE: Scales each windowed image region to match the feature data. (The default approach is the opposite: scale the feature data to match the window.) Scaling the image allows for certain optimizations on modern hardware. This flag must not be combined with others.

    • cv2.cv.CV_HAAR_DO_CANNY_PRUNING: Eagerly rejects regions that contain too many or too few edges to match the object type. This flag should not be combined with cv2.cv.CV_HAAR_FIND_BIGGEST_OBJECT.

    • cv2.cv.CV_HAAR_FIND_BIGGEST_OBJECT: Accepts, at most, one match (the biggest).

    • cv2.cv.CV_HAAR_FIND_BIGGEST_OBJECT | cv2.cv.CV_HAAR_DO_ROUGH_SEARCH: Accepts, at most, one match (the biggest) and skips some steps that would refine (shrink) the region of this match. The minNeighbors argument should be greater than 0.

  • minSize: A pair of pixel dimensions representing the minimum object size being sought. A higher value improves performance.

  • maxSize: A pair of pixel dimensions representing the maximum object size being sought. A lower value improves performance.

The return value of detectMultiScale() is a list of matches, each expressed as a rectangle in the format [x, y, w, h].
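The effect of scaleFactor on performance can be sketched by listing the window sizes that a multi-scale search might visit between minSize and maxSize. This is only an illustration of the tradeoff, under the assumption that each pass multiplies the window size by scaleFactor. It is not OpenCV's internal loop, and the function name windowSizes is hypothetical.

```python
def windowSizes(minSize, maxSize, scaleFactor):
    """List the window sizes from minSize up to maxSize, inclusive."""
    if scaleFactor <= 1.0:
        raise ValueError('scaleFactor must be greater than 1')
    sizes = []
    scale = 1.0
    while True:
        w = int(round(minSize[0] * scale))
        h = int(round(minSize[1] * scale))
        if w > maxSize[0] or h > maxSize[1]:
            break
        sizes.append((w, h))
        # Each successive pass uses a window that is scaleFactor times larger.
        scale *= scaleFactor
    return sizes

# A large factor means few passes: fast, but sizes in between are skipped.
coarse = windowSizes((24, 24), (100, 100), 2.0)

# A small factor means many passes: slower, but more tolerant of scale.
fine = windowSizes((24, 24), (100, 100), 1.2)
```

Raising minSize or lowering maxSize shortens either list, which is why those arguments also improve performance.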

Similarly, the initializer of FaceTracker accepts scaleFactor, minNeighbors, and flags as arguments. The given values are passed to all detectMultiScale() calls that a FaceTracker makes internally. Also during initialization, a FaceTracker creates CascadeClassifiers using face, eye, nose, and mouth data. Let's add the following implementation of the initializer and the faces property to trackers.py:

class FaceTracker(object):
    """A tracker for facial features: face, eyes, nose, mouth."""

    def __init__(self, scaleFactor = 1.2, minNeighbors = 2,
                 flags = cv2.cv.CV_HAAR_SCALE_IMAGE):

        self.scaleFactor = scaleFactor
        self.minNeighbors = minNeighbors
        self.flags = flags

        self._faces = []

        self._faceClassifier = cv2.CascadeClassifier(
            'cascades/haarcascade_frontalface_alt.xml')
        self._eyeClassifier = cv2.CascadeClassifier(
            'cascades/haarcascade_eye.xml')
        self._noseClassifier = cv2.CascadeClassifier(
            'cascades/haarcascade_mcs_nose.xml')
        self._mouthClassifier = cv2.CascadeClassifier(
            'cascades/haarcascade_mcs_mouth.xml')

    @property
    def faces(self):
        """The tracked facial features."""
        return self._faces

The update() method of FaceTracker first creates an equalized, grayscale variant of the given image. Equalization, as implemented in OpenCV's equalizeHist() function, normalizes an image's brightness and increases its contrast. Equalization as a preprocessing step makes our tracker more robust to variations in lighting, while conversion to grayscale improves performance. Next, we feed the preprocessed image to our face classifier. For each matching rectangle, we search certain subregions for a left and right eye, nose, and mouth. Ultimately, the matching rectangles and subrectangles are stored in Face instances in faces. For each type of tracking, we specify a minimum object size that is proportional to the image size. Our implementation of FaceTracker should continue with the following code for update():

    def update(self, image):
        """Update the tracked facial features."""

        self._faces = []

        if utils.isGray(image):
            image = cv2.equalizeHist(image)
        else:
            image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
            cv2.equalizeHist(image, image)

        minSize = utils.widthHeightDividedBy(image, 8)

        faceRects = self._faceClassifier.detectMultiScale(
            image, self.scaleFactor, self.minNeighbors, self.flags,
            minSize)

        if faceRects is not None:
            for faceRect in faceRects:

                face = Face()
                face.faceRect = faceRect

                x, y, w, h = faceRect

                # Seek an eye in the upper-left part of the face.
                searchRect = (x+w//7, y, w*2//7, h//2)
                face.leftEyeRect = self._detectOneObject(
                    self._eyeClassifier, image, searchRect, 64)

                # Seek an eye in the upper-right part of the face.
                searchRect = (x+w*4//7, y, w*2//7, h//2)
                face.rightEyeRect = self._detectOneObject(
                    self._eyeClassifier, image, searchRect, 64)

                # Seek a nose in the middle part of the face.
                searchRect = (x+w//4, y+h//4, w//2, h//2)
                face.noseRect = self._detectOneObject(
                    self._noseClassifier, image, searchRect, 32)

                # Seek a mouth in the lower-middle part of the face.
                searchRect = (x+w//6, y+h*2//3, w*2//3, h//3)
                face.mouthRect = self._detectOneObject(
                    self._mouthClassifier, image, searchRect, 16)

                self._faces.append(face)

Note that update() relies on utils.isGray() and utils.widthHeightDividedBy(), both implemented earlier in this article. Also, it relies on a private helper method, _detectOneObject(), which is called several times in order to handle the repetitious work of tracking several subparts of the face. As arguments, _detectOneObject() requires a classifier, image, rectangle, and minimum object size. The rectangle is the image subregion that the given classifier should search. For example, the nose classifier should search the middle of the face. Limiting the search area improves performance and helps eliminate false positives. Internally, _detectOneObject() works by running the classifier on a slice of the image and returning the first match (or None if there are no matches). This approach works whether or not we are using the cv2.cv.CV_HAAR_FIND_BIGGEST_OBJECT flag. Our implementation of FaceTracker should continue with the following code for _detectOneObject():

    def _detectOneObject(self, classifier, image, rect,
                         imageSizeToMinSizeRatio):

        x, y, w, h = rect

        minSize = utils.widthHeightDividedBy(
            image, imageSizeToMinSizeRatio)

        # Search only the given subregion of the image.
        subImage = image[y:y+h, x:x+w]

        subRects = classifier.detectMultiScale(
            subImage, self.scaleFactor, self.minNeighbors,
            self.flags, minSize)

        if len(subRects) == 0:
            return None

        # Convert the first match from subimage to image coordinates.
        subX, subY, subW, subH = subRects[0]
        return (x+subX, y+subY, subW, subH)
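The coordinate arithmetic in update() and _detectOneObject() can be tried out without a camera or a classifier. The sketch below computes the four search regions for a made-up face rectangle and then converts a made-up match from subimage coordinates back to image coordinates. The function names faceSearchRects and toImageCoordinates are hypothetical, and the sketch assumes floor division so that all coordinates stay integral.

```python
def faceSearchRects(faceRect):
    """Return the search region for each facial feature, as (x, y, w, h)."""
    x, y, w, h = faceRect
    return {
        'leftEye':  (x + w // 7, y, w * 2 // 7, h // 2),
        'rightEye': (x + w * 4 // 7, y, w * 2 // 7, h // 2),
        'nose':     (x + w // 4, y + h // 4, w // 2, h // 2),
        'mouth':    (x + w // 6, y + h * 2 // 3, w * 2 // 3, h // 3),
    }

def toImageCoordinates(subRect, searchRect):
    """Offset a match found in a search region by that region's origin."""
    subX, subY, subW, subH = subRect
    x, y = searchRect[:2]
    return (x + subX, y + subY, subW, subH)

# A made-up face, 210 pixels square, with its corner at (100, 50).
searchRects = faceSearchRects((100, 50, 210, 210))

# A made-up nose match, expressed relative to the nose search region.
noseInImage = toImageCoordinates((20, 30, 40, 40), searchRects['nose'])
```

The offset step matters because a classifier that is given a slice of the image reports positions relative to the slice's upper-left corner, not the whole image's.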

Lastly, FaceTracker should offer basic drawing functionality so that its tracking results can be displayed for debugging purposes. The following method implementation simply defines colors, iterates over Face instances, and draws rectangles of each Face to a given image using our rects.outlineRect() function:

    def drawDebugRects(self, image):
        """Draw rectangles around the tracked facial features."""

        if utils.isGray(image):
            faceColor = 255
            leftEyeColor = 255
            rightEyeColor = 255
            noseColor = 255
            mouthColor = 255
        else:
            faceColor = (255, 255, 255) # white
            leftEyeColor = (0, 0, 255) # red
            rightEyeColor = (0, 255, 255) # yellow
            noseColor = (0, 255, 0) # green
            mouthColor = (255, 0, 0) # blue

        for face in self.faces:
            rects.outlineRect(image, face.faceRect, faceColor)
            rects.outlineRect(image, face.leftEyeRect, leftEyeColor)
            rects.outlineRect(image, face.rightEyeRect,
                              rightEyeColor)
            rects.outlineRect(image, face.noseRect, noseColor)
            rects.outlineRect(image, face.mouthRect, mouthColor)

Now, we have a high-level tracker that hides the details of Haar cascade classifiers while allowing application code to supply new images, fetch data about tracking results, and ask for debug drawing.

Modifying the application

Let's look at two approaches to integrating face tracking and swapping into Cameo. The first approach uses a single camera feed and swaps face rectangles found within this camera feed. The second approach uses two camera feeds and copies face rectangles from one camera feed to the other.

For now, we will limit ourselves to manipulating faces as a whole and not subelements such as eyes. However, you could modify the code to swap only eyes, for example. If you try this, be careful to check that the relevant subrectangles of the face are not None.

Swapping faces in one camera feed

For the single-camera version, the modifications are quite straightforward. On initialization of Cameo, we create a FaceTracker and a Boolean variable indicating whether debug rectangles should be drawn for the FaceTracker. The Boolean is toggled in onKeypress() in response to the X key. As part of the main loop in run(), we update our FaceTracker with the current frame. Then, the resulting Face objects (in the faces property) are fetched and their faceRects are swapped using rects.swapRects(). Also, depending on the Boolean value, we may draw debug rectangles that reflect the original positions of facial elements before any swap.

import cv2
import filters
from managers import WindowManager, CaptureManager
import rects
from trackers import FaceTracker

class Cameo(object):

    def __init__(self):
        self._windowManager = WindowManager('Cameo',
                                            self.onKeypress)
        self._captureManager = CaptureManager(
            cv2.VideoCapture(0), self._windowManager, True)
        self._faceTracker = FaceTracker()
        self._shouldDrawDebugRects = False
        self._curveFilter = filters.BGRPortraCurveFilter()

    def run(self):
        """Run the main loop."""
        self._windowManager.createWindow()
        while self._windowManager.isWindowCreated:
            self._captureManager.enterFrame()
            frame = self._captureManager.frame

            self._faceTracker.update(frame)
            faces = self._faceTracker.faces
            rects.swapRects(frame, frame,
                            [face.faceRect for face in faces])

            filters.strokeEdges(frame, frame)
            self._curveFilter.apply(frame, frame)

            if self._shouldDrawDebugRects:
                self._faceTracker.drawDebugRects(frame)

            self._captureManager.exitFrame()
            self._windowManager.processEvents()

    def onKeypress(self, keycode):
        """Handle a keypress.

        space  -> Take a screenshot.
        tab    -> Start/stop recording a screencast.
        x      -> Start/stop drawing debug rectangles around faces.
        escape -> Quit.

        """
        if keycode == 32: # space
            self._captureManager.writeImage('screenshot.png')
        elif keycode == 9: # tab
            if not self._captureManager.isWritingVideo:
                self._captureManager.startWritingVideo(
                    'screencast.avi')
            else:
                self._captureManager.stopWritingVideo()
        elif keycode == 120: # x
            self._shouldDrawDebugRects = \
                not self._shouldDrawDebugRects
        elif keycode == 27: # escape
            self._windowManager.destroyWindow()

if __name__=="__main__":
    Cameo().run()

The following screenshot is from Cameo. Face regions are outlined after the user presses X:

The following screenshot is from Cameo. American businessman Bill Ackman performs a takeover of the author's face:

Copying faces between camera feeds

For the two-camera version, let's create a new class, CameoDouble, which is a subclass of Cameo. On initialization, a CameoDouble invokes the constructor of Cameo and also creates a second CaptureManager. During the main loop in run(), a CameoDouble gets new frames from both cameras and then gets face tracking results for both frames. Faces are copied from one frame to the other using copyRect(). Then, the destination frame is displayed, optionally with debug rectangles drawn on top of it. We can implement CameoDouble alongside Cameo as follows:

For some models of MacBook, OpenCV has problems using the built-in camera when an external webcam is plugged in. Specifically, the application may become deadlocked while waiting for the built-in camera to supply a frame. If you encounter this issue, use two external cameras and do not use the built-in camera.

class CameoDouble(Cameo):

    def __init__(self):
        Cameo.__init__(self)
        self._hiddenCaptureManager = CaptureManager(
            cv2.VideoCapture(1))

    def run(self):
        """Run the main loop."""
        self._windowManager.createWindow()
        while self._windowManager.isWindowCreated:
            self._captureManager.enterFrame()
            self._hiddenCaptureManager.enterFrame()
            frame = self._captureManager.frame
            hiddenFrame = self._hiddenCaptureManager.frame

            self._faceTracker.update(hiddenFrame)
            hiddenFaces = self._faceTracker.faces
            self._faceTracker.update(frame)
            faces = self._faceTracker.faces

            i = 0
            while i < len(faces) and i < len(hiddenFaces):
                rects.copyRect(
                    hiddenFrame, frame, hiddenFaces[i].faceRect,
                    faces[i].faceRect)
                i += 1

            filters.strokeEdges(frame, frame)
            self._curveFilter.apply(frame, frame)

            if self._shouldDrawDebugRects:
                self._faceTracker.drawDebugRects(frame)

            self._captureManager.exitFrame()
            self._hiddenCaptureManager.exitFrame()
            self._windowManager.processEvents()
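The index-based loop in run() pairs the first hidden face with the first visible face, the second with the second, and so on, stopping as soon as either list runs out. As a minimal sketch, the same pairing rule can be expressed with Python's built-in zip(). The function name pairRects and the rectangle values are hypothetical.

```python
def pairRects(sourceRects, destinationRects):
    """Pair rectangles index by index, stopping at the shorter list."""
    return list(zip(sourceRects, destinationRects))

# Three faces in the hidden feed but only two in the visible feed.
hiddenFaceRects = [(0, 0, 50, 50), (60, 0, 40, 40), (120, 0, 30, 30)]
faceRects = [(10, 10, 80, 80), (100, 10, 70, 70)]

pairs = pairRects(hiddenFaceRects, faceRects)
```

Here the third hidden face is left unused because there is no third face in the visible feed to receive it.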

To run a CameoDouble instead of a Cameo, we just need to modify our if __name__=="__main__" block, as follows:

if __name__=="__main__":
    #Cameo().run() # uncomment for single camera
    CameoDouble().run() # uncomment for double camera


We now have two versions of Cameo. One version tracks faces in a single camera feed and, when faces are found, swaps them by copying and resizing. The other version tracks faces in two camera feeds and, when faces are found in each, copies and resizes faces from one feed to replace faces in the other. Additionally, in both versions, one camera feed is made visible and effects are applied to it.

The user can displace his or her face onto another body, and the result can be stylized to give it a more unified feel. However, the transplanted faces are still just rectangular cutouts. So far, no effort is made to cut away non-face parts of the rectangle or to align superimposed and underlying components such as eyes.
