Chapter 5. Tracking Visually Salient Objects

The goal of this chapter is to track multiple visually salient objects in a video sequence at once. Instead of labeling the objects of interest in the video ourselves, we will let the algorithm decide which regions of a video frame are worth tracking.

We have previously learned how to detect simple objects of interest (such as a human hand) in tightly controlled scenarios, and how to infer geometrical features of a visual scene from camera motion. In this chapter, we ask what we can learn about a visual scene by looking at the image statistics of a large number of frames. By analyzing the Fourier spectrum of natural images, we will build a saliency map, which allows us to label certain statistically interesting patches of the image as proto-objects (potential objects of interest). We will then feed the locations of all the proto-objects to a mean-shift tracker, which will allow us to keep track of where the objects move from one frame to the next.

To build the app,...

Planning the app


The final app will convert each RGB frame of a video sequence into a saliency map, extract all the interesting proto-objects, and feed them to a mean-shift tracking algorithm. To do this, we need the following components:

  • main: The main function routine (in chapter5.py) to start the application.

  • Saliency: A class that generates a saliency map from an RGB color image. It includes the following public methods (a minimal sketch of this class follows the list):

    • Saliency.get_saliency_map: The main method to convert an RGB color image to a saliency map

    • Saliency.get_proto_objects_map: A method to convert a saliency map into a binary mask containing all the proto-objects

    • Saliency.plot_power_density: A method to display the 2D power density of an RGB color image, which is helpful to understand the Fourier transform

    • Saliency.plot_power_spectrum: A method to display the radially averaged power spectrum of an RGB color image, which is helpful to understand natural image statistics

  • MultipleObjectsTracker: A class that tracks multiple objects...
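
Concretely, the public interface of the Saliency class (in saliency.py) might look like the following minimal sketch. The constructor signature and the spectral-residual computation in get_saliency_map (in the spirit of Hou and Zhang's classic approach) are assumptions for illustration and may differ from the book's actual implementation; plot_power_density is omitted for brevity:

import cv2
import numpy as np
import matplotlib.pyplot as plt


class Saliency:
    def __init__(self, img):
        # constructor argument is assumed; the class could equally
        # receive the frame per-method
        self.frame = img

    def get_saliency_map(self):
        """Converts the RGB image to a saliency map (values in [0, 1])"""
        gray = cv2.cvtColor(self.frame, cv2.COLOR_BGR2GRAY)
        small = cv2.resize(gray, (64, 64)).astype(np.float32)

        # spectral residual: log amplitude minus its local average
        spectrum = np.fft.fft2(small)
        log_amplitude = np.log(np.abs(spectrum) + 1e-9)
        phase = np.angle(spectrum)
        residual = log_amplitude - cv2.blur(log_amplitude, (3, 3))

        # back to the spatial domain; smooth, normalize, and scale up
        saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
        saliency = cv2.GaussianBlur(saliency, (11, 11), 2.5)
        saliency = cv2.normalize(saliency, None, 0, 1, cv2.NORM_MINMAX)
        return cv2.resize(saliency,
                          (self.frame.shape[1], self.frame.shape[0]))

    def get_proto_objects_map(self):
        """Thresholds the saliency map into a binary proto-objects mask"""
        saliency = np.uint8(self.get_saliency_map() * 255)
        _, mask = cv2.threshold(saliency, 0, 255,
                                cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        return mask

    def plot_power_spectrum(self):
        """Plots the radially averaged power spectrum of the image"""
        gray = cv2.cvtColor(self.frame, cv2.COLOR_BGR2GRAY)
        power = np.abs(np.fft.fftshift(np.fft.fft2(gray))) ** 2

        # average the power over concentric rings around the DC component
        h, w = power.shape
        y, x = np.indices((h, w))
        r = np.hypot(x - w // 2, y - h // 2).astype(np.int64)
        radial = np.bincount(r.ravel(), power.ravel()) / \
            np.bincount(r.ravel())

        plt.loglog(radial)
        plt.xlabel('spatial frequency')
        plt.ylabel('power')
        plt.show()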

Setting up the app


In order to run our app, we will need to execute a main function routine that reads a video stream frame by frame, generates a saliency map, extracts the locations of the proto-objects, and tracks these locations from one frame to the next.

The main function routine

The main process flow is handled by the main function in chapter5.py, which instantiates the two classes (Saliency and MultipleObjectsTracker) and opens a video file showing a number of soccer players on the field:

import cv2
import numpy as np
from os import path

from saliency import Saliency
from tracking import MultipleObjectsTracker


def main(video_file='soccer.avi', roi=((140, 100), (500, 600))):
    if path.isfile(video_file):
        video = cv2.VideoCapture(video_file)
    else:
        print('File "' + video_file + '" does not exist.')
        raise SystemExit

    # initialize tracker
    mot = MultipleObjectsTracker()

The function will then read the video frame by frame, extract some meaningful region of...
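
That per-frame loop might look like the following sketch. The exact Saliency constructor arguments, the interpretation of the roi tuple, and the tracker's advance_frame method are assumptions used here only to illustrate the flow:

    while True:
        success, frame = video.read()
        if not success:
            break

        # restrict processing to the region of interest
        # (interpretation of the roi tuple is assumed here)
        (top, left), (bottom, right) = roi
        frame = frame[top:bottom, left:right]

        # generate the saliency map and extract the proto-objects
        sal = Saliency(frame)
        proto_objects = sal.get_proto_objects_map()

        # hand the frame and proto-objects mask to the tracker
        # (advance_frame is a hypothetical method name)
        output = mot.advance_frame(frame, proto_objects)

        cv2.imshow('tracker', output)
        if cv2.waitKey(100) & 0xFF == 27:  # press Esc to quit
            break

    video.release()
    cv2.destroyAllWindows()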

Visual saliency


As already mentioned in the introduction, visual saliency describes the visual quality of certain objects or items that allows them to grab our immediate attention. Our brains constantly drive our gaze towards the important regions of the visual scene, as if shining a flashlight on different sub-regions of the visual world, allowing us to quickly scan our surroundings for interesting objects and events while neglecting the less important parts.

It is thought that this is an evolutionary strategy to deal with the constant information overflow that comes with living in a visually rich environment. For example, if you take a casual walk through a jungle, you want to be able to notice the attacking tiger in the bush to your left before admiring the intricate color pattern on the butterfly's wings in front of you. As a result, the visually salient objects have the remarkable quality of popping out of their surroundings, much like the target bars in the following...

Mean-shift tracking


It turns out that the salience detector discussed previously is already a great tracker of proto-objects by itself. One could simply apply the algorithm to every frame of a video sequence and get a good idea of the location of the objects. However, what gets lost is correspondence information. Imagine a video sequence of a busy scene, such as a city center or a sports stadium. Although a saliency map could highlight all the proto-objects in every frame of a recorded video, the algorithm would have no way of knowing which proto-objects from the previous frame are still visible in the current frame. Also, the proto-objects map might contain some false positives, such as in the following example:

Note that the bounding boxes extracted from the proto-objects map contain (at least) three mistakes in the preceding example: they miss a player (upper left), merge two players into the same bounding box, and highlight some additional, arguably non-interesting...
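
Mean-shift tracking recovers this correspondence by following each object's appearance from frame to frame. To make the idea concrete, here is a minimal, self-contained single-object example using OpenCV's built-in cv2.meanShift; the initial window coordinates are made up, and the book's MultipleObjectsTracker extends this idea to many windows at once:

import cv2

cap = cv2.VideoCapture('soccer.avi')
_, frame = cap.read()

# initial search window (x, y, w, h), e.g. around one detected proto-object
track_window = (200, 150, 40, 80)
x, y, w, h = track_window

# model the object's appearance with a hue histogram
hsv_roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([hsv_roi], [0], None, [16], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# stop after 10 iterations or when the window moves less than 1 pixel
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    success, frame = cap.read()
    if not success:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # back-project the histogram to get a per-pixel probability map,
    # then shift the window to the mode of that distribution
    prob = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    _, track_window = cv2.meanShift(prob, track_window, term_crit)
    x, y, w, h = track_window
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow('mean-shift', frame)
    if cv2.waitKey(30) & 0xFF == 27:
        break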

Putting it all together


The result of our app can be seen in the following image:

Throughout the video sequence, the algorithm picks up the locations of the players and successfully tracks them frame by frame, combining the bounding boxes produced by mean-shift tracking with the bounding boxes returned by the salience detector.

It is only through the clever combination of the saliency map and tracking that we can exclude false positives, such as line markings and artifacts of the saliency map. The magic happens in cv2.groupRectangles, which requires a similar bounding box to appear at least twice in the box_all list; otherwise, the box is discarded. This means that a bounding box is kept only if both mean-shift tracking and the saliency map (roughly) agree on its location and size.
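
The effect of cv2.groupRectangles can be seen in isolation in the following snippet (the coordinates are made up):

import cv2

# rectangles are (x, y, w, h); the first two roughly agree,
# the third has no partner
boxes = [[140, 100, 60, 120],   # e.g. from the mean-shift tracker
         [143, 97, 58, 124],    # e.g. from the saliency map
         [400, 300, 40, 80]]    # appears only once

# groupThreshold=1 keeps only clusters of at least two similar boxes
merged, weights = cv2.groupRectangles(boxes, 1, eps=0.2)
print(merged)  # one box near (141, 98, 59, 122); the singleton is discarded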

Summary


In this chapter, we explored a way to label potentially interesting objects in a visual scene, even when their shape and number are unknown. We explored natural image statistics using Fourier analysis, and implemented a state-of-the-art method for extracting visually salient regions from natural scenes. Furthermore, we combined the output of the salience detector with a tracking algorithm to track multiple objects of unknown shape and number in a video sequence of a soccer game.

It would now be possible to extend our algorithm to use more sophisticated feature descriptions of proto-objects. In fact, mean-shift tracking might fail when objects rapidly change size, as would be the case if an object of interest were to come straight at the camera. A more powerful tracker, which comes for free in OpenCV, is cv2.CamShift. CAMShift stands for Continuously Adaptive Mean-Shift, and bestows upon mean-shift the power to adaptively change the window size. Of course, it would also...
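
For reference, switching from cv2.meanShift to cv2.CamShift is essentially a one-line change. A hedged sketch, reusing the prob, track_window, and term_crit variables from the mean-shift example earlier in this chapter:

import cv2
import numpy as np

# inside the tracking loop, replace the cv2.meanShift call with:
rot_rect, track_window = cv2.CamShift(prob, track_window, term_crit)

# CamShift returns a rotated rectangle whose size and angle adapt to
# the object; draw its four corners
pts = np.int32(cv2.boxPoints(rot_rect))  # OpenCV 3; 2.4 uses cv2.cv.BoxPoints
cv2.polylines(frame, [pts], True, (0, 255, 0), 2)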
