You're reading from OpenCV By Example

Product type: Book
Published in: Jan 2016
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781785280948
Edition: 1st Edition
Authors (3):

Prateek Joshi

Prateek Joshi is the founder of Plutoshift and a published author of 9 books on Artificial Intelligence. He has been featured on Forbes 30 Under 30, NBC, Bloomberg, CNBC, TechCrunch, and The Business Journals. He has been an invited speaker at conferences such as TEDx, Global Big Data Conference, Machine Learning Developers Conference, and Silicon Valley Deep Learning. Apart from Artificial Intelligence, some of the topics that excite him are number theory, cryptography, and quantum computing. His greater goal is to make Artificial Intelligence accessible to everyone so that it can impact billions of people around the world.

David Millán Escrivá

David Millán Escrivá was 8 years old when he wrote his first program on an 8086 PC in BASIC, which enabled the 2D plotting of basic equations. In 2005, he finished his studies in IT with honors, through the Universitat Politécnica de Valencia, in human-computer interaction supported by computer vision with OpenCV (v0.96). He has worked on Blender, an open source 3D software project, and on its first commercial movie, Plumiferos, as a computer graphics software developer. David has more than 10 years of experience in IT, covering computer vision, computer graphics, pattern recognition, and machine learning, working on different projects at different start-ups and companies. He currently works as a researcher in computer vision.

Vinícius G. Mendonça

Vinícius G. Mendonça is a professor at PUCPR and a mentor at Apple Developer Academy. He has a master's degree in Computer Vision and Image Processing (PUCPR) and a specialization degree in Game Development (Universidade Positivo). He is also one of the authors of the book Learn OpenCV 4 by Building Projects, also by Packt Publishing. He has been in this field since 1996. His former experience includes designing and programming a multithreaded framework for PBX tests at Siemens, coordination of Aurélio Dictionary software (including its apps for Android, iOS, and Windows phones), and coordination of an augmented reality educational activity for Positivo's Mesa Alfabeto, presented at CeBIT. Currently, he works with server-side Node.js at a company called Tenet Tech.

Chapter 8. Video Surveillance, Background Modeling, and Morphological Operations

In this chapter, we will learn how to detect a moving object in a video that is taken from a static camera. This is used extensively in video surveillance systems. We will discuss the different characteristics that can be used to build this system. We will learn about background modeling and see how we can use it to build a model of the background in a live video. Once we do this, we will combine all the blocks to detect the objects of interest in the video.

By the end of this chapter, you should be able to answer the following questions:

  • What is naive background subtraction?

  • What is frame differencing?

  • How do we build a background model?

  • How do we identify a new object in a scene viewed by a static camera?

  • What is morphological image processing, and how is it related to background modeling?

  • How do we achieve different effects using morphological operators?

Understanding background subtraction


Background subtraction is very useful in video surveillance. It performs really well in cases where we need to detect moving objects in a static scene. How is this useful for video surveillance? Video surveillance involves dealing with a constant stream of data that keeps coming in at all times, and we need to analyze it to identify any suspicious activities. Consider the example of a hotel lobby. All the walls and furniture have a fixed location. Now, if we build a background model, we can use it to identify suspicious activities in the lobby. We can take advantage of the fact that the background scene remains static (which happens to be true in this case). This helps us avoid any unnecessary computational overhead.

As the name suggests, this algorithm works by detecting the background and assigning each pixel of an image to one of two classes: either the background (assuming that it...

Naive background subtraction


Let's start the background subtraction discussion from the beginning. What does a background subtraction process look like? Consider the following image:

The preceding image represents the background scene. Now, let's introduce a new object into this scene:

As shown in the preceding image, there is a new object in the scene. So, if we compute the difference between this image and our background model, we should be able to identify the location of the TV remote:

The overall process looks like this:

Does it work well?

There's a reason why we call it the naive approach. It works under ideal conditions, and as we know, nothing is ideal in the real world. It does a reasonably good job of computing the shape of the given object, but it does so under some constraints. One of the main requirements of this approach is that the color and intensity of the object should be sufficiently different from that of the background. Some of the factors that affect these kinds of algorithms...

Frame differencing


As we have seen, we cannot always rely on a static background image to detect objects. One way to work around this is frame differencing, which is one of the simplest techniques we can use to see which parts of a video are moving. When we consider a live video stream, the difference between successive frames gives us a lot of information. The concept is fairly straightforward: we just take the difference between successive frames and display it.

If I move my laptop rapidly, we can see something like this:

Instead of moving the laptop, let's see what happens when the subject moves. If I rapidly shake my head, it will look something like this:

As you can see in the preceding images, only the moving parts of the video get highlighted. This gives us a good starting point to see the areas that are moving in the video. Let's take a look at the function to compute the frame difference:

Mat frameDiff(Mat prevFrame, Mat curFrame, Mat nextFrame)
{
    Mat diffFrames1, diffFrames2, output;
    
    // Compute the absolute difference between the current and next frames
    absdiff(nextFrame, curFrame, diffFrames1);
    
    // Compute the absolute difference between the current and previous frames
    absdiff(curFrame, prevFrame, diffFrames2);
    
    // Keep only the pixels that changed in both intervals
    bitwise_and(diffFrames1, diffFrames2, output);
    
    return output;
}

The Mixture of Gaussians approach


Before we talk about Mixture of Gaussians (MOG), let's see what a mixture model is. A mixture model is just a statistical model that can be used to represent the presence of subpopulations within our data. We don't really care about what category each data point belongs to. All we need to do is identify whether the data has multiple groups inside it. Now, if we represent each subpopulation using the Gaussian function, then it's called Mixture of Gaussians. Let's consider the following image:

Now, as we gather more frames in this scene, every part of the image will gradually become part of the background model. This is what we discussed earlier as well. If a scene is static, the model adapts itself to make sure that the background model is updated. The foreground mask, which is supposed to represent the foreground object, looks like a black image at this point because every pixel is part of the background model.

Note

OpenCV has multiple algorithms implemented...

Morphological image processing


As discussed earlier, background subtraction methods are affected by many factors. Their accuracy depends on how we capture the data and how it's processed. One of the biggest factors that tends to affect these algorithms is the noise level. When we say noise, we are talking about things such as graininess in an image, isolated black/white pixels, and so on. These issues tend to affect the quality of our algorithms. This is where morphological image processing comes into the picture. It is used extensively in a lot of real-time systems to ensure the quality of the output.

Morphological image processing refers to processing the shapes of features in an image. For example, you can make a shape thicker or thinner. Morphological operators rely on how the pixels are ordered in an image, and not on their values. This is the reason why they are really well suited to manipulating shapes in binary images. Morphological image processing can be applied...

Slimming the shapes


We can achieve this effect using an operation called erosion. This is an operation that makes a shape thinner by peeling the boundary layers of all the shapes in the image:

Let's take a look at the function that performs morphological erosion:

Mat performErosion(Mat inputImage, int erosionElement, int erosionSize)
{
    Mat outputImage;
    int erosionType = MORPH_RECT;  // default, in case erosionElement is out of range
    
    if(erosionElement == 0)
        erosionType = MORPH_RECT;
    
    else if(erosionElement == 1)
        erosionType = MORPH_CROSS;
    
    else if(erosionElement == 2)
        erosionType = MORPH_ELLIPSE;
    
    // Create the structuring element for erosion
    Mat element = getStructuringElement(erosionType, Size(2*erosionSize + 1, 2*erosionSize + 1), Point(erosionSize, erosionSize));
    
    // Erode the image using the structuring element
    erode(inputImage, outputImage, element);
    
    // Return the output image
    return outputImage;
}

You can check out the complete code in the .cpp files to understand...

Thickening the shapes


We use an operation called dilation to achieve thickening. This is an operation that makes a shape thicker by adding boundary layers to all the shapes in the image:

Here is the code to do this:

Mat performDilation(Mat inputImage, int dilationElement, int dilationSize)
{
    Mat outputImage;
    int dilationType = MORPH_RECT;  // default, in case dilationElement is out of range
    
    if(dilationElement == 0)
        dilationType = MORPH_RECT;
    
    else if(dilationElement == 1)
        dilationType = MORPH_CROSS;
    
    else if(dilationElement == 2)
        dilationType = MORPH_ELLIPSE;
    
    // Create the structuring element for dilation
    Mat element = getStructuringElement(dilationType, Size(2*dilationSize + 1, 2*dilationSize + 1), Point(dilationSize, dilationSize));
    
    // Dilate the image using the structuring element
    dilate(inputImage, outputImage, element);
    
    // Return the output image
    return outputImage;
}

Other morphological operators


Here are some other morphological operators that are interesting. Let's first take a look at the output images; the code appears at the end of this section.

Morphological opening

This is an operation that opens a shape. This operator is frequently used for noise removal in an image. We can achieve morphological opening by applying erosion followed by dilation to an image. The morphological opening process basically removes small objects from the foreground in the image by placing them in the background:

Here is the function to perform morphological opening:

Mat performOpening(Mat inputImage, int morphologyElement, int morphologySize)
{
    Mat outputImage, tempImage;
    int morphologyType = MORPH_RECT;  // default, in case morphologyElement is out of range
    
    if(morphologyElement == 0)
        morphologyType = MORPH_RECT;
    
    else if(morphologyElement == 1)
        morphologyType = MORPH_CROSS;
    
    else if(morphologyElement == 2)
        morphologyType = MORPH_ELLIPSE;
    
    // Create the structuring element for morphological opening
    Mat element = getStructuringElement(morphologyType, Size(2*morphologySize + 1, 2*morphologySize + 1), Point(morphologySize, morphologySize));
    
    // Apply erosion followed by dilation (opening) using the structuring element
    erode(inputImage, tempImage, element);
    dilate(tempImage, outputImage, element);
    
    // Return the output image
    return outputImage;
}

Summary


In this chapter, we learned about the algorithms that are used for background modeling and morphological image processing. We discussed naive background subtraction and its limitations. We learned how to get motion information using frame differencing, and how it can constrain us when we want to track different types of objects. We also discussed Mixture of Gaussians, along with its formulation and implementation details. We then discussed morphological image processing and learned how it can be used for various purposes; different operations were demonstrated to show the use cases.

In the next chapter, we will discuss how to track an object and the various techniques that can be used to do it.
