Motion Detection


Obtaining the frame difference

To begin with, we create a patch named Frame001.pd and add the objects needed to display the live webcam image on a rectangle. We send a dimen 800 600 message to the gemwin object to open the GEM window at 800 x 600 pixels, because we plan to display the video image at the full size of the window.

The aspect ratio of the current GEM window is now 4:3. We use a rectangle of size 5.33 x 4 (4:3 aspect ratio) to cover the whole GEM window:

Now we have a single frame of the video image. To compare it with another frame, we have to store that frame in memory. In the following patch, you can click on the bang box to store a copy of the current video frame in the buffer. The latest video frame is then compared against the stored copy, as shown in the following screenshot:

The object that compares two frames is pix_diff. It is similar to the Difference blend mode in Photoshop: pixels that are the same in both frames turn black, and the colored areas show where the two frames differ. Here is what you would expect in the GEM window:
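To make the Difference idea concrete, here is a minimal Python sketch of a per-pixel absolute difference. It assumes, as the comparison with Photoshop suggests, that pix_diff takes the absolute difference of each color channel; frames are modeled as lists of rows of (r, g, b) tuples in the 0 to 1 range, which is an illustration format, not GEM's internal one.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]   # (r, g, b), each 0-1
Frame = List[List[Pixel]]            # rows of pixels

def frame_difference(current: Frame, stored: Frame) -> Frame:
    """Per-pixel, per-channel absolute difference of two same-sized frames.
    Identical pixels come out as (0, 0, 0), i.e. black."""
    return [
        [tuple(abs(c - s) for c, s in zip(cur_px, st_px))
         for cur_px, st_px in zip(cur_row, st_row)]
        for cur_row, st_row in zip(current, stored)
    ]

# A 1 x 2 frame in which only the second pixel changed.
stored = [[(0.2, 0.2, 0.2), (0.5, 0.5, 0.5)]]
current = [[(0.2, 0.2, 0.2), (0.9, 0.1, 0.5)]]
print(frame_difference(current, stored))
```

Unchanged pixels stay black, and the changed pixel keeps a color whose brightness reflects how much it changed.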

To further simplify the image, we can get rid of the color and use only black and white to indicate the changes:

The pix_grey object converts a color image into greyscale. The pix_threshold object then zeroes out (turns black) every pixel whose value is lower than the threshold, which is supplied by a horizontal slider ranging from 0 to 1. Refer to the following screenshot:

Note that a default slider ranges from 0 to 127. You have to change the range to 0 to 1 in the slider's Properties window.
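The greyscale-plus-threshold step can be sketched in Python as follows. The luminance weights are an assumption (the common ITU-R BT.601 coefficients); GEM's pix_grey may weight the channels differently. The threshold behaves as described above: pixels below the level become 0 (black), and the others keep their grey value.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]

def to_grey(frame: List[List[Pixel]]) -> List[List[float]]:
    """Convert RGB pixels (0-1) to one luminance value per pixel.
    ASSUMPTION: BT.601 weights, not necessarily what pix_grey uses."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in frame]

def threshold(grey: List[List[float]], level: float) -> List[List[float]]:
    """Zero out every pixel below `level`; leave the rest unchanged."""
    return [[v if v >= level else 0.0 for v in row] for row in grey]

# A difference image: no change, a strong change, and faint noise.
diff = [[(0.0, 0.0, 0.0), (0.4, 0.4, 0.0), (0.05, 0.02, 0.01)]]
print(threshold(to_grey(diff), 0.1))
```

Raising the slider value removes more of the faint noise, which is why dragging it until the still background turns black gives a usable threshold.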

The result shows us which pixels differ from the stored image.

Detecting presence

Based on the knowledge of which pixels have changed between the stored image and the current video image, we can detect the presence of a foreground subject in front of a static background. Point your webcam at a relatively static background, and click on the bang box next to the Store comment to store the background image in the pix_buffer object. Anything that appears in front of the background will be shown in the GEM window. Now we can ask: how do we know whether anything is present in front of the background? The answer lies in the pix_blob object:

The pix_blob object calculates the centroid of an image.

The centroid of an image is its center of mass. Imagine that you cut out the shape of the image from a piece of cardboard. The centroid is the center of mass of that piece of cardboard: you can balance the cardboard on one finger placed at that point.

In our example, the image is mostly black with some grey areas. The pix_blob object finds the center of the nonblack pixels and returns its position from the first and second outlets. The third outlet indicates the size of the nonblack pixel group. When a foreground subject is in front of the background, the first and second number boxes, connected to the corresponding pix_blob outlets, show roughly the center of the subject, and the third number box tells how big the subject is.
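The centroid idea can be sketched in Python as an intensity-weighted average of pixel positions. This is only an approximation of pix_blob's outlets: the normalization of X and Y to 0 to 1 with a top-left origin follows the text, but defining size as total brightness divided by the number of pixels is an assumption, and returning None for an all-black image is a choice made for this sketch.

```python
from typing import List, Optional, Tuple

def blob(grey: List[List[float]]) -> Optional[Tuple[float, float, float]]:
    """Intensity-weighted centroid of a greyscale image (values 0-1).

    Returns (x, y, size): x and y normalized to 0-1 from the top-left corner,
    size = total brightness / number of pixels (ASSUMPTION).
    Returns None when every pixel is black."""
    height, width = len(grey), len(grey[0])
    total = sx = sy = 0.0
    for row_idx, row in enumerate(grey):
        for col_idx, v in enumerate(row):
            total += v
            sx += v * (col_idx + 0.5)   # weight by the pixel's center position
            sy += v * (row_idx + 0.5)
    if total == 0.0:
        return None
    return sx / total / width, sy / total / height, total / (width * height)

# One bright pixel in the top-right area of a 4 x 4 image.
img = [[0.0] * 4 for _ in range(4)]
img[0][3] = 1.0
print(blob(img))  # (0.875, 0.125, 0.0625)
```

Brighter pixels pull the centroid toward themselves, just as heavier parts of the cardboard shift its balance point.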

If you watch the three number boxes, you can guess how we will detect presence. When you click on the store image bang button, the third number box (size) drops to zero immediately. Once you enter the frame in front of the background, the number increases; the larger the portion of the frame you occupy, the larger the number. To complete the logic, we check whether the third number box value is greater than a predefined number. If it is, we conclude that something is present in front of the background; if not, there is nothing in front of the background. The following patch, Frame002.pd, displays a warning message when something is present:

A comparison object, > 0.002, tests the size of the grey area (blob). If the size is larger, it sends the value 1 to the gemhead object of the warning text so that the text is displayed; otherwise, it sends 0. This is a new technique for turning the text on and off: each gemhead object accepts a toggle input that turns it on or off. A value of 1 enables rendering of that gemhead chain, and 0 disables it. When you first click on the store image bang button, the third number box value drops to 0. Minor changes in the background will not trigger the text message:

If there is a significant change in front of the background, the size number box will show a value larger than 0.002. This enables rendering of the text2d object, which displays the WARNING message.
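The presence logic reduces to a single comparison whose 1 or 0 result switches the warning text's gemhead on or off. The Python sketch below mirrors the > 0.002 object from the patch; the sequence of blob sizes is made up for illustration, not measured.

```python
def presence(size: float, min_size: float = 0.002) -> int:
    """Mimic the [> 0.002] object: 1 enables the warning gemhead, 0 disables it."""
    return 1 if size > min_size else 0

# Hypothetical blob sizes over a few frames: background noise, a subject, noise again.
sizes = [0.0, 0.0008, 0.0015, 0.03, 0.12, 0.001]
print([presence(s) for s in sizes])  # [0, 0, 0, 1, 1, 0]
```

Small noise values stay below the limit, so the warning appears only while a sizeable blob is present.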

After you click on the Store bang box, drag the horizontal slider attached to the pix_threshold object towards the right-hand side until the image in the GEM window turns completely black. That position is roughly the threshold value you need. Note also that each gemhead object has a number argument, which sets its rendering order. The default is 50, and a gemhead with a larger number is rendered after one with a smaller number. In this case, the gemhead object for the pix_video object renders first, and the gemhead object for the text2d object renders afterwards.

In this case, we can guarantee that the text will always be on top of the video:

You can also replace the buffer-and-difference part of the previous version with a single pix_background object. A reset message replaces the bang button for storing the background image. The following patch shows either the clear or the warning message on the screen, depending on whether a subject is present in front of the background image:
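The reset-then-compare behavior can be sketched as a small Python class. The foreground rule is an assumption for illustration (a pixel counts as foreground when any channel differs from the stored background by more than a tolerance); GEM's pix_background may decide differently. Foreground pixels keep their color and everything else turns black, which matches the black screen described next.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[float, float, float]
Frame = List[List[Pixel]]

class BackgroundSubtractor:
    """Rough model of a reset-able background subtraction stage.
    ASSUMPTION: per-channel tolerance test; not GEM's exact rule."""

    def __init__(self, tolerance: float = 0.1) -> None:
        self.tolerance = tolerance
        self.background: Optional[Frame] = None

    def reset(self, frame: Frame) -> None:
        """Like sending 'reset': remember the current frame as the background."""
        self.background = [list(row) for row in frame]

    def process(self, frame: Frame) -> Frame:
        if self.background is None:
            self.reset(frame)          # no background yet: use this frame
        black: Pixel = (0.0, 0.0, 0.0)
        return [
            [px if any(abs(c - b) > self.tolerance for c, b in zip(px, bg)) else black
             for px, bg in zip(row, bg_row)]
            for row, bg_row in zip(frame, self.background)
        ]

bs = BackgroundSubtractor(tolerance=0.1)
bs.reset([[(0.5, 0.5, 0.5), (0.5, 0.5, 0.5)]])
# First pixel: slight noise (hidden). Second pixel: a subject (kept).
print(bs.process([[(0.52, 0.5, 0.5), (0.9, 0.2, 0.1)]]))
```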

The GEM window at this moment shows only a black screen when there isn't anything in front of the background. For most applications, it would be better to have the live video image on screen. In the following patch, we split the video signal into two – one to the pix_background object for detection and one to the pix_texture object for display:

The patch requires two pix_separator objects to separate the two video streams coming from pix_video, so that processing in one stream does not affect the other. Here is the background image after clicking on the reset message:

The warning message shows up after the subject enters the frame; it is triggered by the comparison object > 0.005 in the patch:

We have been using the pix_blob object to detect presence in front of a static background image. The pix_blob object will also return the position of the subject (blob) in front of the webcam. We are going to look into this in the next section.

Detecting motion

To detect presence, we compared the current video image with a predefined background image. To detect motion, we compare the current video image with a previous frame instead. For this task, we use the pix_delay object to delay the video by a number of frames:

The number box connected to the right-hand inlet of the pix_delay object specifies the number of frames to delay. The patch Motion001.pd delays the video image by that number of frames and displays it in the GEM window. To make the comparison, we feed both the current video frame and the delayed frame into the same pix_diff object:

Usually, we set the number box for the pix_delay object to 1 to retrieve the previous frame. As a guideline, use a smaller number (for example, 1) to track faster motion, and a bigger number to track slower motion. The image in the GEM window is the difference between the two frames:
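A minimal Python sketch of the delay-and-compare chain: a bounded deque holds the last delay + 1 frames, so its oldest entry is the frame from delay frames ago. Frames are simplified to greyscale lists of rows, and returning None until enough frames have arrived is a choice made for this sketch; it is not a claim about what pix_delay outputs at startup.

```python
from collections import deque
from typing import Deque, List, Optional

Frame = List[List[float]]  # greyscale frames, values 0-1, for brevity

class DelayedDifference:
    """Compare each incoming frame with the one received `delay` frames
    earlier, like routing pix_video into both pix_delay and pix_diff."""

    def __init__(self, delay: int = 1) -> None:
        if delay < 1:
            raise ValueError("delay must be at least 1 frame")
        self.history: Deque[Frame] = deque(maxlen=delay + 1)

    def push(self, frame: Frame) -> Optional[Frame]:
        self.history.append(frame)          # oldest frame drops out automatically
        if len(self.history) < self.history.maxlen:
            return None                     # not enough frames buffered yet
        old = self.history[0]               # frame from `delay` frames ago
        return [[abs(c - o) for c, o in zip(cur_row, old_row)]
                for cur_row, old_row in zip(frame, old)]

dd = DelayedDifference(delay=1)
for f in ([[0.0, 0.5]], [[0.0, 0.75]], [[0.0, 0.75]]):
    print(dd.push(f))
```

With a delay of 1, only changes between consecutive frames show up, so fast motion stands out; a larger delay accumulates more change and makes slow motion visible.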

With this image, we can again apply the pix_blob object to obtain the tracking information:

When you move in front of the webcam, notice the changes in the three number boxes, especially the first and second, which show the X and Y values. These two number boxes indicate the position where motion is detected, in the range from 0 to 1. The GEM library also provides the pix_movement object, which serves the same purpose:

The right-hand inlet of pix_movement takes a threshold number between 0 and 1. When the color change between two frames is less than the threshold, the resulting pixel is black; otherwise, the resulting pixel color is the difference between the two frames. We can use a horizontal slider here. Pushing the slider to the left-hand side lets more of the image through; pushing it to the right-hand side removes more of it. Try out various positions to keep just enough imagery for the tracking; the best setting depends on the lighting conditions and the speed of the movement.
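The thresholding described above can be sketched in Python. Two details are assumptions made for illustration: the change is measured on the average of R, G, and B, and the surviving difference is written to both the grey color channels and the alpha channel (GEM's pix_movement keeps its result in the alpha channel). The real object's formula may differ.

```python
from typing import List, Tuple

Rgb = Tuple[float, float, float]
Rgba = Tuple[float, float, float, float]

def movement(current: List[List[Rgb]], previous: List[List[Rgb]],
             threshold: float) -> List[List[Rgba]]:
    """Simplified pix_movement-style stage (ASSUMPTIONS noted above).
    Changes below `threshold` become black with alpha 0; larger changes
    keep their magnitude in every channel, including alpha."""
    out: List[List[Rgba]] = []
    for cur_row, prev_row in zip(current, previous):
        new_row: List[Rgba] = []
        for cur, prev in zip(cur_row, prev_row):
            change = abs(sum(cur) / 3 - sum(prev) / 3)
            value = change if change >= threshold else 0.0
            new_row.append((value, value, value, value))
        out.append(new_row)
    return out

prev = [[(0.5, 0.5, 0.5), (0.25, 0.25, 0.25)]]
cur = [[(0.5, 0.5, 0.5), (1.0, 1.0, 1.0)]]
print(movement(cur, prev, threshold=0.1))  # slider to the left: change survives
print(movement(cur, prev, threshold=0.8))  # slider to the right: all black
```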

The pix_movement object detects movement between two frames and stores the difference image in the alpha channel of the pixel data. Note that we have to enable alpha blending with the alpha object to display the result. To obtain details about the movement, we go back to the pix_blob object:

Note that the pix_blob object has one more inlet that we can set. It is connected to a horizontal radio button with five options. The first option (the default, value 0) uses the greyscale value for blob tracking. The next three options (values 1, 2, and 3) use the red, green, and blue channels, respectively. The last option (value 4) uses the alpha channel. Since the pix_movement object puts the difference image in the alpha channel, we have to select the last option to track movement. Watch the values change in the first two number boxes connected to the outlets of the pix_blob object, and try to relate them to your movement in front of the webcam. The values are the X and Y position of the center of movement.
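The effect of the channel option can be sketched in Python by choosing which value of each RGBA pixel the centroid weighs. The mapping 0 to 4 follows the radio-button options described above; approximating greyscale by the plain RGB average is an assumption. The example pixels are made up: a bright but static wall on the left and a darker moving hand on the right.

```python
from typing import List, Optional, Tuple

Rgba = Tuple[float, float, float, float]
Plane = List[List[float]]

def channel_plane(frame: List[List[Rgba]], mode: int) -> Plane:
    """0 = greyscale, 1 = red, 2 = green, 3 = blue, 4 = alpha."""
    if mode == 0:
        # ASSUMPTION: greyscale approximated by the plain RGB average.
        return [[(r + g + b) / 3 for (r, g, b, _a) in row] for row in frame]
    if mode in (1, 2, 3, 4):
        return [[px[mode - 1] for px in row] for row in frame]  # tuple index 0-3
    raise ValueError("mode must be between 0 and 4")

def centroid_x(plane: Plane) -> Optional[float]:
    """Weighted horizontal center in the 0-1 range (None if the plane is empty)."""
    width = len(plane[0])
    total = sum(v for row in plane for v in row)
    if total == 0:
        return None
    weighted = sum(v * (i + 0.5) for row in plane for i, v in enumerate(row))
    return weighted / total / width

# Bright static wall (alpha 0) next to a darker moving hand (alpha 0.8).
frame = [[(1.0, 1.0, 1.0, 0.0), (0.2, 0.2, 0.2, 0.8)]]
print(centroid_x(channel_plane(frame, 0)))  # pulled toward the bright wall
print(centroid_x(channel_plane(frame, 4)))  # follows the movement
```

Tracking the greyscale value would chase whatever is brightest, while the alpha option follows only what pix_movement marked as changed.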

In most cases, the X value is reversed because the webcam image is not mirrored the way your reflection would be. You can fix this with a pix_flip object that flips the image horizontally:

You can verify the tracking by waving your hand from the left-hand side to the right-hand side: the first number box value, the X position, increases. If you wave your hand from top to bottom, the second number box value, the Y position, increases. Both values range from 0 to 1. This completes the first motion-tracking patch. The next challenge is to replace the two number boxes with a graphical shape that follows the movement. That is the fun part, and we work on it in the next patch, Motion003.pd. Refer to the following screenshot:

The patch is simple. The X and Y position returned from pix_blob become the X and Y position of another graphic (a circle in this example). Before we send the values to translateXYZ, we have to convert the 0 to 1 range to the window size in GEM's units. The window is not square in this example: the current 800 x 600 window measures 10.66 x 8 units (a 4:3 ratio). Based on this, we map the 0 to 1 values into GEM's window units using a division and a subtraction object:
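The same range conversion can be written as a linear map in Python. The 10.66 x 8 window size comes from the text; inverting Y is an assumption based on pix_blob's Y growing from top to bottom while GEM's Y axis points up. The patch may arrange its arithmetic objects differently, but the mapping is the same kind of scale-and-offset.

```python
from typing import Tuple

GEM_WIDTH = 10.66   # GEM units across an 800 x 600 window (from the text)
GEM_HEIGHT = 8.0    # GEM units from the bottom to the top of the window

def to_gem(x: float, y: float) -> Tuple[float, float]:
    """Map a 0-1 blob position to GEM coordinates centered on (0, 0).
    ASSUMPTION: Y is inverted because blob Y grows downward."""
    gem_x = x * GEM_WIDTH - GEM_WIDTH / 2     # 0..1 -> -5.33..5.33
    gem_y = GEM_HEIGHT / 2 - y * GEM_HEIGHT   # 0..1 -> 4..-4
    return gem_x, gem_y

print(to_gem(0.5, 0.5))  # (0.0, 0.0): center of the window
print(to_gem(0.0, 0.0))  # top-left corner, about (-5.33, 4.0)
```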

The yellow circle follows the movement of the subject in front of the webcam. Its position is the center of gravity of the moving blob. If there is movement in every corner of the screen, the center stays in the middle of the window; if your movement is localized in a particular region of the window, the tracking is more accurate. The yellow circle also moves in a very jerky manner. To smooth the motion, we can use a new object, smooth, which smooths the incoming value by averaging it with former values. Its right-hand inlet takes a value between 0 and 1 from a horizontal slider, controlling the smoothness of the output; a smaller value gives a smoother result.

Refer to the following screenshot:

Both the X and Y position values go through a smooth object, which smooths the incoming numbers and outputs them for the yellow circle to follow. The next step restores the normal video display by splitting the video signal into two: one for the pix_movement object to track the motion and another for the pix_texture object to display:

In this version, a piece of text follows the movement instead of a circle. The rest works the same way as the earlier patch for detecting presence:

Creating a motion detection animation

So far in this article, we have learned the basic tools to detect presence and motion. Detecting presence works like a binary switch that indicates the presence or absence of a subject in front of a predefined background. Detecting motion gives the position of the center of the moving blob. We have used a graphical shape and a piece of text to follow the movement. In the following patch, Motion004.pd, we use an image loaded with the pix_image object:

There is no new technique here. Send the pix_image object an open Flower001.png message to open the image file, which sits in the same folder as the patch. You also need the pix_texture object to map the image onto the square. The image then follows your movement:

The image itself is a square with a white background. In some applications, you may want an irregular shape with a transparent background instead. We can directly use the alpha channel of a PNG or TIFF file to remove the background. In Photoshop, you can delete the background of an image and save the transparency information in the alpha channel, and the GEM library supports such image files. Remember to enable alpha blending with the alpha object. In the following example, the image Flower002.png opened by pix_image has a transparent background:

Note that the flower does not have the white background color here:

The next thing we can do is analyze the direction of movement. The pix_blob object returns the X and Y positions of the moving blob. To find the direction, we need the X and Y positions in two consecutive frames and their difference, which means we have to store the X and Y positions from the previous frame. We use the float object again for storage:

The patch computes the difference of a value that changes per frame. The float object keeps the previous value. The first number box is the current value. When it changes, it first sends a bang message to the float object to output its stored value, that is, the previous value. It then sends the current value to the subtraction object to compute the difference. Finally, it sends another copy of the current value to the right-hand inlet of the float object to be used in the next frame. The next patch Difference001.pd will use this logic to compute the changes in X position:
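The bang, subtract, store sequence can be sketched in Python with a single stored value playing the role of the float object. As in Pd, where an empty float object holds 0, the stored value starts at 0, so the very first difference equals the first position. The order of the three steps matters: storing the current value before reading the old one would always give 0.

```python
class PreviousValueDifference:
    """Model of the [float] + [-] logic for a value that changes every frame."""

    def __init__(self) -> None:
        self.stored = 0.0   # an empty [float] outputs 0

    def __call__(self, current: float) -> float:
        previous = self.stored        # 1. bang the float: output the stored value
        delta = current - previous    # 2. subtract previous from current
        self.stored = current         # 3. store current for the next frame
        return delta

dx = PreviousValueDifference()
for x in (0.5, 0.75, 0.25):   # X positions over three frames
    print(dx(x))              # 0.5, 0.25, -0.5
```

A positive result means the blob moved toward larger X values (to the right after flipping), and a negative result means it moved the other way.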

You will see the value change in the number box after the subtraction object. When you move toward the left-hand side, the number becomes negative; when you move toward the right-hand side, it becomes positive. We make use of this relation to create an interactive animation in the next patch, Direction002.pd:

On the right-hand side of the patch, where we compute the difference in position between two frames, we use the smooth object again to smooth the value. We also need to scale this number up, because it is only a small fraction (the difference between two positions in the 0 to 1 range). In this case, we multiply it by 360, corresponding to the 360 degrees of a complete rotation.

On the left-hand side of the patch is a cube. We use the draw line message to enable the wireframe view, and we also disable the depth test with the depth object. The horizontal movement rotates the cube around its y axis. The float object above the rotateXYZ object stores the current rotation value. The number received from the r rot object is the amount we add to the current rotation value; it can be negative or positive, depending on which direction you wave your hand. We use a trigger b b f object because the number must be sent, added to the current value, and the result routed to the rotateXYZ object in the correct order, and trigger fires its outlets from right to left:
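The whole chain (smooth the X difference, scale by 360, add to the stored angle) can be sketched in Python as follows. The multiplication by 360 comes from the text; modeling smooth as exponential averaging with a hypothetical amount parameter is an assumption, and the angle is left unwrapped, matching an accumulator that simply keeps adding.

```python
class CubeRotation:
    """Accumulate horizontal movement into a Y-axis rotation angle.
    ASSUMPTION: smoothing modeled as exponential averaging with a
    hypothetical `amount`; the 360 scale factor is from the text."""

    def __init__(self, amount: float = 0.5) -> None:
        self.amount = amount
        self.smoothed = 0.0
        self.angle = 0.0     # plays the role of the [float] above rotateXYZ

    def update(self, delta_x: float) -> float:
        self.smoothed += self.amount * (delta_x - self.smoothed)  # smooth
        step = self.smoothed * 360.0          # scale to degrees
        self.angle += step                    # add to the stored rotation
        return self.angle                     # sent to rotateXYZ's Y rotation

rot = CubeRotation(amount=1.0)   # amount 1.0 disables smoothing for clarity
print(rot.update(0.1))           # about 36 degrees: hand moved right
print(rot.update(-0.05))         # about 18 degrees: hand moved back left
```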

Comparing colors

Another technique for detecting motion is to compare colors across different frames. First, we identify a pixel in the video frame and store its color information in the Pure Data patch. In the subsequent frames, we compare the color of that pixel with the stored information. If the colors change significantly, we assume there is movement in that area. For these tasks, we need to read the pixel color information, which is what the pix_data object does.

The pix_data object needs four inputs. The first inlet takes a bang message that triggers the reading of the pixel color. The second takes the video image. The last two take the X and Y position of the pixel, each in the range from 0 to 1; we use two horizontal sliders for them. The position (0, 0) is the top-left corner, and (1, 1) is the bottom-right corner. The color comes out of the second outlet as a list of the red, green, and blue values, also in the range from 0 to 1, so we need an unpack object to split the list into three numbers.
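Reading one pixel at a normalized position can be sketched in Python as below. The coordinate convention (top-left is (0, 0), bottom-right is (1, 1)) follows the text; nearest-pixel lookup with clamping at the edges is an assumption, since pix_data's exact rounding is not described here.

```python
from typing import List, Tuple

Rgb = Tuple[float, float, float]

def read_pixel(frame: List[List[Rgb]], x: float, y: float) -> Rgb:
    """Return the RGB color at normalized position (x, y).
    ASSUMPTION: nearest-pixel lookup, clamped so 1.0 maps to the last pixel."""
    height, width = len(frame), len(frame[0])
    col = min(max(int(x * width), 0), width - 1)
    row = min(max(int(y * height), 0), height - 1)
    return frame[row][col]

frame = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
         [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]]
r, g, b = read_pixel(frame, 0.0, 0.0)   # like [unpack f f f] on the color list
print(r, g, b)                           # top-left pixel: 1.0 0.0 0.0
print(read_pixel(frame, 1.0, 1.0))      # bottom-right pixel
```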

The next challenge is to compare two colors. We cannot check if the two colors match exactly because there will be noise in the video signal. We can only check if the two colors look similar. The similarity is a numeric threshold. In this case, we have to find a way to measure the distance between two colors. Whenever the distance is shorter than a predefined threshold, we claim that the two colors are similar.

Each color is a combination of three primary colors: red, green, and blue. We can think of each color as a point in a three-dimensional space with three axes, red, green, and blue, each ranging from 0 to 1. To compute the distance between two points in a three-dimensional space of X, Y, and Z, we use the Pythagorean theorem:

From school mathematics, we know that for a right-angled triangle with sides a and b and hypotenuse c, as shown earlier, the lengths satisfy the relation:

$$c^2 = a^2 + b^2$$

For the two points (x1, y1) and (x2, y2) in a 2D plane, we can also make use of this relation to measure the distance between them:

$$a = x_2 - x_1$$

$$b = y_2 - y_1$$

The value of c will be the distance between point (x1, y1) and point (x2, y2). We can have this formula:

$$c^2 = (x_2 - x_1)^2 + (y_2 - y_1)^2$$

$$c = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

We can generalize this to 3D space. For two points $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$, the distance between them is $\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$. If we replace X, Y, and Z with R, G, and B, the two colors are $(r_1, g_1, b_1)$ and $(r_2, g_2, b_2)$, and the distance between them is $\sqrt{(r_2 - r_1)^2 + (g_2 - g_1)^2 + (b_2 - b_1)^2}$. Now we put this formula into an abstraction patch, colorDistance.pd:

The expr object implements the formula: the square root of the sum of the squared differences between the red, green, and blue components of the two colors. To validate the abstraction, we can use a very simple patch to check the result, as shown in the following screenshot:

Each slider has a range from 0 to 1. Each pack object compiles a list of the red, green, and blue values and sends it to the colorDistance abstraction. In this test patch, only the red slider feeds the hot (left) inlet of each pack object, so moving the green or blue slider alone does not produce a new result. This does not matter, as we only use the patch to validate the colorDistance result.
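The colorDistance formula translates directly into Python, which is handy for checking the numbers the validation patch shows. Because every channel lies between 0 and 1, the distance ranges from 0 (identical colors) to the square root of 3, about 1.732 (black versus white), so any useful threshold falls inside that range.

```python
import math
from typing import Tuple

Rgb = Tuple[float, float, float]

def color_distance(c1: Rgb, c2: Rgb) -> float:
    """Euclidean distance between two RGB colors in the 0-1 unit cube:
    sqrt((r2-r1)^2 + (g2-g1)^2 + (b2-b1)^2)."""
    (r1, g1, b1), (r2, g2, b2) = c1, c2
    return math.sqrt((r2 - r1) ** 2 + (g2 - g1) ** 2 + (b2 - b1) ** 2)

print(color_distance((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)))  # black vs. white, about 1.732
print(color_distance((0.2, 0.4, 0.6), (0.2, 0.4, 0.6)))  # identical colors: 0.0
print(color_distance((0.0, 0.0, 0.0), (0.3, 0.4, 0.0)))  # a 3-4-5 triangle: about 0.5
```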

Now we know how to obtain the pixel color information and how to compare two colors. The remaining task is to store a copy of the pixel color and compare it with the latest pixel color from the video frame. We use the spigot object for this: by opening or closing it, we can either let the current color through into storage or keep the stored color fixed for comparison. Let's have a look at the patch Color003.pd:

In the patch, six number boxes show the pixel color information. The three on the left-hand side are the current pixel color from the video image; the three on the right-hand side are the stored pixel color used for comparison. To work with the patch, click on the dimen message, create the GEM window, start the rendering, and flip the video image horizontally. Move both sliders to roughly the middle of their range, which selects a pixel near the center of the window. Turn on the toggle for the metro object. The three number boxes on the left-hand side will change continuously: they are the current red, green, and blue values of the pixel selected by the two sliders. To store the current pixel color for comparison, turn on the toggle labeled Store color. The three number boxes on the right-hand side start changing and now match the three on the left-hand side. Uncheck the Store color toggle, and the three numbers on the right-hand side stay fixed. The number box after the colorDistance abstraction then shows the distance between the stored color and the current pixel color in the latest video frame. We can test this number with a comparison operator to see whether there is a significant change in that particular area of the video image.
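The store-and-compare workflow can be modeled in Python as a small state machine. The storing flag stands in for the Store color toggle that opens the spigot; while it is on, each sampled color is copied into storage, and once it is off, new samples are compared with the frozen color. The threshold of 0.2 is a hypothetical value that you would tune by watching the distance number box.

```python
import math
from typing import Optional, Tuple

Rgb = Tuple[float, float, float]

class ColorHotspot:
    """Model of the Color003.pd store/compare logic (threshold is hypothetical)."""

    def __init__(self, threshold: float = 0.2) -> None:
        self.threshold = threshold
        self.storing = False            # the Store color toggle / open spigot
        self.stored: Optional[Rgb] = None

    def sample(self, color: Rgb) -> Optional[bool]:
        """Feed one pixel color per metro tick; True means significant change."""
        if self.storing:
            self.stored = color         # spigot open: copy the color into storage
        if self.stored is None:
            return None                 # nothing stored yet, nothing to compare
        distance = math.dist(color, self.stored)   # Euclidean RGB distance
        return distance > self.threshold

spot = ColorHotspot(threshold=0.2)
spot.storing = True
spot.sample((0.5, 0.5, 0.5))            # store the background color
spot.storing = False
print(spot.sample((0.52, 0.5, 0.5)))    # video noise: False
print(spot.sample((0.9, 0.2, 0.5)))     # a hand covers the pixel: True
```

Sampling several such pixels, each with its own stored color, gives a simple set of virtual hotspots.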


This completes a quick introduction to motion detection using Pure Data and the GEM library. We learned how to detect the presence of a subject in front of a static background. By comparing two consecutive frames, we identified movement in front of the webcam. Using the position reported by the GEM library, we made a graphical shape follow the movement of a subject. By comparing the color information of a specific pixel, we could detect movement precisely at that pixel in the video image. With this information, we can implement interactive hotspots that exist in virtual space.


You've been reading an excerpt of:

Multimedia Programming with Pure Data
