Chapter 3. Skeletal Tracking
Skeletal tracking allows applications to recognize people and follow their actions. Combined with gesture-based programming, skeletal tracking enables applications to provide a natural interface and improves their overall usability.
In this chapter we will learn how to enable and handle the skeleton data stream. In particular, we will address the following:
Tracking users by analyzing the skeleton data streamed by Kinect and mapping them to the color stream
Understanding what joints are and which joints are tracked in the near and seated mode
Observing the movements of the tracked users to detect simple actions
Mastering the skeleton data stream enables us to implement applications that track the user's actions and recognize the user's gestures.
The Kinect sensor, thanks to the IR camera, can recognize up to six users in its field of view. Of these, only up to two users can be fully tracked, while the others are tracked only by their position (a single point).
The application flow for tracking users is very similar to the process we described in the color frame and depth frame management:
Firstly, we need to ensure that at least one Kinect sensor is connected.
Secondly, we have to enable the stream (in this case the skeleton one).
And finally, we need to handle the frames that the sensor is streaming through the relevant SDK APIs.
In this chapter we will mention only the code that is relevant to skeletal tracking. The source code attached to the book includes all the detailed code, and we can refer to the previous chapter to refresh ourselves on how to address step 1.
To enable the skeleton stream, we simply invoke the KinectSensor.SkeletonStream.Enable() method.
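The three steps listed above can be sketched as follows. This is a minimal console-style sketch, assuming a reference to Microsoft.Kinect.dll (Kinect for Windows SDK 1.x) and a connected sensor:

```csharp
using System.Linq;
using Microsoft.Kinect;

class SkeletonStreamSetup
{
    static KinectSensor sensor;

    static void Main()
    {
        // Step 1: ensure that at least one Kinect sensor is connected.
        sensor = KinectSensor.KinectSensors
                             .FirstOrDefault(s => s.Status == KinectStatus.Connected);
        if (sensor == null)
            return;

        // Step 2: enable the skeleton stream. The color stream is enabled
        // too, so we can later map the skeleton data onto the color image.
        sensor.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);
        sensor.SkeletonStream.Enable();

        // Step 3: handle the frames streamed by the sensor.
        sensor.AllFramesReady += SensorAllFramesReady;
        sensor.Start();
    }

    static void SensorAllFramesReady(object sender, AllFramesReadyEventArgs e)
    {
        // Frame processing goes here.
    }
}
```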
The Kinect sensor streams skeleton tracking data in the skeleton stream. This data is structured in the Skeleton class as a collection of joints. A joint is the point at which two skeleton bones are joined. This point is defined by the SkeletonPoint structure, which holds the X, Y, and Z coordinates of the point in skeleton space, expressed in meters.
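As a sketch of how the skeleton data can be consumed and mapped to the color stream, the AllFramesReady handler could be written as follows (assuming sensor is the running KinectSensor enabled as above, and SDK 1.6 or later for the CoordinateMapper API):

```csharp
static void SensorAllFramesReady(object sender, AllFramesReadyEventArgs e)
{
    using (SkeletonFrame skeletonFrame = e.OpenSkeletonFrame())
    {
        if (skeletonFrame == null)
            return;

        Skeleton[] skeletons = new Skeleton[skeletonFrame.SkeletonArrayLength];
        skeletonFrame.CopySkeletonDataTo(skeletons);

        foreach (Skeleton skeleton in skeletons)
        {
            // Only up to two skeletons have TrackingState == Tracked;
            // the others expose just their position.
            if (skeleton.TrackingState != SkeletonTrackingState.Tracked)
                continue;

            foreach (Joint joint in skeleton.Joints)
            {
                if (joint.TrackingState == JointTrackingState.NotTracked)
                    continue;

                // Map the joint's SkeletonPoint (meters) to a pixel of the
                // 640x480 color image, so it can be drawn over the color frame.
                ColorImagePoint colorPoint = sensor.CoordinateMapper
                    .MapSkeletonPointToColorPoint(
                        joint.Position,
                        ColorImageFormat.RgbResolution640x480Fps30);
                // colorPoint.X / colorPoint.Y are the overlay coordinates.
            }
        }
    }
}
```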
As we saw in the previous chapter, the Kinect for Windows SDK provides a near-range feature in order to track people
close to the sensor.
First of all, in order to activate the near tracking mode, we need to enable the near-range feature by setting the sensor.DepthStream.Range property to DepthRange.Near; then we set the sensor.SkeletonStream.EnableTrackingInNearRange property to true.
In addition to tracking users in the 0.4 – 0.8 m range, which the Default mode cannot cover, this mode usually provides greater accuracy than the Default mode for distances up to 3 m.
For scenarios where the user to be tracked is seated, or the lower part of his/her body is not entirely visible to the sensor, we can enable the Seated mode by setting the sensor.SkeletonStream.TrackingMode property to SkeletonTrackingMode.Seated. In this mode, the APIs track only the upper-body joints and report a NotTracked status for all of the remaining joints.
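Putting the two settings together, a sketch of the near-range plus Seated configuration, assuming sensor is a started KinectSensor with its depth and skeleton streams already enabled, looks like this:

```csharp
// Enable the near-range feature on the depth stream.
sensor.DepthStream.Range = DepthRange.Near;

// Allow skeletons to be tracked close to the sensor (0.4 m onwards).
sensor.SkeletonStream.EnableTrackingInNearRange = true;

// Track only the upper-body joints; the lower-body joints
// will report a NotTracked status.
sensor.SkeletonStream.TrackingMode = SkeletonTrackingMode.Seated;
```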
The following image highlights the twenty joints tracked in the Default mode and the ten joints tracked in the Seated mode.
Let's see now how we can enhance our application and leverage the Kinect sensor's Natural User Interface (NUI) capabilities.
We implement a manager that, using the skeleton data, is able to interpret a body motion or posture and translate it into an action such as a "click". Similarly, we could define other actions such as "zoom in". Unfortunately, the Kinect for Windows SDK does not provide APIs for recognizing gestures, so we need to develop a custom gesture recognition engine.
Gesture detection can be relatively simple or intensely complex depending on the gesture and the environment (image noise, scenes with multiple users, and so on).
In the literature there are many approaches for implementing gesture recognition; the most common ones are as follows:
A neural network that utilizes the weighted networks (Gestures and neural networks in human-computer interaction, Beale R and Alistair D N E)
A DTW approach that utilizes the Dynamic Time Warping algorithm, initially developed for speech recognition
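As a much simpler heuristic alternative, a hypothetical "swipe right" detector can be built directly on raw joint positions. The class below is a sketch of the kind of component a custom gesture manager could host; the MinDistance and Window thresholds are assumed values, not SDK constants, and would need tuning:

```csharp
using System;
using System.Collections.Generic;

class SwipeRightDetector
{
    // Recent samples of the right hand's X coordinate (meters, skeleton space).
    private readonly List<Tuple<float, DateTime>> history =
        new List<Tuple<float, DateTime>>();

    private const float MinDistance = 0.25f;                       // assumed threshold, meters
    private static readonly TimeSpan Window = TimeSpan.FromMilliseconds(500);

    public event EventHandler SwipeDetected;

    // Call once per skeleton frame with the right hand joint's X position.
    public void Update(float handX, DateTime timestamp)
    {
        history.Add(Tuple.Create(handX, timestamp));
        // Keep only the samples inside the time window.
        history.RemoveAll(s => timestamp - s.Item2 > Window);

        foreach (var sample in history)
        {
            // Fire when the hand has moved right by MinDistance within the window.
            if (handX - sample.Item1 >= MinDistance)
            {
                if (SwipeDetected != null)
                    SwipeDetected(this, EventArgs.Empty);
                history.Clear();   // avoid firing repeatedly for one swipe
                break;
            }
        }
    }
}
```

Feeding it from the frame handler is a matter of calling Update with skeleton.Joints[JointType.HandRight].Position.X and the current time on every tracked frame.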
In this chapter we learned how to track the skeletal data provided by the Kinect sensor and how to interpret it to design relevant user actions.
With the example developed in this chapter, we definitely went to the core of designing and developing Natural User Interfaces.
Thanks to the KinectSensor.SkeletonStream.Enable() method and the event handler attached to KinectSensor.AllFramesReady, we have started to manipulate the skeleton stream data and the color stream data provided by the Kinect sensor and to overlay them.
We addressed the SkeletonStream.TrackingMode property for tracking users in the Default (standing) and Seated modes. Leveraging the Seated mode together with the ability to track user actions is very useful for applications oriented to people with disabilities.
We went through the algorithmic approaches for tracking users' actions and recognizing their gestures, and we developed our custom gesture manager. Gestures have been defined as a collection of movement sections for...