Chapter 3. Skeletal Tracking
Skeletal tracking allows applications to recognize people and follow their actions. Combined with gesture-based programming, skeletal tracking enables applications to provide a natural interface and improves their overall usability.
In this chapter we will learn how to enable and handle the skeleton data stream. In particular, we will address the following:
Tracking users by analyzing the skeleton data streamed by Kinect and mapping them to the color stream
Understanding what joints are and which joints are tracked in the near and seated mode
Observing the movements of the tracked users to detect simple actions
Mastering the skeleton data stream enables us to implement applications that track the user's actions and recognize the user's gestures.
The Kinect sensor, thanks to the IR camera, can recognize up to six users in its field of view. Of these, only up to two users can be fully tracked, while the others are tracked only by their position (a single point).
The application flow for tracking users is very similar to the process we described in the color frame and depth frame management:
Firstly, we need to ensure that at least one Kinect sensor is connected.
Secondly, we have to enable the stream (in this case the skeleton one).
And finally, we need to handle the frames that the sensor is streaming through the relevant SDK APIs.
In this chapter we will mention only the code that is relevant to skeletal tracking. The source code attached to the book includes all the detailed code, and we can refer to the previous chapter to refresh ourselves on how to address step 1.
To enable the skeleton stream, we simply invoke the KinectSensor.SkeletonStream.Enable() method.
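The three steps listed above can be sketched as follows. This is a minimal console-style sketch, assuming a reference to Microsoft.Kinect.dll (Kinect for Windows SDK 1.x) and a connected sensor:

```csharp
using System.Linq;
using Microsoft.Kinect;

class SkeletonStreamSetup
{
    static KinectSensor sensor;

    static void Main()
    {
        // Step 1: ensure that at least one Kinect sensor is connected.
        sensor = KinectSensor.KinectSensors
                             .FirstOrDefault(s => s.Status == KinectStatus.Connected);
        if (sensor == null)
            return;

        // Step 2: enable the skeleton stream. The color stream is enabled
        // too, so we can later map the skeleton data onto the color image.
        sensor.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);
        sensor.SkeletonStream.Enable();

        // Step 3: handle the frames streamed by the sensor.
        sensor.AllFramesReady += SensorAllFramesReady;
        sensor.Start();
    }

    static void SensorAllFramesReady(object sender, AllFramesReadyEventArgs e)
    {
        // Frame processing goes here.
    }
}
```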
The Kinect sensor streams skeleton tracking data in the skeleton stream. This data is structured in the Skeleton class as a collection of joints. A joint is the point at which two skeleton bones are joined. This point is defined by the SkeletonPoint structure, which holds the X, Y, and Z coordinates of the point in skeleton space, expressed in meters.
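As a sketch of how the skeleton data can be consumed and mapped to the color stream, the AllFramesReady handler could be written as follows (assuming sensor is the running KinectSensor enabled as above, and SDK 1.6 or later for the CoordinateMapper API):

```csharp
static void SensorAllFramesReady(object sender, AllFramesReadyEventArgs e)
{
    using (SkeletonFrame skeletonFrame = e.OpenSkeletonFrame())
    {
        if (skeletonFrame == null)
            return;

        Skeleton[] skeletons = new Skeleton[skeletonFrame.SkeletonArrayLength];
        skeletonFrame.CopySkeletonDataTo(skeletons);

        foreach (Skeleton skeleton in skeletons)
        {
            // Only up to two skeletons have TrackingState == Tracked;
            // the others expose just their position.
            if (skeleton.TrackingState != SkeletonTrackingState.Tracked)
                continue;

            foreach (Joint joint in skeleton.Joints)
            {
                if (joint.TrackingState == JointTrackingState.NotTracked)
                    continue;

                // Map the joint's SkeletonPoint (meters) to a pixel of the
                // 640x480 color image, so it can be drawn over the color frame.
                ColorImagePoint colorPoint = sensor.CoordinateMapper
                    .MapSkeletonPointToColorPoint(
                        joint.Position,
                        ColorImageFormat.RgbResolution640x480Fps30);
                // colorPoint.X / colorPoint.Y are the overlay coordinates.
            }
        }
    }
}
```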
As we saw in the previous chapter, the Kinect for Windows SDK provides a near-range feature in order to track people
close to the sensor.
First of all, in order to activate the near tracking mode, we need to enable the near-range feature by setting the sensor.DepthStream.Range property to DepthRange.Near; then we set the sensor.SkeletonStream.EnableTrackingInNearRange property to true.
In addition to tracking users in the 0.4 – 0.8 m range, which the Default mode cannot cover, this mode usually provides greater accuracy than the Default mode for distances up to 3 m.
For scenarios where the user to be tracked is seated, or the lower part of his/her body is not entirely visible to the sensor, we can enable the Seated mode by setting the sensor.SkeletonStream.TrackingMode property to SkeletonTrackingMode.Seated. In this mode, the APIs track only the upper-body joints and report a NotTracked status for all of the remaining joints.
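Putting the two settings together, a sketch of the near-range plus Seated configuration, assuming sensor is a started KinectSensor with its depth and skeleton streams already enabled, looks like this:

```csharp
// Enable the near-range feature on the depth stream.
sensor.DepthStream.Range = DepthRange.Near;

// Allow skeletons to be tracked close to the sensor (0.4 m onwards).
sensor.SkeletonStream.EnableTrackingInNearRange = true;

// Track only the upper-body joints; the lower-body joints
// will report a NotTracked status.
sensor.SkeletonStream.TrackingMode = SkeletonTrackingMode.Seated;
```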
The following image highlights the twenty joints tracked in the Default mode and the ten joints tracked in the Seated mode.
Let's see now how we can enhance our application and leverage the Kinect sensor's Natural User Interface (NUI) capabilities.
We implement a manager that, using the skeleton data, is able to interpret a body motion or posture and translate it into an action such as a "click". Similarly, we could define other actions such as "zoom in". Unfortunately, the Kinect for Windows SDK does not provide APIs for recognizing gestures, so we need to develop a custom gesture recognition engine.
Gesture detection can be relatively simple or intensely complex depending on the gesture and the environment (image noise, scenes with multiple users, and so on).
In the literature there are many approaches for implementing gesture recognition; the most common ones are as follows:
A neural network that utilizes the weighted networks (Gestures and neural networks in human-computer interaction, Beale R and Alistair D N E)
A DTW approach that utilizes the Dynamic Time Warping algorithm, initially developed for speech recognition
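As a much simpler heuristic alternative, a hypothetical "swipe right" detector can be built directly on raw joint positions. The class below is a sketch of the kind of component a custom gesture manager could host; the MinDistance and Window thresholds are assumed values, not SDK constants, and would need tuning:

```csharp
using System;
using System.Collections.Generic;

class SwipeRightDetector
{
    // Recent samples of the right hand's X coordinate (meters, skeleton space).
    private readonly List<Tuple<float, DateTime>> history =
        new List<Tuple<float, DateTime>>();

    private const float MinDistance = 0.25f;                       // assumed threshold, meters
    private static readonly TimeSpan Window = TimeSpan.FromMilliseconds(500);

    public event EventHandler SwipeDetected;

    // Call once per skeleton frame with the right hand joint's X position.
    public void Update(float handX, DateTime timestamp)
    {
        history.Add(Tuple.Create(handX, timestamp));
        // Keep only the samples inside the time window.
        history.RemoveAll(s => timestamp - s.Item2 > Window);

        foreach (var sample in history)
        {
            // Fire when the hand has moved right by MinDistance within the window.
            if (handX - sample.Item1 >= MinDistance)
            {
                if (SwipeDetected != null)
                    SwipeDetected(this, EventArgs.Empty);
                history.Clear();   // avoid firing repeatedly for one swipe
                break;
            }
        }
    }
}
```

Feeding it from the frame handler is a matter of calling Update with skeleton.Joints[JointType.HandRight].Position.X and the current time on every tracked frame.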
In this chapter we learned how to track the skeletal data provided by the Kinect sensor and how to interpret it to design relevant user actions.
With the example developed in this chapter, we definitely went to the core of designing and developing Natural User Interfaces.
Thanks to the KinectSensor.SkeletonStream.Enable() method and the event handler attached to KinectSensor.AllFramesReady, we have started to manipulate the skeleton stream data and the color stream data provided by the Kinect sensor and to overlay them.
We addressed the SkeletonStream.TrackingMode property for tracking users in the Default (standing) and Seated modes. Leveraging the Seated mode together with the ability to track user actions is very useful for applications oriented to people with disabilities.
We went through the algorithmic approaches for tracking users' actions and recognizing their gestures, and we developed our custom gesture manager. Gestures have been defined as a collection of movement sections for...