Skeletal tracking allows applications to recognize people and follow their actions. Combined with gesture-based programming, it enables applications to provide a natural user interface, improving both the usability and the ease of use of the application itself.
In this chapter we will learn how to enable and handle the skeleton data stream. Mastering the skeleton data stream enables us to implement applications that track the user's actions and recognize the user's gestures.
The Kinect sensor, thanks to the IR camera, can recognize up to six users in its field of view. Of these, only up to two users can be fully tracked, while the others are tracked from one single point only, as demonstrated in the following image:
Tracking up to six users in the field of view
The application flow for tracking users is very similar to the process we described in the color frame and depth frame management:
In this chapter we will mention only the code that is relevant to skeletal tracking. The source code attached to the book includes all the detailed code, and we can refer to the previous chapter to recall how to address step 1.
To enable the skeleton stream, we simply invoke the KinectSensor.SkeletonStream.Enable()
method.
The Kinect sensor streams out skeleton tracking data in the skeleton stream. This data is structured in the Skeleton
class as a collection of joints. A
joint is the point at which two skeleton bones are joined. This point is defined by the SkeletonPoint
structure, which defines a 3D position in the skeleton space: a point expressed in meters by the three values (x, y, z). We have up to twenty joints per single skeleton. A detailed list of the joint types is defined by the
JointType
enumeration at http://msdn.microsoft.com/en-us/library/microsoft.kinect.jointtype.aspx.
We are going to store the skeleton data in the private Skeleton[] skeletonData
array, which we size as per the sensor.SkeletonStream.FrameSkeletonArrayLength
property. This property provides the total length of the skeleton data buffer for the SkeletonFrame
class, which is large enough to hold every skeleton the tracking engine can recognize, whether fully tracked or tracked by position only.
We enable our application to listen to and manage the skeleton stream by defining the void sensor_AllFramesReady(object sender, AllFramesReadyEventArgs e)
event handler and attaching it to the this.sensor.AllFramesReady
event.
The following code snippet summarizes the necessary steps to enable the skeleton stream:
//handle the status changed event for the current sensor.
//All the available status values are defined in the Microsoft.Kinect.KinectStatus enum
void KinectSensors_StatusChanged(object sender, StatusChangedEventArgs e)
{
    //select the first (if any available) connected Kinect sensor from the KinectSensor.KinectSensors collection
    this.sensor = KinectSensor.KinectSensors.FirstOrDefault(s => s.Status == KinectStatus.Connected);
    if (null != this.sensor)
    {
        //enable the skeleton stream
        sensor.SkeletonStream.Enable();
        // Allocate skeleton data
        skeletonData = new Skeleton[sensor.SkeletonStream.FrameSkeletonArrayLength];
        // subscribe to the event raised when all frames are ready
        this.sensor.AllFramesReady += sensor_AllFramesReady;
        // Start the sensor
        try
        {
            this.sensor.Start();
        }
        catch (IOException)
        {
            this.sensor = null;
        }
    }
}
As we have noticed, we subscribed to the
AllFramesReady
event, which is raised when all the frames (color, depth, and skeleton) are ready. We could instead subscribe to the
SkeletonFrameReady
event, which is raised when only the skeleton frame is ready. As we will see shortly, we opted for the AllFramesReady
event because, in our example, we need to handle both the skeleton and the color frames.
In this example we manage the skeleton stream by reacting to the frame ready event. We could apply the same considerations discussed for the color frame and approach skeleton tracking using the polling technique. To do so, we should leverage the
SkeletonStream.OpenNextFrame()
method instead of subscribing to the AllFramesReady
event or to the
SkeletonFrameReady
event.
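The polling approach can be sketched as follows (a minimal sketch, assuming the skeleton stream has been enabled and the skeletonData array allocated as shown earlier; the 100-millisecond timeout is an arbitrary choice):

```csharp
// Poll the sensor for the next skeleton frame instead of handling events.
// OpenNextFrame blocks up to the given timeout (in milliseconds) and
// returns null if no new frame becomes available within that time.
using (SkeletonFrame skeletonFrame = this.sensor.SkeletonStream.OpenNextFrame(100))
{
    if (skeletonFrame != null && this.skeletonData != null)
    {
        // copy the skeletal information of this frame into our array
        skeletonFrame.CopySkeletonDataTo(this.skeletonData);
    }
}
```

Polling gives the application full control over when frames are retrieved, at the cost of managing its own acquisition loop.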
At this stage, the code written in the
sensor_AllFramesReady
event handler should: copy the color frame's pixel data and render it; copy the skeleton frame's data to our skeletonData array; and draw the graphical output.
The following code snippet embeds all the aforementioned activities:
/// <summary>
/// manage the entire stream data received from the sensor
/// </summary>
/// <param name="sender"></param>
/// <param name="e"></param>
void sensor_AllFramesReady(object sender, AllFramesReadyEventArgs e)
{
    using (ColorImageFrame colorFrame = e.OpenColorImageFrame())
    {
        if (colorFrame != null)
        {
            //copy the color frame's pixel data to the array
            colorFrame.CopyPixelDataTo(this.colorPixels);
            //draw the WriteableBitmap
            this.colorBitmap.WritePixels(
                new Int32Rect(0, 0, this.colorBitmap.PixelWidth, this.colorBitmap.PixelHeight),
                this.colorPixels,
                this.colorBitmap.PixelWidth * colorFrame.BytesPerPixel,
                0);
        }
    }
    //handle the skeleton stream data
    using (SkeletonFrame skeletonFrame = e.OpenSkeletonFrame()) // Open the skeleton frame
    {
        if (skeletonFrame != null && this.skeletonData != null) // check that a frame is available
        {
            skeletonFrame.CopySkeletonDataTo(this.skeletonData); // get the skeletal information in this frame
        }
    }
    //draw the output
    using (DrawingContext dc = this.drawingGroup.Open())
    {
        // draw the color stream output
        dc.DrawImage(this.colorBitmap, new Rect(0.0, 0.0, RenderWidth, RenderHeight));
        //draw the skeleton stream data
        DrawSkeletons(dc);
        // define the limited area for rendering the visual outcome
        this.drawingGroup.ClipGeometry = new RectangleGeometry(new Rect(0.0, 0.0, RenderWidth, RenderHeight));
    }
}
For all the explanations related to the color stream data and frame, we can refer to the previous chapter. Let's now focus on the skeleton data stream and on how we visualize it overlapping the color frame.
Thanks to the SkeletonFrame.CopySkeletonDataTo
method, we can copy the skeleton data to our skeletonData
array, where we store each skeleton as a collection of joints.
We can draw the skeleton data overlapping the color frame on the screen thanks to an instance of the
System.Windows.Media.DrawingContext
class. We obtain this instance by calling the Open()
method of the System.Windows.Media.DrawingGroup
class.
There are certainly other ways to obtain the same graphical result. Having said that, the DrawingGroup
class provides a handy solution to our problem, where we need to handle a collection of bones and joints that can be operated upon as a single image.
RenderWidth
and RenderHeight
are two double constants set to 640.0
and 480.0
. We use them to handle the width and height of the image we display.
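As a reference, these constants can be declared as follows (a sketch; the names match the snippets used throughout this chapter):

```csharp
// render dimensions matching the 640x480 color stream resolution
private const double RenderWidth = 640.0;
private const double RenderHeight = 480.0;
```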
The following code snippet initializes the
DrawingImage imageSource
and DrawingGroup drawingGroup
variables we use for displaying the graphical outcome of this chapter's example:
this.drawingGroup = new DrawingGroup();
// Create an image source that we can use in our image control
this.imageSource = new DrawingImage(this.drawingGroup);
// Display the drawing using our image control
imgMain.Source = this.imageSource;
For drawing the skeletons, we loop through the entire skeleton data and render it skeleton by skeleton. For the skeletons that are fully tracked, we draw a complete skeleton composed of bones and joints. For the skeletons that cannot be fully tracked, we draw a single ellipse to highlight their position. We also highlight when a user moves to the edge of the field of view, providing visual feedback that the user's skeleton has been clipped:
/// <summary>
/// Draw the skeletons defined in the skeleton data
/// </summary>
/// <param name="drawingContext">dc used to design lines and ellipses representing bones and joints</param>
private void DrawSkeletons(DrawingContext drawingContext)
{
    foreach (Skeleton skeleton in this.skeletonData)
    {
        if (skeleton != null)
        {
            // Fully tracked skeleton
            if (skeleton.TrackingState == SkeletonTrackingState.Tracked)
            {
                DrawTrackedSkeletonJoints(skeleton.Joints, drawingContext);
            }
            // Recognized position of the skeleton
            else if (skeleton.TrackingState == SkeletonTrackingState.PositionOnly)
            {
                DrawSkeletonPosition(skeleton.Position, drawingContext);
            }
            //handle clipped edges
            RenderClippedEdges(skeleton, drawingContext);
        }
    }
}
We render the fully tracked skeletons using lines to represent bones and ellipses to represent joints. A section of the body is defined as a set of bones and their related joints. The following code snippet highlights the mechanism used to render the head and shoulders. We could apply the same mechanism to render the left arm, the right arm, the body, the left leg, and the right leg:
/// <summary>
/// Draw the skeleton joints successfully fully tracked
/// </summary>
/// <param name="jointCollection">joint collection to draw</param>
/// <param name="drawingContext">dc used to design the graphical output</param>
private void DrawTrackedSkeletonJoints(JointCollection jointCollection, DrawingContext drawingContext)
{
    // Render head and shoulders
    DrawBone(jointCollection[JointType.Head], jointCollection[JointType.ShoulderCenter], drawingContext);
    DrawBone(jointCollection[JointType.ShoulderCenter], jointCollection[JointType.ShoulderLeft], drawingContext);
    DrawBone(jointCollection[JointType.ShoulderCenter], jointCollection[JointType.ShoulderRight], drawingContext);
    // Render other bones...
    // Render all the joints
    foreach (Joint singleJoint in jointCollection)
    {
        DrawJoint(singleJoint, drawingContext);
    }
}
We render a skeleton identified with its position only using a single azure-colored ellipse, as defined in the following code snippet:
/// <summary>
/// Draw the skeleton position only
/// </summary>
/// <param name="skeletonPoint">skeleton single point</param>
/// <param name="drawingContext">dc used to design the graphical output</param>
private void DrawSkeletonPosition(SkeletonPoint skeletonPoint, DrawingContext drawingContext)
{
    drawingContext.DrawEllipse(Brushes.Azure, null, this.SkeletonPointToScreen(skeletonPoint), 2, 2);
}
The following code demonstrates how we can provide visual feedback when the user moves to the edge of the field of view. Thanks to the Skeleton.ClippedEdges
property, tested with the HasFlag method, the skeletal tracking system provides feedback whenever the user's skeleton has been clipped on a given edge:
/// <summary>
/// Highlights the edge where the skeleton data have been clipped
/// </summary>
/// <param name="skeleton">single skeleton</param>
/// <param name="drawingContext">dc used to design the graphical output</param>
private void RenderClippedEdges(Skeleton skeleton, DrawingContext drawingContext)
{
    //tests whether the user's skeleton has been clipped at the bottom edge
    if (skeleton.ClippedEdges.HasFlag(FrameEdges.Bottom))
    {
        // colors the bottom border when the user is reaching it
        drawingContext.DrawRectangle(
            Brushes.Red,
            null,
            new Rect(0, RenderHeight - 10, RenderWidth, 10));
    }
    //manage the other edges in the same way
}
As stated previously, we consider a bone to be a line connecting two adjacent joints. A single joint can assume a TrackingState
value defined by the JointTrackingState
enum: NotTracked
, Inferred
, and Tracked
. We define a bone as tracked if and only if both of its joints have TrackingState
equal to JointTrackingState.Tracked
. We define a bone as non-tracked, but still drawable, if at least one of its joints has TrackingState
equal to JointTrackingState.Inferred
. We are not able to render the bone at all if either of its joints has TrackingState
equal to JointTrackingState.NotTracked
:
/// <summary>
/// draw a bone as a line between two given joints
/// </summary>
/// <param name="jointFrom">starting joint of the bone</param>
/// <param name="jointTo">ending joint of the bone</param>
/// <param name="drawingContext">dc used to design the graphical output</param>
private void DrawBone(Joint jointFrom, Joint jointTo, DrawingContext drawingContext)
{
    if (jointFrom.TrackingState == JointTrackingState.NotTracked ||
        jointTo.TrackingState == JointTrackingState.NotTracked)
    {
        return; // nothing to draw, one of the joints is not tracked
    }
    if (jointFrom.TrackingState == JointTrackingState.Inferred ||
        jointTo.TrackingState == JointTrackingState.Inferred)
    {
        // Draw a thin line if either one of the joints is inferred
        DrawNonTrackedBoneLine(jointFrom.Position, jointTo.Position, drawingContext);
    }
    if (jointFrom.TrackingState == JointTrackingState.Tracked &&
        jointTo.TrackingState == JointTrackingState.Tracked)
    {
        // Draw a bold line if both joints are tracked
        DrawTrackedBoneLine(jointFrom.Position, jointTo.Position, drawingContext);
    }
}
We draw the bone simply by calling the DrawingContext.DrawLine
method. We can use two different colors for differentiating between tracked bones and non-tracked bones. For example, we can define Pen trackedBonePen = new Pen(Brushes.Gold, 6)
for tracked bones. The following method defines the way we render tracked bones:
/// <summary>
/// draw a line representing a tracked bone
/// </summary>
/// <param name="skeletonPointFrom">starting point of the bone</param>
/// <param name="skeletonPointTo">ending point of the bone</param>
/// <param name="drawingContext">dc used to design the graphical output</param>
private void DrawTrackedBoneLine(SkeletonPoint skeletonPointFrom, SkeletonPoint skeletonPointTo, DrawingContext drawingContext)
{
    drawingContext.DrawLine(this.trackedBonePen, this.SkeletonPointToScreen(skeletonPointFrom), this.SkeletonPointToScreen(skeletonPointTo));
}
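The non-tracked counterpart can follow the same pattern with a thinner pen; a minimal sketch (the nonTrackedBonePen field and its style are our assumption, e.g. Pen nonTrackedBonePen = new Pen(Brushes.Gray, 1)):

```csharp
/// <summary>
/// draw a thin line representing a non-tracked (inferred) bone
/// </summary>
private void DrawNonTrackedBoneLine(SkeletonPoint skeletonPointFrom, SkeletonPoint skeletonPointTo, DrawingContext drawingContext)
{
    // nonTrackedBonePen is assumed to be thinner and lighter than trackedBonePen,
    // so inferred bones are visually distinct from fully tracked ones
    drawingContext.DrawLine(this.nonTrackedBonePen, this.SkeletonPointToScreen(skeletonPointFrom), this.SkeletonPointToScreen(skeletonPointTo));
}
```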
Similarly, we can draw the joints as ellipses and differentiate those with TrackingState
equal to JointTrackingState.Tracked
from those with TrackingState
equal to JointTrackingState.Inferred
or JointTrackingState.NotTracked
. The following code snippet indicates how we can render a joint and adjust it according to the joint's TrackingState
:
private void DrawJoint(Joint singleJoint, DrawingContext drawingContext)
{
    if (singleJoint.TrackingState == JointTrackingState.NotTracked)
    {
        return; // nothing to draw
    }
    if (singleJoint.TrackingState == JointTrackingState.Inferred)
    {
        // Draw a thin ellipse if the joint is inferred
        DrawNonTrackedJoint(singleJoint, drawingContext);
    }
    if (singleJoint.TrackingState == JointTrackingState.Tracked)
    {
        // Draw a bold ellipse if the joint is tracked
        DrawTrackedJoint(singleJoint, drawingContext);
    }
}

private void DrawTrackedJoint(Joint singleJoint, DrawingContext drawingContext)
{
    drawingContext.DrawEllipse(
        this.trackedJointBrush,
        null,
        this.SkeletonPointToScreen(singleJoint.Position),
        10,
        10);
}
To visualize the single skeletons overlapping the color image in the right position, we utilize the CoordinateMapper.MapSkeletonPointToColorPoint
method, which maps a point from skeleton space to color space:
/// <summary>
/// Maps a SkeletonPoint to lie within our render space and converts it to a Point
/// </summary>
/// <param name="skelpoint">point to map</param>
/// <returns>mapped point</returns>
private Point SkeletonPointToScreen(SkeletonPoint skelpoint)
{
    // Convert point to color space.
    // We are assuming our output resolution to be 640x480.
    ColorImagePoint colorPoint = this.sensor.CoordinateMapper.MapSkeletonPointToColorPoint(skelpoint, ColorImageFormat.RgbResolution640x480Fps30);
    return new Point(colorPoint.X, colorPoint.Y);
}
We are now ready: our skeletons overlap the color data stream and we can take a funny x-ray of ourselves. The full list of joints is detailed in the JointType
enumeration available online at http://msdn.microsoft.com/en-us/library/microsoft.kinect.jointtype.aspx. The joint state is detailed in the JointTrackingState
enumeration available at http://msdn.microsoft.com/en-us/library/microsoft.kinect.jointtrackingstate.aspx.
By default, the Kinect sensor's skeletal tracking engine selects the first two recognized users in the field of view for full tracking. We can use the AppChoosesSkeletons
property and the ChooseSkeletons
method of the
SkeletonStream
class to actively choose in the application which skeletons to track among the six users recognized in the field of view.
We may decide to track the closest skeleton or the skeleton that falls in a predefined distance interval. The source code attached to this chapter defines a simple routine for tracking the closest skeleton.
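Such a routine can be sketched as follows (a minimal sketch, assuming skeletonData is populated as in the previous snippets; the method name TrackClosestSkeleton and where it is called from are our own choices):

```csharp
private void TrackClosestSkeleton()
{
    // take over skeleton selection from the tracking engine
    if (!this.sensor.SkeletonStream.AppChoosesSkeletons)
    {
        this.sensor.SkeletonStream.AppChoosesSkeletons = true;
    }

    float closestDistance = float.MaxValue;
    int closestTrackingId = 0;

    // find the recognized skeleton with the smallest depth (Z) value
    foreach (Skeleton skeleton in this.skeletonData)
    {
        if (skeleton != null &&
            skeleton.TrackingState != SkeletonTrackingState.NotTracked &&
            skeleton.Position.Z < closestDistance)
        {
            closestDistance = skeleton.Position.Z;
            closestTrackingId = skeleton.TrackingId;
        }
    }

    if (closestTrackingId > 0)
    {
        // fully track only the closest user
        this.sensor.SkeletonStream.ChooseSkeletons(closestTrackingId);
    }
}
```

The ChooseSkeletons method also accepts two tracking IDs, so the same approach extends to selecting the two closest users, and the parameterless overload stops full tracking altogether.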
The remaining four skeletons are tracked through their HipCenter
(the point between the hips) joint only.