The goal of this chapter is to develop an app that detects and tracks simple hand gestures in real time using the output of a depth sensor, such as that of a Microsoft Kinect 3D sensor or an Asus Xtion. The app will analyze each captured frame to perform the following tasks:

Hand region segmentation: The hand region is extracted from the depth map of each frame.

Hand shape analysis: The contour, convex hull, and convexity defects of the segmented hand region are determined.

Hand gesture recognition: The number of extended fingers is inferred from the convexity defects.
Gesture recognition is an ever-popular topic in computer science. This is because it not only enables humans to communicate with machines (human-machine interaction, or HMI), but also constitutes the first step for machines to begin understanding human body language. With affordable sensors such as the Microsoft Kinect or Asus Xtion, and open source software such as OpenKinect and OpenNI, it has never been easier to get started in the field yourself. So, what shall we do with all this technology?
The beauty of the algorithm that we are going to implement in this chapter is that it works well for a number of hand gestures, yet is simple enough to run in real time on a generic laptop. Also, if we want, we can easily extend it to incorporate more complicated hand pose estimation. The end product looks like this:
No matter how many fingers of my left hand I extend, the algorithm correctly segments the hand region (white), draws the corresponding convex hull (the green line surrounding the hand), finds all convexity defects that belong to the spaces between fingers (large green points) while ignoring others (small red points), and infers the correct number of extended fingers (the number in the bottom-right corner), even for a fist.
This chapter assumes that you have a Microsoft Kinect 3D sensor installed. Alternatively, you may install an Asus Xtion or any other depth sensor for which OpenCV has built-in support. First, install OpenKinect and libfreenect from http://www.openkinect.org/wiki/Getting_Started. Then, you need to build (or rebuild) OpenCV with OpenNI support. The GUI used in this chapter will again be designed with wxPython, which can be obtained from http://www.wxpython.org/download.php.
The final app will consist of the following modules and scripts:
gestures: A module that consists of an algorithm for recognizing hand gestures. We separate this algorithm from the rest of the application so that it can be used as a standalone module without the need for a GUI.

gestures.HandGestureRecognition: A class that implements the entire process flow of hand-gesture recognition. It accepts a single-channel depth image (acquired from the Kinect depth sensor) and returns an annotated RGB color image with an estimated number of extended fingers.

gui: A module that provides a wxPython GUI application to access the capture device and display the video feed. This is the same module that we used in the last chapter. In order to have it access the Kinect depth sensor instead of a generic camera, we will have to extend some of the base class functionality.

gui.BaseLayout: A generic layout from which more complicated layouts can be built.

chapter2: The main script for the chapter.

chapter2.KinectLayout: A custom layout based on gui.BaseLayout that displays the Kinect depth sensor feed. Each captured frame is processed with the HandGestureRecognition class described earlier.

chapter2.main: The main function routine for starting the GUI application and accessing the depth sensor.