CHAPTER 2

Kinect Basics

by Enrique Ramos

The Kinect, shown in Figure 2-1, was launched on November 4, 2010 and sold an impressive 8 million units in the first 60 days, entering the Guinness World Records as the “fastest selling consumer electronics device in history.” The Kinect was the first commercial sensor device to allow the user to interact with a console through a natural user interface (using gestures and spoken commands instead of a game controller). The Kinect is the second pillar of this book, so we will spend this chapter getting you acquainted with it.

In the following pages, you will learn the story of this groundbreaking device, its hardware, software, and data streams. You will learn what a structured-light 3D scanner is and how it is implemented in the Kinect sensor. You will learn about the different data you can acquire from your Kinect (such as RGB, infrared, and depth images), and how they are combined to perform motion tracking and gesture recognition. Welcome to the world of Kinect!

Figure 2-1. The Kinect

A Brief History of the Kinect

Microsoft announced the Kinect project on June 1, 2009 under the code name Project Natal. The name was changed to Kinect on June 13, 2010; it is derived from the words “kinetic” and “connect” to express the ideas behind the device. The motto of the marketing campaign for the launch was, very appropriately, “You are the controller.”

This first launch caused a worldwide frenzy, and hackers soon found the one “open door” in this amazing device. Unlike other game controllers, the Kinect connection was an open USB port, so it could potentially be connected to a PC. Unfortunately, Microsoft had not released any PC drivers for the device and didn’t seem to be willing to do so in the near future.

Note Drivers are computer programs that allow other higher-level computer programs to interact with hardware devices. They convert a well-known and predictable API (application programming interface) to the native API built into the hardware, making different devices look and behave similarly. Think of drivers as the translators between the hardware and the application or the operating system using it. Without the proper drivers, the computer wouldn’t know how to communicate with any of the hardware plugged into it!

Hacking the Kinect

Right after the Kinect was released in November 2010, and taking advantage of the Kinect’s open USB connection, Adafruit Industries offered a bounty of $1,000 to anybody who would provide “open source drivers for this cool USB device. The drivers and/or application can run on any operating system—but completely documented and under an open source license.” After some initial negative comments from Microsoft, Adafruit added another $2,000 to the bounty to spice it up.

The winner of the $3,000 bounty was Héctor Martín, who produced Linux drivers that allowed the use of both the RGB camera and the depth image from the Kinect.

The release of the open source drivers stirred a frenzy of Kinect application development that caught the attention of the media. Web sites dedicated to the new world of Kinect applications, such as http://www.kinecthacks.com, mushroomed on the Internet. The amazed public could see a growing number of new applications appear from every corner of the world.

Official Frameworks

But the Kinect hackers were not the only ones to realize the incredible possibilities that the new technology was about to unleash. The companies involved in the design of the Kinect soon understood that the Kinect for Xbox 360 was only a timid first step toward a new technological revolution—and they had no intention of being left behind.

In 2010, PrimeSense (the company behind Kinect’s 3D imaging) released its own drivers and programming framework for the Kinect, called OpenNI. Soon after that, it announced a partnership with ASUS in producing a new Kinect-like device, the Xtion.

In 2011, Microsoft released the non-commercial Kinect SDK (software development kit). In February 2012, it released a commercial version, accompanied by the Kinect for Windows device.

And this is where we stand now. Laptops with integrated Kinect-like cameras are most probably on their way. This is only the beginning of a whole range of hardware and applications using the technology behind Kinect to make computers better understand the world around them.

The Kinect Sensor

The Kinect sensor features an RGB camera, a depth sensor consisting of an infrared laser projector and an infrared CMOS sensor, and a multi-array microphone enabling acoustic source localization and ambient noise suppression. It also contains an LED light, a three-axis accelerometer, and a small servo controlling the tilt of the device.

Note The infrared CMOS (complementary metal–oxide semiconductor) sensor is an integrated circuit that contains an array of photodetectors that act as an infrared image sensor. This device is also referred to as IR camera, IR sensor, depth image CMOS, or CMOS sensor, depending on the source.

Throughout this book, we will focus on the 3D scanning capabilities of the Kinect device accessed through OpenNI/NITE. We won’t be talking about the microphones, the built-in accelerometer, or the servo, which are not accessible from OpenNI because they are not part of PrimeSense’s reference design.

The RGB camera is an 8-bit VGA resolution (640 x 480 pixels) camera. This might not sound very impressive, but you need to remember that the magic happens in the depth sensor, which is completely independent of the RGB camera.

The two depth sensor elements, the IR projector and IR camera, work together with the internal PrimeSense chip to reconstruct a 3D motion capture of the scene in front of the Kinect (Figures 2-3 and 2-4). They do this using a technique called structured-light 3D scanning, which we will discuss in depth at the end of this chapter. The IR camera also has VGA resolution (640 x 480 pixels), with 11-bit depth providing 2,048 levels of sensitivity.

Figure 2-3. Kinect hardware

Figure 2-4. Left to right: Kinect IR camera, RGB camera, LED, and IR projector (photo courtesy of ifixit)

Of course, there is much more going on within the sensor. If you are interested in the guts of the Kinect device, the web site ifixit featured a teardown of the Kinect in November 2010 (http://www.ifixit.com/Teardown/Microsoft-Kinect-Teardown/4066).

Positioning Your Kinect

The Kinect’s practical range goes from 1.2m to 3.5m. If objects are too close to the sensor, they will not be scanned and will simply appear as black spots; if they are too far away, the scanning precision will be too low, making them appear as flat objects. If you are using the Kinect for Windows device, the range is shorter: from 40cm to 3m.
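
If you want to check your room layout, a minimal Processing sketch along the following lines paints every out-of-range pixel red on top of the depth image. It assumes the Simple-OpenNI wrapper (one common route from Processing to OpenNI), so treat the exact library calls as a sketch of the idea rather than final code; Chapter 3 covers the actual installation.

import SimpleOpenNI.*;

SimpleOpenNI kinect;

void setup() {
  size(640, 480);
  kinect = new SimpleOpenNI(this);
  kinect.enableDepth();
}

void draw() {
  kinect.update();
  image(kinect.depthImage(), 0, 0);
  int[] depth = kinect.depthMap(); // per-pixel distances in millimeters; 0 = no reading
  loadPixels();
  for (int i = 0; i < depth.length; i++) {
    // Flag pixels outside the practical range (1.2 m to 3.5 m)
    if (depth[i] > 0 && (depth[i] < 1200 || depth[i] > 3500)) {
      pixels[i] = color(255, 0, 0);
    }
  }
  updatePixels();
}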

Kinect Capabilities

So what can you do with your Kinect device and all of the hi-tech stuff hidden inside? Once the Kinect is properly installed and communicating with your computer, you will be able to access a series of raw data streams and other capabilities provided by specific middleware. In this book, we will use the OpenNI drivers and NITE middleware.

RGB Image

Yes, you can use the Kinect as a 640 x 480 pixel webcam. You will learn how to access Kinect’s RGB image using OpenNI in Chapter 3.

IR Image

Because the Kinect has an infrared CMOS sensor, you can also access the 640 x 480 IR image using OpenNI. This is also covered in Chapter 3.
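
As a preview of Chapter 3, a Processing sketch for grabbing these image streams can be as short as the one below. Again, it assumes the Simple-OpenNI wrapper, so read it as a sketch of the general shape; swapping the RGB calls for their IR counterparts switches streams.

import SimpleOpenNI.*;

SimpleOpenNI kinect;

void setup() {
  size(640, 480);
  kinect = new SimpleOpenNI(this);
  kinect.enableRGB(); // or kinect.enableIR(); for the infrared stream
}

void draw() {
  kinect.update(); // grab a fresh frame from the sensor
  image(kinect.rgbImage(), 0, 0); // or kinect.irImage(); for the IR image
}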

Depth Map

The depth map is the result of the operations performed by PrimeSense’s PS1080 chip on the IR image captured by the Kinect’s IR CMOS sensor. This VGA image has a precision of 11 bits, or 2,048 different values, represented graphically by levels of gray from black (0) to white (2,047).

In Chapter 3, you will learn how to access and display the depth image from Processing using the OpenNI framework and how to translate the grayscale image into real-space dimensions. There is one detail that you should take into consideration: the sensor’s distance measurement doesn’t follow a linear scale, so the precision of the depth sensing decreases as objects move farther from the Kinect sensor.
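
The sketch below, once more assuming the Simple-OpenNI wrapper, shows the two faces of the depth data: the grayscale image you display and the underlying per-pixel distances in millimeters that you will use for real-space calculations.

import SimpleOpenNI.*;

SimpleOpenNI kinect;

void setup() {
  size(640, 480);
  kinect = new SimpleOpenNI(this);
  kinect.enableDepth();
}

void draw() {
  kinect.update();
  image(kinect.depthImage(), 0, 0); // grayscale depth map for display
  int[] depth = kinect.depthMap(); // raw distances in millimeters
  if (depth.length > 0) {
    // Print the distance of the point under the mouse; 0 means no valid reading
    int mm = depth[mouseY * kinect.depthWidth() + mouseX];
    println("Distance at cursor: " + mm + " mm");
  }
}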

Hand and Skeleton Tracking

After the depth map has been generated, you can use it directly for your applications or run it through a specific middleware to extract more complex information from the raw depth map.

In the following chapters, you will be using NITE middleware to add hand/skeleton tracking and gesture recognition to your applications. Chapter 4 will teach you how to use hand tracking to control LED lights through an Arduino board. Chapter 5 will introduce NITE’s gesture recognition, and you will even program your own simple gesture recognition routine. Chapter 6 will teach you how to work with skeleton tracking.
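
To give you a taste of what NITE adds, here is a compact skeleton-tracking fragment in the style of the Simple-OpenNI wrapper. The calibration flow and exact method names vary between library versions, so read it as a preview of Chapter 6 rather than finished code.

import SimpleOpenNI.*;

SimpleOpenNI kinect;

void setup() {
  size(640, 480);
  kinect = new SimpleOpenNI(this);
  kinect.enableDepth();
  kinect.enableUser(SimpleOpenNI.SKEL_PROFILE_ALL); // ask NITE for skeleton data
}

void draw() {
  kinect.update();
  image(kinect.depthImage(), 0, 0);
  if (kinect.isTrackingSkeleton(1)) { // user IDs are assigned as people are detected
    PVector head = new PVector();
    kinect.getJointPositionSkeleton(1, SimpleOpenNI.SKEL_HEAD, head);
    println("Head at " + head); // joint position in real-world millimeters
  }
}

// Callback fired by the library when a new user enters the scene
void onNewUser(int userId) {
  kinect.requestCalibrationSkeleton(userId, true);
}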

The algorithms that NITE or other middleware use to extract this information from the depth map fall well beyond the scope of this book, but if you are curious, you can read Hacking the Kinect by Jeff Kramer et al. (Apress, 2012), which devotes a whole chapter to gesture recognition.

Kinect Drivers and Frameworks

In order to access the Kinect data streams, you will need to install the necessary drivers on your computer. Because of the rather complicated history of this device, there is a range of choices available, which we detail next.

OpenKinect: Libfreenect Drivers

Soon after creating the Kinect open source drivers for Adafruit, Héctor Martín joined the OpenKinect community (http://openkinect.org), which was created by Josh Blake with the intention of bringing together programmers interested in natural user interfaces (NUIs). OpenKinect develops and maintains the libfreenect core library for accessing the Kinect USB camera. It currently supports access to the RGB and depth images, the Kinect motor, the accelerometer, and the LED. Access to the microphones is being worked on.

PrimeSense: OpenNI and NITE

Throughout this book, we will be using OpenNI and NITE to access the Kinect data streams and the skeleton/hand tracking capabilities, so here is a little more detail on this framework. The Israeli company PrimeSense developed the technology behind Kinect’s 3D imaging and worked with Microsoft in the development of the Kinect device. In December 2010, PrimeSense created an industry-led, not-for-profit organization called OpenNI, which stands for open natural interaction (http://www.openni.org).

This organization was formed to “certify and promote the compatibility and interoperability of natural interaction (NI) devices, applications, and middleware.” The founding members of the OpenNI organization are PrimeSense, Willow Garage, Side-Kick, ASUS, and AppSide.

OpenNI

In order to fulfill its goal, OpenNI released an open source framework called OpenNI Framework. It provides an API for writing natural interaction applications, on top of which PrimeSense’s high-level NITE middleware implements hand/skeleton tracking and gesture recognition.

Figure 2-5. OpenNI abstract layered view (courtesy of PrimeSense)

Because OpenNI breaks the dependency between the sensor and the middleware (see Figure 2-5), the API enables middleware developers to develop algorithms on top of raw data formats, independent of the sensor device that is producing the data. In the same way, sensor manufacturers can build sensors that will work with any OpenNI-compliant application.

Kinect was the first implementation of the PrimeSense reference design; its optics and microchip were developed entirely by PrimeSense. Then Microsoft added a motor and a three-axis accelerometer to the design. This is why OpenNI doesn’t provide access to the motor or the accelerometer; they are specific to the Kinect implementation.

PrimeSense is a fabless (fabrication-less) semiconductor company: it makes its revenue by selling hardware and semiconductor chips while outsourcing their fabrication. It is mainly a B2B (business-to-business) company: it sells solutions to manufacturers, who build these solutions into consumer products. This is exactly how it was involved in the development of the Kinect with Microsoft.

PrimeSense then sells its technology to manufacturers such as ASUS and other computer or television makers. But for this market to develop, there needs to be an ecosystem of people creating natural interaction-based content and applications. PrimeSense created OpenNI as a way to empower developers to add natural interaction to their software and applications so that this ecosystem would flourish.

NITE

For natural interaction to be implemented, the developer needs more than the 3D point cloud from the Kinect. The most useful features come from the skeleton and hand tracking capabilities. Not all developers have the knowledge, time, or resources to develop these capabilities from scratch, as they involve advanced algorithms, so PrimeSense decided to implement them itself and distribute them for commercial use while keeping the code closed. The result was NITE.

Note There has been much confusion about the differences between OpenNI and NITE. OpenNI is PrimeSense’s framework; it allows you to acquire the depth and RGB images from the Kinect. OpenNI is open source and for commercial use. NITE is the middleware that allows you to perform hand/skeleton tracking and gesture recognition. NITE is not open source, but it is also distributed for commercial use.

This means that without NITE you can’t use skeleton/hand tracking or gesture recognition, unless you develop your own middleware that processes the OpenNI point cloud data and extracts the joint and gesture information. Without a doubt, there will be other middleware developed by third parties in the future that will compete with OpenNI and NITE for natural interaction applications.

In Chapter 3, you will learn how to download OpenNI and NITE, and you will start using them to develop amazing projects throughout the rest of the book.

Microsoft Kinect for Windows

On June 16, 2011, six months after PrimeSense released its drivers and middleware, Microsoft announced the release of the official Microsoft Kinect SDK for non-commercial use. This SDK offered the programmer access to all the Kinect sensor capabilities plus hand/skeleton tracking. At the time of writing, Kinect for Windows SDK includes the following:

  • Drivers for using Kinect sensor devices on a computer running Windows 7 or Windows 8 developer preview (desktop apps only)
  • APIs and device interfaces, along with technical documentation
  • Source code samples

Unfortunately, the non-commercial license limited applications to testing or personal use. Also, the SDK only installs on Windows 7, leaving out the Linux and Mac OS X programmer communities. Moreover, the development of applications is limited to C++, C#, or Visual Basic using Microsoft Visual Studio 2010.

These limitations discouraged many developers, who chose to continue to develop applications with OpenNI/NITE plus their OS and developing platform of choice, with an eye towards the commercialization of their applications.

Since February 2012, Kinect for Windows has included a new sensor device specifically designed for use with a Windows-based PC and a new version of the SDK licensed for commercial use. The official Microsoft SDK will continue to support the Kinect for Xbox 360 as a development device.

Kinect Theory

The technique used by PrimeSense’s 3D imaging system in the Kinect is called structured-light 3D scanning. This technique is used in many industrial applications, such as production control and volume measurement, and involves highly accurate and expensive scanners. Kinect is the first device to implement this technique in a consumer product.

Structured-Light 3D Scanning

Most structured-light scanners are based on the projection of a narrow stripe of light onto a 3D object: seen from a viewpoint different from the projector’s, the stripe appears deformed, and this deformation can be used to measure the distance from each point to the camera and thus reconstruct the 3D volume. This method can be extended to the projection of many stripes of light at the same time, which provides a high number of samples simultaneously (Figure 2-6).

Figure 2-6. Triangulation principles for structured-light 3D scanning (from Wikipedia, http://en.wikipedia.org/wiki/Structured_Light_3D_Scanner, licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license)

The Kinect system is somewhat different. Instead of projecting stripes of visible light, the Kinect’s IR projector sends out a pattern of infrared light beams (called an IR coding image by PrimeSense), which bounces off the objects in the scene and is captured by the IR CMOS image sensor (see Figure 2-7). This captured image is passed on to the onboard PrimeSense chip to be translated into the depth image (Figure 2-8).

Figure 2-7. Kinect IR coding image (detail)

Figure 2-8. Depth map (left) reconstituted from the light coding infrared pattern (right)

Converting the Light Coding Image to a Depth Map

Once the light coding infrared pattern is received, PrimeSense’s PS1080 chip (Figure 2-9) compares that image to a reference image stored in the chip’s memory, the result of a calibration routine performed on each device during production. The chip translates the differences between the “flat” reference image and the incoming infrared pattern into a VGA-sized depth image of the scene, which you can access through the OpenNI API.
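
PrimeSense has not published the chip’s exact algorithm, but the underlying geometry is ordinary triangulation, the same principle shown in Figure 2-6. As a rough sketch of the idea: if the projector and the IR camera are separated by a baseline b, the camera has focal length f (expressed in pixels), and a speckle of the pattern appears shifted by a disparity of d pixels relative to its position in the reference image, then the distance z to the surface follows

z ≈ (b × f) / d

Because the disparity d shrinks as objects move away, a one-pixel disparity error translates into a larger and larger depth error with distance, which is why the depth precision drops for far objects, as noted in the “Depth Map” section.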

Figure 2-9. PrimeSense PS1080 system on chip (image courtesy of PrimeSense)

Kinect Alternative: ASUS Xtion PRO

After PrimeSense released the OpenNI framework, ASUS and PrimeSense announced their intention to release a PC-compatible device similar to the Kinect. In 2012, ASUS revealed the Xtion PRO, an exact implementation of PrimeSense’s reference design that features only a depth camera. It was followed by the Xtion PRO LIVE, which, like the Kinect, includes an RGB camera as well as the infrared camera (Figure 2-10). ASUS claims its device is “the world’s first and exclusive professional PC motion-sensing software development solution” because the Xtion is designed to be used with a PC (unlike the Kinect, which was initially designed for the Xbox 360). ASUS is also creating an online store for Xtion applications where developers will be able to sell their software to users.

Note The ASUS Xtion is OpenNI- and NITE-compatible, which means that all the projects in this book can also be implemented using an Xtion PRO LIVE camera from ASUS!

Figure 2-10. ASUS Xtion PRO LIVE (image courtesy of ASUS)

Summary

This chapter provided an overview of the Kinect device: its history and capabilities, as well as its hardware, software, and the technical details behind its 3D scanning. It was not the intention of the authors to give you a detailed introduction to all of the technical aspects of the Kinect device; we just wanted to get you acquainted with the amazing sensor that will allow you to build all the projects in this book—and the ones that you will imagine afterwards.

In the next chapter, you will learn how to install all the software you need to use your Kinect. Then you will implement your first Kinect program. Now you are the controller!
