There's more...

We discussed region-based techniques for object detection in the There's more... section of the previous recipe in this chapter, Object localization. However, several techniques are used specifically for human face localization.

Let's look at how some of these techniques work:

  • HOG and SVM: The Histogram of Oriented Gradients (HOG) is a feature descriptor for object detection that, paired with an SVM classifier, copes well with varying illumination and pose changes. In this technique, the image is divided into 8×8 cells, and the distribution of the magnitudes and directions of the local intensity gradients over the pixels is computed. Pixels with a large negative gradient change appear black, pixels with a large positive change appear white, and pixels with little or no change appear gray. Each cell's gradients are binned by direction into angular bins (0-180 degrees for unsigned gradients, 0-360 degrees for signed gradients), condensing the 64 (8×8) pixel values into just 9 bin values in the 0-180 degree case. HOG uses a sliding window to compute a descriptor for each cell in the image and handles scale via an image pyramid. These HOG features, combined with an SVM classifier, are then used to detect human faces.
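The per-cell binning step described above can be sketched in NumPy as follows. This is a minimal illustration of unsigned 9-bin, magnitude-weighted voting only; a full HOG pipeline additionally normalizes histograms over blocks of neighboring cells before feeding them to the SVM:

```python
import numpy as np

def cell_histogram(cell, n_bins=9):
    """Orientation histogram for one 8x8 cell (unsigned gradients, 0-180 deg)."""
    gy, gx = np.gradient(cell.astype(float))
    magnitude = np.hypot(gx, gy)
    # Unsigned gradients: fold all angles into [0, 180)
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0
    bins = (orientation / (180.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    # Each pixel votes into its angular bin, weighted by gradient magnitude
    np.add.at(hist, bins.ravel(), magnitude.ravel())
    return hist

cell = np.random.default_rng(0).integers(0, 256, size=(8, 8))
print(cell_histogram(cell).shape)  # (9,): 64 pixel values condensed to 9 bins
```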
  • Haar cascade classifiers: Haar cascades work well when it comes to detecting one particular type of object in an image, such as a face, a pair of eyes, and so on, and several cascades can be run in parallel to detect faces, eyes, and mouths together. The algorithm is trained on many positive images (images that contain faces) and negative images (images that do not contain faces), and features are extracted from them over a given base window size (24×24 in the case of the Viola-Jones algorithm). Haar features act like convolution kernels in that each detects the presence of a particular feature in a given image, with each feature representing a part of the human face. A feature's value is calculated by subtracting the sum of the pixels under the white rectangle from the sum of the pixels under the black rectangle. Thousands of such features are calculated during this process; however, not all of them are useful for face detection.

A new image representation, known as the integral image, is used to compute these rectangle features quickly. The AdaBoost algorithm is then used to get rid of redundant features and select only the relevant ones, and a weighted combination of the selected features decides whether a given window contains a face or not. Instead of applying all the selected features to every window as it slides over the image, the idea of cascading is used: the relevant features are grouped into a linear sequence of cascade stages. If stage i judges a window to contain a face, the window is passed on to the next stage, i+1; otherwise, it is discarded immediately. Cascading the classifiers greatly reduces computational complexity and time. With this approach, we can train any supervised learning technique on top for face detection.
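The integral image and the white-minus-black rectangle computation can be sketched as follows. This is a minimal NumPy illustration (the function names are ours, not from any library): once the summed-area table is built, any rectangle sum costs just four lookups, regardless of the rectangle's size:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row/column prepended, so every
    rectangle sum below needs only four table lookups."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, r, c, h, w):
    """Sum of pixels in the h x w rectangle whose top-left corner is (r, c)."""
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]

def two_rect_feature(ii, r, c, h, w):
    """Haar-like edge feature: white (left) half minus black (right) half."""
    half = w // 2
    return rect_sum(ii, r, c, h, half) - rect_sum(ii, r, c + half, h, half)

img = np.ones((24, 24), dtype=np.int64)  # a 24x24 base window, as in Viola-Jones
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 24, 24))  # 576: four lookups replace summing 576 pixels
```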

The following screenshot shows a few Haar cascade features:

  • Max-Margin Object Detection (MMOD): With the non-maximum suppression technique, overlapping windows sometimes get rejected, leading to missed detections and false alarms. Max-Margin Object Detection replaces this heuristic with a new objective function. Unlike other classifiers, MMOD does not perform any sub-sampling; instead, it optimizes over all the sub-windows of an image. A maximum-margin approach is taken, which requires the label for each training sample to be predicted correctly with a large margin.
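For contrast, the greedy non-maximum suppression heuristic that MMOD's objective replaces can be sketched as follows. This is a minimal version (the function names, box format, and the 0.5 overlap threshold are illustrative choices, not from any particular library):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedily keep the highest-scoring box, discard overlapping rivals."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order):
        best, order = order[0], order[1:]
        keep.append(int(best))
        # Suppress every remaining box that overlaps the kept one too much
        order = np.array([i for i in order if iou(boxes[best], boxes[i]) < iou_threshold])
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 heavily and is dropped
```

Because this greedy pass is applied after training, the detector's scores are never optimized with the suppression step in mind; MMOD folds the window-selection decision into the training objective itself.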