Chapter 7. Learning to Recognize Emotions on Faces

We previously familiarized ourselves with the concepts of object detection and object recognition, but we never combined them to develop an app that can do both end-to-end. For the final chapter in this book, we will do exactly that.

The goal of this chapter is to develop an app that combines both face detection and face recognition, with a focus on recognizing emotional expressions in the detected face.

For this, we will touch upon two other classic algorithms that come bundled with OpenCV: Haar cascade classifiers and multi-layer perceptrons (MLPs). While the former can be used to rapidly detect (or locate, answering the question: where?) objects of various sizes and orientations in an image, the latter can be used to recognize them (or identify, answering the question: what?).

The end goal of the app will be to detect your own face in each captured frame of a webcam live stream and label your emotional expression. To make this task feasible, we will limit ourselves to the following possible emotional expressions: neutral, happy, sad, surprised, angry, and disgusted.

To arrive at such an app, we need to solve the following two challenges:

  • Face detection: We will use the popular Haar cascade classifier by Viola and Jones, for which OpenCV provides a whole range of pre-trained exemplars. We will make use of face cascades and eye cascades to reliably detect and align facial regions from frame to frame.
  • Facial expression recognition: We will train a multi-layer perceptron to recognize the six different emotional expressions listed earlier, in every detected face. The success of this approach will crucially depend on the training set that we assemble, and the preprocessing that we choose to apply to each sample in the set. In order to improve the quality of our self-recorded training set, we will make sure that all data samples are aligned using affine transformations and reduce the dimensionality of the feature space by applying Principal Component Analysis (PCA). The resulting representation is sometimes also referred to as Eigenfaces.
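
To make the alignment step more concrete, here is a minimal sketch of how a rotation could be derived from two detected eye centers so that the eyes end up on a horizontal line. It uses only the Python standard library. The function names eye_alignment_matrix and apply_affine, as well as the eye coordinates, are hypothetical and are not taken from the chapter's code. The 2x3 matrix follows the same layout that OpenCV's cv2.getRotationMatrix2D produces, so in an OpenCV pipeline a matrix like this would typically be passed to cv2.warpAffine.

```python
import math


def eye_alignment_matrix(left_eye, right_eye):
    """Return a 2x3 affine matrix (nested lists) that rotates the image
    about the midpoint between the eyes until both eyes share one row.

    Points are (x, y) in image coordinates (origin top-left, y down).
    Hypothetical helper, not part of the chapter's detectors module.
    """
    (x1, y1), (x2, y2) = left_eye, right_eye
    # Tilt of the line connecting the two eyes, in radians
    angle = math.atan2(y2 - y1, x2 - x1)
    # Rotate about the midpoint between the eyes
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    a, b = math.cos(angle), math.sin(angle)
    return [[a, b, (1 - a) * cx - b * cy],
            [-b, a, b * cx + (1 - a) * cy]]


def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to a single (x, y) point."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])


# Made-up eye centers: the right eye sits 10 pixels lower than the left
left_eye, right_eye = (40, 60), (80, 70)
M = eye_alignment_matrix(left_eye, right_eye)
left_aligned = apply_affine(M, left_eye)
right_aligned = apply_affine(M, right_eye)
```

After the transformation, both eye centers have the same y coordinate and keep their original distance from each other, which is what we want from an upright face. Centering and scaling the face would require additional terms that this sketch leaves out.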

The reliable recognition of faces and facial expressions is a challenging task for artificial intelligence, yet humans are able to perform these kinds of tasks with apparent ease. Today's state-of-the-art models range all the way from 3D deformable face model fitting to convolutional neural networks and other deep learning algorithms. Granted, these approaches are significantly more sophisticated than ours. Yet, MLPs are classic algorithms that helped transform the field of machine learning, so for educational purposes, we will stick to a set of algorithms that come bundled with OpenCV.

We will combine the algorithms mentioned earlier in a single end-to-end app that annotates a detected face with the corresponding facial expression label in each captured frame of a video live stream. The end result might look something like the following image, capturing my reaction when the code first compiled:

[Image: Learning to Recognize Emotions on Faces]

Planning the app

The final app will consist of a main script that integrates the process flow end-to-end, from face detection to facial expression recognition, as well as some utility functions to help along the way.

Thus, the end product will require several components:

  • chapter7: The main script and entry-point for the chapter.
  • chapter7.FaceLayout: A custom layout based on gui.BaseLayout that operates in two different modes:
    • Training mode: In the training mode, the app will collect image frames, detect a face therein, assign a label depending on the facial expression, and upon exiting, save all the collected data samples in a file, so that it can be parsed by datasets.homebrew.
    • Testing mode: In the testing mode, the app will detect a face in each video frame and predict the corresponding class label by using a pre-trained MLP.
  • chapter7.main: The main function routine to start the GUI application.
  • detectors.FaceDetector: A class for face detection.
    • detect: A method to detect faces in a grayscale image. Optionally, the image is downscaled for better reliability. Upon successful detection, the method returns the extracted head region.
    • align_head: A method to preprocess an extracted head region with affine transformations such that the resulting face appears centered and upright.
  • classifiers.Classifier: An abstract base class that defines the common interface for all classifiers (same as in Chapter 6, Learning to Recognize Traffic Signs).
  • classifiers.MultiLayerPerceptron: A class that implements an MLP by using the following public methods:
    • fit: A method to fit the MLP to the training data. It takes as input a matrix of training data, where each row is a training sample and each column contains feature values, and a vector of labels.
    • evaluate: A method to evaluate the MLP by applying it to some test data after training. It takes as input a matrix of test data, where each row is a test sample and each column contains feature values, and a vector of labels. The method returns three different performance metrics: accuracy, precision, and recall.
    • predict: A method to predict the class labels of some test data. We expose this method to the user so it can be applied to any number of data samples, which will be helpful in the testing mode, when we do not want to evaluate the entire dataset, but instead predict the label of only a single data sample.
    • save: A method to save a trained MLP to file.
    • load: A method to load a pre-trained MLP from file.
  • train_test_mlp: A script to train and test an MLP by applying it to our self-recorded dataset. The script will explore different network architectures and store the one with the best generalization performance in a file, so that the pre-trained classifier can be loaded later.
  • datasets.homebrew: A class to parse the self-recorded training set. Analogously to the previous chapter, the class contains the following methods:
    • load_data: A method to load the training set, perform PCA on it via the extract_features function, and split the data into the training and test sets. Optionally, the preprocessed data can be stored in a file so that we can load it later on without having to parse the data again.
    • load_from_file: A method to load a previously stored preprocessed dataset.
    • extract_features: A method to extract a feature of choice (in the present chapter: to perform PCA on the data). We expose this function to the user so it can be applied to any number of data samples, which will be helpful in the testing mode, when we do not want to parse the entire dataset but instead predict the label of only a single data sample.
  • gui: A module providing a wxPython GUI application to access the capture device and display the video feed. This is the same module that we used in the previous chapters.
    • gui.BaseLayout: A generic layout from which more complicated layouts can be built. This chapter does not require any modifications to the basic layout.
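
The classifier interface described in the list above can be sketched as follows. This is only an illustration under stated assumptions, not the code of the classifiers module: we assume that evaluate returns the accuracy as a single number and the precision and recall as per-class dictionaries, and that a class that is never predicted (or never occurs) scores 0.0. MajorityClassifier is a hypothetical toy stand-in for the MLP, used here so that the sketch runs without OpenCV.

```python
from abc import ABC, abstractmethod
from collections import Counter


class Classifier(ABC):
    """Common interface: subclasses supply fit and predict."""

    @abstractmethod
    def fit(self, X_train, y_train):
        """Train on rows of feature values and one label per row."""

    @abstractmethod
    def predict(self, X_test):
        """Return one predicted label per row of X_test."""

    def evaluate(self, X_test, y_test):
        """Return (accuracy, precision, recall).

        Assumption: precision and recall are dicts keyed by class label.
        """
        y_pred = self.predict(X_test)
        pairs = list(zip(y_test, y_pred))
        accuracy = sum(t == p for t, p in pairs) / len(y_test)
        precision, recall = {}, {}
        for c in sorted(set(y_test) | set(y_pred)):
            true_pos = sum(t == c and p == c for t, p in pairs)
            n_predicted = sum(p == c for p in y_pred)
            n_actual = sum(t == c for t in y_test)
            # Assumed convention: an empty denominator yields 0.0
            precision[c] = true_pos / n_predicted if n_predicted else 0.0
            recall[c] = true_pos / n_actual if n_actual else 0.0
        return accuracy, precision, recall


class MajorityClassifier(Classifier):
    """Toy stand-in (not an MLP): always predicts the most common label."""

    def fit(self, X_train, y_train):
        self.label_ = Counter(y_train).most_common(1)[0][0]
        return self

    def predict(self, X_test):
        return [self.label_ for _ in X_test]


# Made-up one-dimensional features with expression labels
clf = MajorityClassifier().fit([[0.1], [0.2], [0.3]],
                               ["happy", "happy", "sad"])
accuracy, precision, recall = clf.evaluate([[0.4], [0.5]],
                                           ["happy", "sad"])
```

Because evaluate is implemented once in the base class and relies only on predict, any subclass that provides fit and predict, such as an MLP wrapper, would get the same three metrics without extra code.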

In the following sections, we will discuss these components in detail.
