The challenge in using OpenCV's Haar cascade classifiers is not just getting a tracking result; it is getting a series of sensible tracking results at a high frame rate. One kind of common sense that we can enforce is that certain tracked objects should have a hierarchical relationship, one being located relative to the other. For example, a nose should be in the middle of a face. By attempting to track both a whole face and parts of a face, we can enable application code to do more detailed manipulations and to check how good a given tracking result is. A face with a nose is a better result than one without. At the same time, we can support some optimizations, such as only looking for faces of a certain size and noses in certain places.
We are going to implement an optimized, hierarchical tracker in a class called FaceTracker, which offers a simple interface. A FaceTracker may be initialized with certain optional configuration arguments that are relevant to the tradeoff between tracking accuracy and performance. At any given time, the latest tracking results of FaceTracker are stored in a property called faces, which is a list of Face instances. Initially, this list is empty. It is refreshed via an update() method that accepts an image for the tracker to analyze. Finally, for debugging purposes, the rectangles of faces may be drawn via a drawDebugRects() method, which accepts an image as a drawing surface. Every frame, a real-time face-tracking application would call update(), read faces, and perhaps call drawDebugRects().
Internally, FaceTracker uses an OpenCV class called CascadeClassifier. A CascadeClassifier is initialized with a cascade data file, such as the ones that we found and copied earlier. For our purposes, the important method of CascadeClassifier is detectMultiScale(), which performs tracking that may be robust to variations in scale. The possible arguments to detectMultiScale() are:
- image: This is the image to be analyzed. It must have 8 bits per channel.
- scaleFactor: This scaling factor separates the window sizes in two successive passes. A higher value improves performance but diminishes robustness with respect to variations in scale.
- minNeighbors: This value is one less than the minimum number of regions that are required in a match. (A match may merge multiple neighboring regions.)
- flags: There are several flags but not all combinations are valid. The valid standalone flags and valid combinations include:
  - cv2.cv.CV_HAAR_SCALE_IMAGE: Scales each windowed image region to match the feature data. (The default approach is the opposite: scale the feature data to match the window.) Scaling the image allows for certain optimizations on modern hardware. This flag must not be combined with others.
  - cv2.cv.CV_HAAR_DO_CANNY_PRUNING: Eagerly rejects regions that contain too many or too few edges to match the object type. This flag should not be combined with cv2.cv.CV_HAAR_FIND_BIGGEST_OBJECT.
  - cv2.cv.CV_HAAR_FIND_BIGGEST_OBJECT: Accepts, at most, one match (the biggest).
  - cv2.cv.CV_HAAR_FIND_BIGGEST_OBJECT | cv2.cv.CV_HAAR_DO_ROUGH_SEARCH: Accepts, at most, one match (the biggest) and skips some steps that would refine (shrink) the region of this match. The minNeighbors argument should be greater than 0.
- minSize: A pair of pixel dimensions representing the minimum object size being sought. A higher value improves performance.
- maxSize: A pair of pixel dimensions representing the maximum object size being sought. A lower value improves performance.

The return value of detectMultiScale() is a list of matches, each expressed as a rectangle in the format [x, y, w, h].
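Since each match is a plain [x, y, w, h] sequence, converting it to corner points for drawing or cropping is simple arithmetic. The following is a minimal sketch (the helper name and the sample rectangle are hypothetical, purely for illustration):

```python
def matchToCorners(match):
    """Convert a detectMultiScale-style match, [x, y, w, h], to its
    top-left and bottom-right corner points."""
    x, y, w, h = match
    return (x, y), (x + w, y + h)

# A hypothetical match rectangle:
topLeft, bottomRight = matchToCorners([30, 40, 100, 120])
# topLeft == (30, 40); bottomRight == (130, 160)
```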
Similarly, the initializer of FaceTracker accepts scaleFactor, minNeighbors, and flags as arguments. The given values are passed to all detectMultiScale() calls that a FaceTracker makes internally. Also during initialization, a FaceTracker creates CascadeClassifiers using face, eye, nose, and mouth data. Let's add the following implementation of the initializer and the faces property to trackers.py:
class FaceTracker(object):
    """A tracker for facial features: face, eyes, nose, mouth."""

    def __init__(self, scaleFactor = 1.2, minNeighbors = 2,
                 flags = cv2.cv.CV_HAAR_SCALE_IMAGE):

        self.scaleFactor = scaleFactor
        self.minNeighbors = minNeighbors
        self.flags = flags

        self._faces = []

        self._faceClassifier = cv2.CascadeClassifier(
            'cascades/haarcascade_frontalface_alt.xml')
        self._eyeClassifier = cv2.CascadeClassifier(
            'cascades/haarcascade_eye.xml')
        self._noseClassifier = cv2.CascadeClassifier(
            'cascades/haarcascade_mcs_nose.xml')
        self._mouthClassifier = cv2.CascadeClassifier(
            'cascades/haarcascade_mcs_mouth.xml')

    @property
    def faces(self):
        """The tracked facial features."""
        return self._faces
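These configuration values later feed into minSize computations via utils.widthHeightDividedBy(), implemented earlier in this chapter. As a reminder of its behavior, here is a plausible sketch inferred from how the tracker uses it; FakeImage is just a stand-in for a NumPy image array, used so that the example is self-contained:

```python
def widthHeightDividedBy(image, divisor):
    """Return (width/divisor, height/divisor) as a size pair, for an
    image whose shape follows the NumPy convention (height, width, ...)."""
    h, w = image.shape[:2]
    return (w // divisor, h // divisor)

# A minimal stand-in for an image array, purely for illustration:
class FakeImage(object):
    def __init__(self, h, w):
        self.shape = (h, w)

minSize = widthHeightDividedBy(FakeImage(480, 640), 8)
# minSize == (80, 60)
```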
The update() method of FaceTracker first creates an equalized, grayscale variant of the given image. Equalization, as implemented in OpenCV's equalizeHist() function, normalizes an image's brightness and increases its contrast. Equalization as a preprocessing step makes our tracker more robust to variations in lighting, while conversion to grayscale improves performance. Next, we feed the preprocessed image to our face classifier. For each matching rectangle, we search certain subregions for a left and right eye, nose, and mouth. Ultimately, the matching rectangles and subrectangles are stored in Face instances in faces. For each type of tracking, we specify a minimum object size that is proportional to the image size. Our implementation of FaceTracker should continue with the following code for update():
    def update(self, image):
        """Update the tracked facial features."""

        self._faces = []

        if utils.isGray(image):
            image = cv2.equalizeHist(image)
        else:
            image = cv2.cvtColor(image, cv2.cv.CV_BGR2GRAY)
            cv2.equalizeHist(image, image)

        minSize = utils.widthHeightDividedBy(image, 8)

        faceRects = self._faceClassifier.detectMultiScale(
            image, self.scaleFactor, self.minNeighbors,
            self.flags, minSize)

        if faceRects is not None:
            for faceRect in faceRects:

                face = Face()
                face.faceRect = faceRect

                x, y, w, h = faceRect

                # Seek an eye in the upper-left part of the face.
                searchRect = (x+w/7, y, w*2/7, h/2)
                face.leftEyeRect = self._detectOneObject(
                    self._eyeClassifier, image, searchRect, 64)

                # Seek an eye in the upper-right part of the face.
                searchRect = (x+w*4/7, y, w*2/7, h/2)
                face.rightEyeRect = self._detectOneObject(
                    self._eyeClassifier, image, searchRect, 64)

                # Seek a nose in the middle part of the face.
                searchRect = (x+w/4, y+h/4, w/2, h/2)
                face.noseRect = self._detectOneObject(
                    self._noseClassifier, image, searchRect, 32)

                # Seek a mouth in the lower-middle part of the face.
                searchRect = (x+w/6, y+h*2/3, w*2/3, h/3)
                face.mouthRect = self._detectOneObject(
                    self._mouthClassifier, image, searchRect, 16)

                self._faces.append(face)
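The search regions are simple fractions of the face rectangle. To see where they land for a given face, the proportions can be isolated in a small, pure-Python sketch (the function name is hypothetical; integer division stands in for the Python 2 division used in the code above):

```python
def faceSearchRects(faceRect):
    """Given a face rectangle (x, y, w, h), return the sub-rectangles
    searched for each facial feature, using the same proportions as
    the tracker's update() method."""
    x, y, w, h = faceRect
    return {
        'leftEye':  (x + w // 7,     y,              w * 2 // 7, h // 2),
        'rightEye': (x + w * 4 // 7, y,              w * 2 // 7, h // 2),
        'nose':     (x + w // 4,     y + h // 4,     w // 2,     h // 2),
        'mouth':    (x + w // 6,     y + h * 2 // 3, w * 2 // 3, h // 3),
    }

searchRects = faceSearchRects((0, 0, 70, 60))
# e.g. searchRects['nose'] == (17, 15, 35, 30)
```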
Note that update() relies on utils.isGray() and utils.widthHeightDividedBy(), both implemented earlier in this chapter. Also, it relies on a private helper method, _detectOneObject(), which is called several times in order to handle the repetitious work of tracking several subparts of the face. As arguments, _detectOneObject() requires a classifier, an image, a rectangle, and a ratio that determines the minimum object size relative to the image size. The rectangle is the image subregion that the given classifier should search. For example, the nose classifier should search the middle of the face. Limiting the search area improves performance and helps eliminate false positives. Internally, _detectOneObject() works by running the classifier on a slice of the image and returning the first match (or None if there are no matches). This approach works whether or not we are using the cv2.cv.CV_HAAR_FIND_BIGGEST_OBJECT flag. Our implementation of FaceTracker should continue with the following code for _detectOneObject():
    def _detectOneObject(self, classifier, image, rect,
                         imageSizeToMinSizeRatio):

        x, y, w, h = rect

        minSize = utils.widthHeightDividedBy(
            image, imageSizeToMinSizeRatio)

        subImage = image[y:y+h, x:x+w]

        subRects = classifier.detectMultiScale(
            subImage, self.scaleFactor, self.minNeighbors,
            self.flags, minSize)

        if len(subRects) == 0:
            return None

        subX, subY, subW, subH = subRects[0]
        return (x+subX, y+subY, subW, subH)
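Two details of _detectOneObject() are worth noting: image[y:y+h, x:x+w] selects rows (the y range) first and columns (the x range) second, and any match found in the slice must be offset by the slice's origin to yield full-image coordinates. Both can be illustrated with plain nested lists standing in for a NumPy array:

```python
# A tiny 4x6 "image" whose pixel values encode (row, column):
image = [[r * 10 + c for c in range(6)] for r in range(4)]

# Slice out the region x=2, y=1, w=3, h=2 (rows first, then columns),
# mimicking NumPy's image[y:y+h, x:x+w]:
x, y, w, h = 2, 1, 3, 2
subImage = [row[x:x+w] for row in image[y:y+h]]
# subImage == [[12, 13, 14], [22, 23, 24]]

# A match at (subX, subY) inside the slice maps back to the full image
# by adding the slice's origin, as in _detectOneObject()'s return value:
subX, subY, subW, subH = 1, 0, 2, 2
fullRect = (x + subX, y + subY, subW, subH)
# fullRect == (3, 1, 2, 2)
```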
Lastly, FaceTracker should offer basic drawing functionality so that its tracking results can be displayed for debugging purposes. The following method implementation simply defines colors, iterates over Face instances, and draws rectangles of each Face to a given image using our rects.outlineRect() function:
    def drawDebugRects(self, image):
        """Draw rectangles around the tracked facial features."""

        if utils.isGray(image):
            faceColor = 255
            leftEyeColor = 255
            rightEyeColor = 255
            noseColor = 255
            mouthColor = 255
        else:
            faceColor = (255, 255, 255) # white
            leftEyeColor = (0, 0, 255) # red
            rightEyeColor = (0, 255, 255) # yellow
            noseColor = (0, 255, 0) # green
            mouthColor = (255, 0, 0) # blue

        for face in self.faces:
            rects.outlineRect(image, face.faceRect, faceColor)
            rects.outlineRect(image, face.leftEyeRect, leftEyeColor)
            rects.outlineRect(image, face.rightEyeRect, rightEyeColor)
            rects.outlineRect(image, face.noseRect, noseColor)
            rects.outlineRect(image, face.mouthRect, mouthColor)
Now, we have a high-level tracker that hides the details of Haar cascade classifiers while allowing application code to supply new images, fetch data about tracking results, and ask for debug drawing.