Most object detection problems, such as face/person detection or lesion detection in medical imaging, require searching for the object over many image patches. However, examining every image region and computing a feature set for each one is time-consuming. Cascade detectors are widely used because they perform this search efficiently.
Cascade detectors consist of several boosting stages. The boosting algorithm selects the best features to create and combine a number of weak tree classifiers, so boosting acts not only as a detector but also as a feature selection method. Each stage is usually trained to detect nearly 100 percent of the objects correctly while discarding at least 50 percent of the background patches. Background patches, which constitute the vast majority of the regions examined, therefore need less processing time, as they are discarded in the early stages of the cascade. Moreover, later cascade stages use more features than earlier ones, so only objects and difficult background patches require the full evaluation time.
Discrete AdaBoost (Adaptive Boosting), Real AdaBoost, Gentle AdaBoost, and LogitBoost are all implemented in OpenCV as boosting stages. Additionally, it is possible to use Haar-like, Local Binary Patterns (LBP), and Histograms of Oriented Gradients (HOG) features together with the different boosting algorithms.
All these advantages and available techniques make cascades very useful for building practical detection applications.
OpenCV comes with several pretrained cascade detectors for the most common detection problems. They are located under the OPENCV_SOURCE\data directory. The following is a list of some of them and their corresponding subdirectories:
haarcascades:
    haarcascade_frontalface_default.xml
    haarcascade_eye.xml
    haarcascade_mcs_nose.xml
    haarcascade_mcs_mouth.xml
    haarcascade_upperbody.xml
    haarcascade_lowerbody.xml
    haarcascade_fullbody.xml

lbpcascades:
    lbpcascade_frontalface.xml
    lbpcascade_profileface.xml
    lbpcascade_silverware.xml

hogcascades:
    hogcascade_pedestrians.xml
The following pedestrianDetection example illustrates how to use a cascade detector to localize pedestrians in a video file with OpenCV:
```cpp
#include "opencv2/core/core.hpp"
#include "opencv2/objdetect/objdetect.hpp"
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include <iostream>

using namespace std;
using namespace cv;

int main(int argc, char *argv[]){
    // Load the cascade detector passed as the first argument
    CascadeClassifier cascade(argv[1]);
    if (cascade.empty())
        return -1;

    // Open the video file passed as the second argument
    VideoCapture vid(argv[2]);
    if (!vid.isOpened()){
        cout << "Error. The video cannot be opened." << endl;
        return -1;
    }

    namedWindow("Pedestrian Detection");
    Mat frame;
    while (1) {
        if (!vid.read(frame))
            break;

        // Convert to grayscale and equalize the histogram
        Mat frame_gray;
        if (frame.channels() > 1){
            cvtColor(frame, frame_gray, CV_BGR2GRAY);
            equalizeHist(frame_gray, frame_gray);
        } else {
            frame_gray = frame;
        }

        // Detect pedestrians between 30x30 and 150x150 pixels
        vector<Rect> pedestrians;
        cascade.detectMultiScale(frame_gray, pedestrians, 1.1, 2, 0,
                                 Size(30, 30), Size(150, 150));

        // Draw an ellipse around each detection
        for (size_t i = 0; i < pedestrians.size(); i++) {
            Point center(pedestrians[i].x + pedestrians[i].width*0.5,
                         pedestrians[i].y + pedestrians[i].height*0.5);
            ellipse(frame, center,
                    Size(pedestrians[i].width*0.5, pedestrians[i].height*0.5),
                    0, 0, 360, Scalar(255, 0, 255), 4, 8, 0);
        }

        imshow("Pedestrian Detection", frame);
        if (waitKey(100) >= 0)
            break;
    }
    return 0;
}
```
The code explanation is as follows:
CascadeClassifier: This class provides all the methods needed when working with cascades. An object of this class represents a trained cascade detector.

CascadeClassifier::CascadeClassifier(const string& filename): This constructor initializes the object instance and loads the cascade detector stored in the file indicated by the filename variable.

bool CascadeClassifier::empty(): This method checks whether a cascade detector has been loaded.

cvtColor and equalizeHist: These functions perform grayscale conversion and histogram equalization. Since the cascade detector is trained with grayscale images and input images can come in different formats, it is necessary to convert them to the correct color space and equalize their histograms in order to obtain better results. This is done by the following code, which uses the cvtColor and equalizeHist functions:

```cpp
Mat frame_gray;
if (frame.channels() > 1){
    cvtColor(frame, frame_gray, CV_BGR2GRAY);
    equalizeHist(frame_gray, frame_gray);
} else {
    frame_gray = frame;
}
```
void CascadeClassifier::detectMultiScale(const Mat& image, vector<Rect>& objects, double scaleFactor=1.1, int minNeighbors=3, int flags=0, Size minSize=Size(), Size maxSize=Size()): This method examines the image in the image variable by applying the loaded cascade and inserts all detected objects in objects. Detections are stored in a vector of rectangles of type Rect. The parameters scaleFactor and minNeighbors indicate, respectively, how much the image size is reduced at each image scale considered and the minimum number of neighboring detections that indicate a positive detection. Detections are bounded by the minimum and maximum sizes indicated by minSize and maxSize. Finally, the flags parameter is not used with cascades created with opencv_traincascade.

The following screenshot shows the result of applying the hogcascade_pedestrians.xml pretrained HOG-based pedestrian detector over the frames of the 768x576.avi video, which is stored in the OPENCV_SCR/samples folder.
There are several projects and contributions in the OpenCV community that solve other detection-related problems, which involve not only detecting the object but also distinguishing its state. One example of this type of detector is the smile detector included in OpenCV since version 2.4.4. The code can be found in the file OPENCV_SCR/samples/c/smiledetect.cpp, and the XML file that stores the cascade detector, haarcascade_smile.xml, can be found in OPENCV_SCR/data/haarcascades. This code first detects the frontal face using the pretrained cascade stored in haarcascade_frontalface_alt.xml and then detects the smiling mouth pattern in the bottom part of the detected face. Finally, the intensity of the smile is calculated based on the number of neighbors detected.
Although OpenCV provides pretrained cascades, in some cases it is necessary to train a cascade detector to look for a specific object. For these cases, OpenCV comes with tools that help train a cascade, generating all the data needed during the training process as well as the final files with the detector information. The training applications are usually stored in the OPENCV_BUILD\install\x64\mingw\bin directory. Some of them are listed as follows:
opencv_haartraining: This application is historically the first version of the application for creating cascades.

opencv_traincascade: This application is the latest version of the application for creating cascades.

opencv_createsamples: This application is used to create the .vec file with the images that contain instances of the object. The generated file is accepted by both of the preceding training executables.

opencv_performance: This application may be used to evaluate a cascade trained with the opencv_haartraining tool. It uses a set of marked images to obtain evaluation figures, for example, the false alarm and detection rates.

Since opencv_haartraining is the older version of the program and it comes with fewer features than opencv_traincascade, only the latter will be described here.
Here, the cascade training process is explained using the MIT CBCL face database. This database contains face and background images of 19 x 19 pixels arranged as shown in the following screenshot:
This section explains the training process on Windows. For Linux and Mac OS X, the process is similar but takes into account the specific aspects of the operating system. More information on training cascade detectors in Linux and Mac OS X can be found at http://opencvuser.blogspot.co.uk/2011/08/creating-haar-cascade-classifier-aka.html and http://kaflurbaleen.blogspot.co.uk/2012/11/how-to-train-your-classifier-on-mac.html respectively.
The training process involves the following steps:
First, set the directory that contains the training images as the current directory. Assuming the images are stored in C:\chapter6\images, use the following command:

>cd C:\chapter6\images

Next, create the background image information file. Since the background images are stored in C:\chapter6\images\train\non-face and their format is .pgm, it is possible to create the text file required by OpenCV using the following command:

>for %i in (C:\chapter6\images\train\non-face\*.pgm) do @echo %i >> train_non-face.txt
The following screenshot shows the contents of the background image information file. This file contains the path of the background images:
Then, create a .dat file with the object coordinates. In this particular database, each object image contains only one instance of the object, which is located in the center of the image and scaled to occupy the entire image. Therefore, the number of objects per image is 1 and the object coordinates are 0 0 19 19, which are the initial point followed by the width and height of the rectangle that contains the object. If the object images are stored in C:\chapter6\images\train\face, it is possible to use the following command to generate the file:

>for %i in (C:\chapter6\images\train\face\*.pgm) do @echo %i 1 0 0 19 19 >> train_face.dat
The content of the .dat
file can be seen in the following screenshot:
Once the .dat file with the object coordinates is available, it is necessary to create the .vec file needed by OpenCV. This step can be performed using the opencv_createsamples program with the arguments -info (the .dat file), -vec (the .vec output file name), -num (the number of images), -w and -h (the output image width and height), and -maxxangle, -maxyangle, and -maxzangle (the maximum image rotation angles). To see more options, execute opencv_createsamples without arguments. In this case, the command used is:

>opencv_createsamples -info train_face.dat -vec train_face.vec -num 2429 -w 19 -h 19 -maxxangle 0 -maxyangle 0 -maxzangle 0
Finally, run the opencv_traincascade executable to train the cascade detector. The command used in this case is:

>opencv_traincascade -data C:\chapter6\trainedCascade -vec train_face.vec -bg train_non-face.txt -numPos 242 -numNeg 454 -numStages 10 -w 19 -h 19

The arguments indicate the output directory (-data), the .vec file (-vec), the background information file (-bg), the number of positive and negative images used to train each stage (-numPos and -numNeg), the maximum number of stages (-numStages), and the width and height of the images (-w and -h).
The output of the training process is:
```
PARAMETERS:
cascadeDirName: C:\chapter6\trainedCascade
vecFileName: train_face.vec
bgFileName: train_non-face.txt
numPos: 242
numNeg: 454
numStages: 10
precalcValBufSize[Mb] : 256
precalcIdxBufSize[Mb] : 256
stageType: BOOST
featureType: HAAR
sampleWidth: 19
sampleHeight: 19
boostType: GAB
minHitRate: 0.995
maxFalseAlarmRate: 0.5
weightTrimRate: 0.95
maxDepth: 1
maxWeakCount: 100
mode: BASIC

===== TRAINING 0-stage =====
<BEGIN
POS count : consumed   242 : 242
NEG count : acceptanceRatio   454 : 1
Precalculation time: 4.524
+----+---------+---------+
|  N |    HR   |    FA   |
+----+---------+---------+
|   1|        1|        1|
+----+---------+---------+
|   2|        1|        1|
+----+---------+---------+
|   3| 0.995868| 0.314978|
+----+---------+---------+
END>
Training until now has taken 0 days 0 hours 0 minutes 9 seconds.

. . . Stages 1, 2, 3, and 4 . . .

===== TRAINING 5-stage =====
<BEGIN
POS count : consumed   242 : 247
NEG count : acceptanceRatio   454 : 0.000220059
Required leaf false alarm rate achieved. Branch training terminated.
```
Finally, the XML files of the cascade are stored in the output directory. These files are cascade.xml, params.xml, and a set of stageX.xml files, where X is the stage number.