Once your cascade classifier model is trained, you can use it to detect instances of the same object class in new input images supplied to the system. However, once you apply your object model, you will notice that there are still false positive detections and objects that are missed. This section covers techniques to improve your detection results, for example by removing most of the false positive detections with scene-specific knowledge.
When you apply an object model to a given input image, there are several things to consider. Let's first take a look at the detection function and the parameters that can be used to filter your detection output. OpenCV 3 supplies three possible interfaces; we will discuss the benefits of each one.
Interface 1:
void CascadeClassifier::detectMultiScale(InputArray image, vector<Rect>& objects, double scaleFactor=1.1, int minNeighbors=3, int flags=0, Size minSize=Size(), Size maxSize=Size())
The first interface is the most basic one. It allows you to quickly evaluate your trained model on a given test image. Several elements of this basic interface allow you to manipulate the detection output. We will discuss these parameters in some more detail and highlight points of attention when selecting correct values.
scaleFactor is the scale step used to downscale the original image in order to create the image pyramid, which allows us to perform multiscale detections using only a single-scale model. One downside is that this doesn't allow you to detect objects that are smaller than the model size used for training. Using a value of 1.1 means that at each step the dimensions are reduced by about 10% compared to the previous step.
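The effect of scaleFactor on the pyramid can be sketched without OpenCV at all. The snippet below computes which scales such a pyramid would contain, stopping once the downscaled image can no longer hold the model window; the 640x480 image and 24x24 model dimensions are illustrative assumptions.

```cpp
#include <vector>

// Sketch: enumerate the scales of an image pyramid for a given scaleFactor.
// The pyramid stops once one image dimension drops below the model window.
std::vector<double> pyramidScales(int imgW, int imgH, int modelW, int modelH,
                                  double scaleFactor) {
    std::vector<double> scales;
    double s = 1.0;  // scale 1.0 = original input image resolution
    while (imgW / s >= modelW && imgH / s >= modelH) {
        scales.push_back(s);
        s *= scaleFactor;  // e.g. 1.1: each level ~10% smaller than the last
    }
    return scales;
}
```

A smaller scaleFactor (for example, 1.05) produces more pyramid levels and therefore finer scale coverage, at the cost of more detection passes.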
A second interesting parameter to adapt to your needs is the minNeighbors parameter. It specifies how many overlapping detections, which occur due to the sliding window approach, are required to retain a detection. Each detection overlapping by more than 50% with another is merged with it, as a form of nonmaxima suppression.
Use the minSize and maxSize parameters to effectively reduce the scale space pyramid. In an industrial setup with a fixed camera position, such as a conveyor belt, you can in most cases guarantee that objects will have certain dimensions. Supplying scale values in this case, and thus defining a scale range, will decrease the processing time for a single image considerably by removing undesired scale levels. As an extra advantage, all false positive detections at those undesired scales will also disappear. If you leave these values blank, the algorithm will build the image pyramid in a bottom-up manner, starting at the input image dimensions and downscaling in steps equal to the scale factor, until one of the dimensions is smaller than the largest object dimension. That top level of the image pyramid is also where, at detection time, the algorithm will start running its object detector.
Interface 2:
void CascadeClassifier::detectMultiScale(InputArray image, vector<Rect>& objects, vector<int>& numDetections, double scaleFactor=1.1, int minNeighbors=3, int flags=0, Size minSize=Size(), Size maxSize=Size())
The second interface brings a small addition: the numDetections parameter. This allows you to set the minNeighbors value to 1, still applying the merging of overlapping windows as nonmaxima suppression, but at the same time returning, for each detection, the number of overlapping windows that were merged into it. This value can be seen as a certainty score for the detection: the higher the value, the more certain the detection.
Interface 3:
void CascadeClassifier::detectMultiScale(InputArray image, std::vector<Rect>& objects, std::vector<int>& rejectLevels, std::vector<double>& levelWeights, double scaleFactor=1.1, int minNeighbors=3, int flags=0, Size minSize=Size(), Size maxSize=Size(), bool outputRejectLevels=false)
A downside of the second interface is that 100 windows, each with a very small individual detection certainty, can simply outweigh a single detection with a very high individual certainty. This is where the third interface brings a solution. It allows us to look at the individual score of each detection window (described by the threshold value of the last stage of the classifier). You can then grab all those values and threshold the certainty scores of the individual windows. When nonmaxima suppression is applied in this case, the threshold values of all overlapping windows are combined.
Software illustrating the two most used interfaces for object detection can be found at https://github.com/OpenCVBlueprints/OpenCVBlueprints/tree/master/chapter_5/source_code/detect_simple and https://github.com/OpenCVBlueprints/OpenCVBlueprints/tree/master/chapter_5/source_code/detect_score. Note that OpenCV detection interfaces change frequently, so it is possible that new interfaces are already available that are not discussed here.
Once you have chosen the most appropriate way of retrieving the object detections for your application, you can evaluate the actual output of your algorithm. Two of the most common problems found after training an object detector are:
- Object instances of the target class that are not detected
- False positive detections on regions that do not contain the object
The reason for the first problem can be explained by looking at the generic object model that we trained for the object class, based on positive training samples of that object class. If instances are missed, the positive training set most likely did not cover the full appearance variation of the object class, so instances that deviate too much from the supplied samples are rejected by the model.
The second problem is quite normal and happens more often than not. It is impossible to supply enough negative samples and, at the same time, ensure that not a single negative window could still yield a positive detection on a first run. This is mainly because it is very hard for us humans to understand how the computer sees an object based on features. On the other hand, it is impossible to grasp every possible scenario (lighting conditions, interactions during the production process, filth on the camera, and so on) when first training an object detector. You should see the creation of a good and stable model as an iterative process.
In order to reduce the amount of false positive detections, we generally need to add more negative samples. However, it is important not to add randomly generated negative windows, since the extra knowledge they would bring to the model would, in most cases, be minimal. It is better to add meaningful negative windows that can increase the quality of the detector. This is known as hard negative mining, using a bootstrapping process. The principle is rather simple:
- Run the current detector on a set of images that are guaranteed to contain no object instances.
- Every detection returned is, by definition, a false positive; collect these windows.
- Add them to your negative training set and retrain the model.
- Repeat until the number of false positives drops to an acceptable level.
This will ensure that the accuracy of your model goes up by a fair amount, depending on the quality of the negative images you collect.
When adding the extra, useful negative samples you found, add them to the top of your background.txt file! This forces the OpenCV training interface to grab these more important negative samples first, before sampling the standard negative training images provided. Make sure they have exactly the required model size, so that each can be used only once as a negative training sample.