Appendix B. Generating Haar Cascades for Custom Targets

This appendix shows how to generate Haar cascade XML files like the ones used in Chapter 4, Tracking Faces with Haar Cascades. By generating our own cascade files, we can potentially track any pattern or object, not just faces. However, good results might not come quickly. We must carefully gather images, configure script parameters, perform real-world tests, and iterate. A lot of human time and processing time might be involved.

Gathering positive and negative training images

Do you know the flashcard pedagogy? It is a method of teaching words and recognition skills to small children. The teacher shows the class a series of pictures and says the following:

"This is a cow. Moo! This is a horse. Neigh!"

The way that cascade files are generated is analogous to the flashcard pedagogy. To learn how to recognize cows, the computer needs positive training images that are pre-identified as cows and negative training images that are pre-identified as non-cows. Our first step, as trainers, is to gather these two sets of images.

When deciding how many positive training images to use, we need to consider the various ways in which our users might view the target. The ideal, simplest case is that the target is a 2D pattern that is always on a flat surface. In this case, one positive training image might be enough. However, in other cases, hundreds or even thousands of training images might be required. Suppose that the target is your country's flag. When printed on a document, the flag might have a predictable appearance but when printed on a piece of fabric that is blowing in the wind, the flag's appearance is highly variable. A natural, 3D target, such as a human face, might range even more widely in appearance. Ideally, our set of positive training images should be representative of the many variations our camera may capture. Optionally, any of our positive training images may contain multiple instances of the target.

For our negative training set, we want a large number of images that do not contain any instances of the target but do contain other things that our camera is likely to capture. For example, if a flag is our target, our negative training set might include photos of the sky in various weather conditions. (The sky is not a flag but is often seen behind a flag.) Do not assume too much though. If the camera's environment is unpredictable and the target occurs in many settings, use a wide variety of negative training images. Consider building a set of generic environmental images that you can reuse across multiple training scenarios.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset