Chapter 5. Generic Object Detection for Industrial Applications

This chapter will introduce you to the world of generic object detection, with a closer look at the advantages that industrial applications yield compared to the standard academic research cases. As many of you will know, OpenCV 3 contains the well-known Viola and Jones algorithm (embedded as the CascadeClassifier class), which was specifically designed for robust face detection. However, the same interface can efficiently be used to detect any desired object class that suits your needs.

Note

More information on the Viola and Jones algorithm can be found in the following publication:

Rapid object detection using a boosted cascade of simple features, Viola P. and Jones M., (2001). In Computer Vision and Pattern Recognition, 2001 (CVPR 2001). Proceedings of the 2001 IEEE Computer Society Conference on (Vol. 1, pp. I-511). IEEE.

This chapter assumes that you have a basic knowledge of the cascade classification interface of OpenCV 3. If not, here are some great starting points for understanding this interface and the basic usage of the supplied parameters and software:

Note

Or you can simply read one of the PacktPub books that discuss this topic in more detail such as Chapter 3, Training a Smart Alarm to Recognize the Villain and His Cat, of the OpenCV for Secret Agents book by Joseph Howse.

In this chapter, I will take you on a tour through specific elements that are important when using the Viola and Jones face detection framework for generic object detection. You will learn how to adapt your training data to the specific situation of your setup, how to make your object detection model rotation invariant, and you will find guidelines on how to improve the accuracy of your detector by smartly using environment parameters and situational knowledge. We will dive deeper into the actual object class model and explain what happens, combined with some smart tools for visualizing the actual process of object detection. Finally, we will look at GPU possibilities, which will lead to faster processing times. All of this will be combined with code samples and example use cases of general object detection.

Difference between recognition, detection, and categorization

To fully understand this chapter, it is important to realize that the Viola and Jones detection framework, which is based on cascade classification, is actually an object categorization technique and that it differs a lot from object recognition. Confusing the two is a common mistake in computer vision projects: people do not analyze the problem well enough beforehand and thus wrongly decide to use this technique. Consider the setup described in the following figure, which consists of a computer with a camera attached to it. The computer has an internal description of four objects (plane, cup, car, and snake). Now, we consider the case where three new images are presented to the camera of the system.

A simple computer vision setup

When image A is presented to the system, the system creates a description of the input image and tries to match it with the descriptions of the images in the computer's memory database. Since that specific cup is only slightly rotated, its descriptor will match the descriptor of the cup's memory image more closely than those of the other object images in memory, and thus the system is able to successfully recognize the known cup. This process is called object recognition, and it is applied in cases where we know exactly which object we want to find in our input image.

 

"The goal of object recognition is to match (recognize) a specific object or scene. Examples include recognizing a specific building, such as the Pisa Tower, or a specific painting, such as the Mona Lisa. The object is recognized despite changes in scale, camera viewpoint, illumination conditions and partial occlusion."

 
 --Andrea Vedaldi and Andrew Zisserman

However, this technique has some downsides. If an object that has no description in the image database is presented to the system, the system will still return the closest match, and the result could be very misleading. To avoid this, we tend to put a threshold on the matching quality. If the threshold is not reached, we simply do not provide a match.
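The thresholding idea can be illustrated with a minimal sketch. It assumes that each object is described by a small numeric descriptor and that matching quality is measured by Euclidean distance; the descriptor values, the `recognize` helper, and the threshold of 0.3 are all hypothetical and chosen only for illustration, not taken from a real descriptor or from OpenCV.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(query, database, max_distance):
    """Return (label, distance) of the closest stored descriptor.

    The label is None when even the closest match is farther away than
    max_distance, so an unknown object is rejected instead of being
    reported as its nearest neighbour.
    """
    best_label, best_dist = None, float("inf")
    for label, descriptor in database.items():
        dist = euclidean(query, descriptor)
        if dist < best_dist:
            best_label, best_dist = label, dist
    if best_dist > max_distance:
        return None, best_dist
    return best_label, best_dist


# Hypothetical descriptors for the four objects in the computer's memory.
database = {
    "plane": [0.9, 0.1, 0.0],
    "cup": [0.1, 0.8, 0.1],
    "car": [0.7, 0.2, 0.6],
    "snake": [0.0, 0.1, 0.9],
}

known_label, _ = recognize([0.15, 0.75, 0.1], database, max_distance=0.3)
unknown_label, _ = recognize([0.5, 0.5, 0.5], database, max_distance=0.3)
print(known_label)    # the slightly different cup is still accepted
print(unknown_label)  # the unknown object is rejected
```

Without the threshold, the second query would be reported as the car simply because the car descriptor happens to be the nearest one. In practice, a suitable threshold depends on the descriptor and has to be tuned on your own data.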

When image B is presented to the same system, we experience a new problem. The difference between the given input image and the cup image in memory is so large (different size, different shape, different print, and so on) that the descriptor of image B will not be matched to the description of the cup in memory, which is again a large downside of object recognition. The problems grow even further when image C is presented to the system. There, the known car from computer memory is presented to the camera system, but in a completely different setup and with a different background than the one in memory. The background could influence the object descriptor so much that the object is no longer recognized.

Object detection goes a bit further; it tries to find a given object in varying setups by learning a more object-specific description instead of just a description of the image itself. When the detectable object class becomes more complex and the variation of the object over several input images is large, we are no longer talking about single object detection, but rather about detecting a class of objects. This is where object categorization comes into play.

With object categorization, we try to learn a generic model for the object class that can handle a lot of variation inside the object class, as shown in the following figure:

An example of object classes with lots of variation: cars and chairs/sofas

Inside such a single object class, we try to cope with different forms of variation, as seen in the following figure:

Variation within a single object class: illumination changes, object pose, clutter, occlusions, intra-class appearance, and viewpoint

If you plan to use the Viola and Jones object detection framework, it is very important to make sure that your application actually belongs to this third case, object categorization. In that case, the object instances you want to detect are not known beforehand and they have a large intra-class variance. Each object instance can differ in shape, color, size, orientation, and so on. The Viola and Jones algorithm will model all that variance into a single object model that will be able to detect any given instance of the class, even if the object instance has never been seen before. This is the great strength of object categorization techniques: they generalize well over a set of given object samples to learn the specifics of the complete object class.

These techniques allow us to train object detectors for more complex classes and thus make object categorization techniques ideal for industrial applications such as object inspection, object picking, and so on, where commonly used threshold-based segmentation techniques tend to fail due to this large variation in the setup.

If your application does not handle objects in these difficult situations, then consider using other techniques such as object recognition if it suits your needs!

Before we start with the real work, let me take the time to introduce the basic steps that are common to object detection applications. It is important to pay equal attention to all the steps and definitely not to skip some of them to save time, since each of them influences the end result of the object detector:

  1. Data collection: This step includes collecting the necessary data for building and testing your object detector. The data can be acquired from a range of sources going from video sequences to images captured by a webcam. This step will also make sure that the data is formatted correctly to be ready to be passed to the training stage.
  2. The actual model training: In this step, you will use the data gathered in the first step to train an object model that will be able to detect that object class. Here, we will investigate the different training parameters and focus on defining the correct settings for your application.
  3. Object detection: Once you have a trained object model, you can use it to try and detect object instances in the given test images.
  4. Validation: Finally, it is important to validate the detection result of the third step, by comparing each detection with a manually defined ground truth of the test data. Various options for efficiency and accuracy validation will be discussed.
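To make the validation step more concrete, here is a minimal sketch of how detections could be compared with a manually defined ground truth. It assumes that boxes are given as `(x, y, width, height)` tuples and that a detection counts as correct when its intersection over union (IoU) with an unmatched ground truth box is at least 0.5, a commonly used convention. The `iou` and `evaluate` helpers and the example boxes are hypothetical and are not part of OpenCV.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def evaluate(detections, ground_truth, min_iou=0.5):
    """Greedily match detections to ground truth; return (precision, recall)."""
    matched = set()  # indices of ground truth boxes that are already used
    true_positives = 0
    for det in detections:
        best_idx, best_score = None, min_iou
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            score = iou(det, gt)
            if score >= best_score:
                best_idx, best_score = idx, score
        if best_idx is not None:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical annotations and detector output for one test image.
ground_truth = [(10, 10, 50, 50), (100, 100, 40, 40)]
detections = [(12, 12, 50, 50), (200, 200, 30, 30)]

precision, recall = evaluate(detections, ground_truth)
print(precision, recall)
```

Here one detection overlaps an annotated object well enough to count as a true positive, one detection is a false positive, and one annotated object is missed. The greedy matching is a simplification; it is sufficient to show the idea, but a stricter evaluation would sort detections by their confidence score first.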

Let's continue by explaining the first step, data collection, in more detail; it is also the first subtopic of this chapter.
