Chapter 9. Object Recognition

In this chapter, we will learn about object recognition and how to use it to build a visual search engine. We will discuss feature detection, building feature vectors, and using machine learning to build a classifier, and then see how to combine these building blocks into an object recognition system.

By the end of this chapter, you will know:

  • The difference between object detection and object recognition
  • What a dense feature detector is
  • What a visual dictionary is
  • How to build a feature vector
  • What supervised and unsupervised learning are
  • What Support Vector Machines are and how to use them to build a classifier
  • How to recognize an object in an unknown image

Object detection versus object recognition

Before we proceed, we need to be clear about what this chapter covers. You have probably heard the terms "object detection" and "object recognition" many times, and they are often mistaken for the same thing. There is a distinct difference between the two.

Object detection refers to detecting the presence of an object in a given scene. It tells us that the object is there and where it is, but not its identity. For instance, we discussed face detection in Chapter 3, Detecting and Tracking Different Body Parts. During that discussion, we only detected whether or not a face was present in the given image. We didn't recognize the person, because that was not our concern; our goal was to find the location of the face in the given image. Commercial face recognition systems employ both face detection and face recognition to identify a person. First, they locate the face, and then they run the face recognizer on the cropped face.

Object recognition is the process of identifying an object in a given image. For instance, an object recognition system can tell you if a given image contains a dress or a pair of shoes. In fact, we can train an object recognition system to identify many different objects. The catch is that object recognition is a really difficult problem to solve. It has eluded computer vision researchers for decades and has become the holy grail of computer vision. Humans can identify a wide variety of objects very easily. We do it every day and we do it effortlessly, but computers are unable to do it with that kind of accuracy.

Let's consider the following image of a latte cup:

[Image: a latte cup]

An object detector will give you the following information:

[Image: object detector output for the latte cup]

Now, consider the following image of a teacup:

[Image: a teacup]

If you run it through an object detector, you will see the following result:

[Image: object detector output for the teacup]

As you can see, the object detector detects the presence of the teacup, but nothing more than that. A trained object recognizer, on the other hand, will give you the following information for the first image:

[Image: object recognizer output for the latte cup]

For the second image, the object recognizer will give you the following information:

[Image: object recognizer output for the teacup]

As you can see, a perfect object recognizer would give you all the information associated with that object. An object recognizer works more accurately if it knows where the object is located. If you have a big image and the cup is only a small part of it, the object recognizer might not be able to recognize it. Hence, the first step is to detect the object and get its bounding box. Once we have that, we can run an object recognizer on that region to extract more information.
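To make the detect-first, recognize-second idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the image is a small grid of pixel values, detect_object is a toy detector that simply boxes the non-background pixels, and recognize_object is a hypothetical stand-in that labels the cropped region by its aspect ratio. A real system would use a trained detector and a trained classifier instead.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # bounding box as (x, y, width, height)


def detect_object(image: Image, threshold: int = 0) -> Optional[Box]:
    """Toy detector: box every pixel brighter than the threshold.

    It tells us where something is, but not what it is.
    """
    rows = [y for y, row in enumerate(image) if any(p > threshold for p in row)]
    cols = [x for row in image for x, p in enumerate(row) if p > threshold]
    if not rows:
        return None  # nothing detected in the scene
    x0, y0 = min(cols), min(rows)
    return (x0, y0, max(cols) - x0 + 1, max(rows) - y0 + 1)


def crop(image: Image, box: Box) -> Image:
    """Cut the bounding box region out of the image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def recognize_object(patch: Image) -> str:
    """Hypothetical recognizer: a stand-in rule, not a trained model."""
    height, width = len(patch), len(patch[0])
    return "latte cup" if height > width else "teacup"


def detect_then_recognize(image: Image) -> Optional[Tuple[Box, str]]:
    """Step 1: locate the object. Step 2: recognize the cropped region."""
    box = detect_object(image)
    if box is None:
        return None
    return box, recognize_object(crop(image, box))


# A large, mostly empty scene with a small, tall object in it.
scene = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 9, 9, 0, 0, 0],
    [0, 0, 0, 9, 9, 0, 0, 0],
    [0, 0, 0, 9, 9, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

print(detect_then_recognize(scene))
```

Note that the recognizer only ever sees the cropped region, so the empty background around the object cannot distract it. This is the reason for running the detector first.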
