There's more...

In this recipe, we showcased a general approach to object localization. However, many other techniques can be leveraged for robust object localization and classification in an image with reduced computation time and cost. These techniques can be used to locate and classify multiple objects in one image. Some of these techniques are as follows:

Region-based Convolutional Neural Network (RCNN): This technique uses a selective search algorithm to generate around 2,000 region proposals for each input image and warps each region to a fixed size. Each region is then fed into a CNN, which acts as a feature extractor. The extracted features are provided to a set of class-specific linear SVMs, trained separately from the CNN, which decide whether a region contains an object and, if so, which category it belongs to. Having found an object in a region, the next step in an RCNN is to use a linear regression model to refine the coordinates of the bounding box for the object that was detected in that region. A significant challenge with RCNN is that it is very slow and computationally heavy, since each of the roughly 2,000 regions is passed through the CNN separately.
Fast RCNN: Unlike RCNN, this technique passes the entire image through several convolutional and pooling layers once to produce a single feature map, rather than passing each region of the original image through the network separately. Region proposal methods, such as selective search, are then used to generate regions of interest (ROIs), which are projected onto the feature map. For each region, an ROI pooling layer extracts a fixed-length feature vector from the feature map. ROI max-pooling divides the h × w ROI window into an H × W grid of sub-windows, each of approximate size h/H × w/W, and then applies max-pooling to each sub-window. These feature vectors are then passed to fully connected layers that branch into two outputs: softmax probabilities over the object classes and the coordinates of the bounding boxes.
Faster RCNN: Faster RCNN takes the least computation time of the three techniques. In Faster RCNN, region proposals and detections are produced by a single neural network that shares its convolutional features between both tasks. Instead of using a selective search algorithm, Faster RCNN uses a region proposal network (RPN) to generate region proposals from the feature maps. The RPN scores a set of reference boxes, known as anchors, and proposes the regions that are highly likely to contain objects. The rest of the procedure, that is, detecting the class of the object and predicting the bounding box for each object, is the same as it is for Fast RCNN.
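
The ROI max-pooling step described under Fast RCNN can be illustrated with a small example. The following is a minimal sketch in plain Python, not the implementation used by any particular library: the function name roi_max_pool, the (top, left, h, w) ROI layout, and the floor/ceil rule for sub-window boundaries are assumptions made for illustration, and the feature map is a single-channel list of lists.

```python
import math


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one ROI of a 2-D feature map into an out_h x out_w grid.

    feature_map: list of rows (single channel), hypothetical toy input.
    roi: (top, left, h, w) in feature-map cells; assumed to lie inside the map.
    """
    top, left, h, w = roi
    pooled = []
    for i in range(out_h):
        # Row span of sub-window i, roughly h / out_h rows tall.
        r0 = top + math.floor(i * h / out_h)
        r1 = top + math.ceil((i + 1) * h / out_h)
        row = []
        for j in range(out_w):
            # Column span of sub-window j, roughly w / out_w columns wide.
            c0 = left + math.floor(j * w / out_w)
            c1 = left + math.ceil((j + 1) * w / out_w)
            # Keep only the largest activation in this sub-window.
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Toy 6 x 6 feature map whose value at (r, c) is r * 6 + c.
feature_map = [[r * 6 + c for c in range(6)] for r in range(6)]

# A 4 x 4 ROI starting at row 1, column 1, pooled to a fixed 2 x 2 output.
print(roi_max_pool(feature_map, (1, 1, 4, 4), 2, 2))
# [[14, 16], [26, 28]]
```

Whatever the size of the ROI, the output always has out_h × out_w entries, which is what lets regions of different shapes feed the same fully connected layers. When h or w is not a multiple of the grid size, this sketch lets neighbouring sub-windows overlap by one cell; other rounding rules are possible.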
