Implementing computer vision with pretrained models

In Chapter 14, Exploring the Machine Learning Landscape, we touched upon a concept called transfer learning. The idea is to take the knowledge a model has learned on one task and apply it to another, related task. Transfer learning is used in almost all computer vision tasks nowadays; it is rare to train models from scratch unless a huge labeled dataset is available for training.

Generally, in computer vision, CNNs detect edges in the earlier layers, shapes in the middle layers, and task-specific features in the later layers. Irrespective of the images being classified, the function of the earlier and middle layers remains largely the same, which makes it possible to exploit the knowledge gained by a pretrained model. With transfer learning, we can reuse the early and middle layers and retrain only the later layers, thereby leveraging the labeled data of the task the model was originally trained on.
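To make this idea concrete, here is a minimal sketch of fine-tuning in MXNet R: we slice the pretrained symbol at an internal feature layer and attach a fresh, trainable classification head. The layer name flatten_output and the 10-class head are assumptions for illustration; inspect internals$outputs to find the right layer name for your model.

library(mxnet)
# load a pretrained model (files assumed to be in the working directory)
pretrained <- mx.model.load("Inception_BN", iteration=39)
# list the network's internal outputs and cut the graph at a late feature layer;
# "flatten_output" is an assumed name -- check internals$outputs for your model
internals <- pretrained$symbol$get.internals()
feature_layer <- internals[[match("flatten_output", internals$outputs)]]
# attach a new task-specific head; 10 classes is a hypothetical example
new_fc <- mx.symbol.FullyConnected(data=feature_layer, num_hidden=10, name="fc_new")
new_net <- mx.symbol.SoftmaxOutput(data=new_fc, name="softmax")
# training new_net with mx.model.FeedForward.create, starting from
# pretrained$arg.params, reuses the early and middle layers while
# learning only the new head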

Transfer learning offers two main advantages: it saves training time, and it gives us a good model even when we have far less labeled training data.

Xception, VGG16, VGG19, ResNet50, InceptionV3, InceptionResNetV2, MobileNet, DenseNet, NASNet, MobileNetV2, QuocNet, AlexNet, Inception (GoogLeNet), and BN-Inception-v2 are some widely used pretrained models. While we won't delve into the details of each of these, the goal of this section is to implement a project that detects the contents of input images using a pretrained model through MXNet.

In the code presented in this section, we make use of the pretrained Inception-BatchNorm network to predict the class of an image. The pretrained model needs to be downloaded to the working directory prior to running the code. The model can be downloaded from http://data.mxnet.io/mxnet/data/Inception.zip. Let's explore the following code to label a few test images using the inception_bn pretrained model:
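If you prefer to script the download instead of doing it manually, the archive can be fetched and unpacked from R itself using base functions (a small sketch; it writes into the current working directory):

# download and unpack the pretrained Inception-BatchNorm model
download.file("http://data.mxnet.io/mxnet/data/Inception.zip",
              destfile="Inception.zip", mode="wb")
unzip("Inception.zip")  # extracts the model files referenced in the paths below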

# loading the required libraries
library(mxnet)
library(imager)
# loading the inception_bn model to memory
model <- mx.model.load("/home/sunil/Desktop/book/chapter 19/Inception/Inception_BN", iteration=39)
# loading the mean image
mean.img = as.array(mx.nd.load("/home/sunil/Desktop/book/chapter 19/Inception/mean_224.nd")[["mean_img"]])
# loading the image that needs to be classified
im <- load.image("/home/sunil/Desktop/book/chapter 19/image1.jpeg")
# displaying the image
plot(im)

This will display the input image that we are going to classify.

To preprocess the image and predict the label IDs with the highest probabilities using the pretrained model, we use the following code:

# function to preprocess an image into the form expected by the inception_bn model's predict function
preproc.image <- function(im, mean.image) {
# crop the largest centered square from the image
shape <- dim(im)
short.edge <- min(shape[1:2])
xx <- floor((shape[1] - short.edge) / 2)
yy <- floor((shape[2] - short.edge) / 2)
cropped <- crop.borders(im, xx, yy)
# resize to 224 x 224, the input size expected by the model
resized <- resize(cropped, 224, 224)
# convert to an array (x, y, channel) with pixel values in 0-255
arr <- as.array(resized) * 255
dim(arr) <- c(224, 224, 3)
# subtract the mean image
normed <- arr - mean.image
# reshape to the format needed by mxnet (width, height, channel, num)
dim(normed) <- c(224, 224, 3, 1)
return(normed)
}
# calling the image pre-processing function on the image to be classified
normed <- preproc.image(im, mean.img)
# predicting the probabilities of labels for the image using the pretrained model
prob <- predict(model, X=normed)
# sorting and filtering the top three labels with the highest probabilities
max.idx <- order(prob[,1], decreasing = TRUE)[1:3]
# printing the ids with highest probabilities
print(max.idx)

This will result in the following output with the IDs of the highest probabilities:

[1] 471 627 863
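Note that predict returns a matrix with one row per class and one column per input image, which is why the code indexes prob[,1]. As an optional check, the probabilities of the top three classes can be printed alongside the IDs:

# probabilities of the top three predicted classes
print(prob[max.idx, 1])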

Let's print the labels that correspond to the top three predicted IDs using the following code:

# loading the pre-trained labels from inception_bn model 
synsets <- readLines("/home/sunil/Desktop/book/chapter 19/Inception/synset.txt")
# printing the English labels corresponding to the top three IDs with the highest probabilities
print(paste0("Predicted Top-classes: ", synsets[max.idx]))

This will give the following output:

[1] "Predicted Top-classes: n02948072 candle, taper, wax light"        
[2] "Predicted Top-classes: n03666591 lighter, light, igniter, ignitor"
[3] "Predicted Top-classes: n04456115 torch"

From the output, we can see that the model has correctly labeled the input image. We can test a few more images with the following code to confirm that the classification is done correctly:

im2 <- load.image("/home/sunil/Desktop/book/chapter 19/image2.jpeg")
plot(im2)
normed <- preproc.image(im2, mean.img)
prob <- predict(model, X=normed)
max.idx <- order(prob[,1], decreasing = TRUE)[1:3]
print(paste0("Predicted Top-classes: ", synsets[max.idx]))

This will give the following output:

[1] "Predicted Top-classes: n03529860 home theater, home theatre"   
[2] "Predicted Top-classes: n03290653 entertainment center" [3] "Predicted Top-classes: n04404412 television, television system"

Likewise, we can try for a third image using the following code:

# getting the labels for third image
im3 <- load.image("/home/sunil/Desktop/book/chapter 19/image3.jpeg")
plot(im3)
normed <- preproc.image(im3, mean.img)
prob <- predict(model, X=normed)
max.idx <- order(prob[,1], decreasing = TRUE)[1:3]
print(paste0("Predicted Top-classes: ", synsets[max.idx]))

This will give the following output:

[1] "Predicted Top-classes: n04326547 stone wall" 
[2] "Predicted Top-classes: n03891251 park bench"
[3] "Predicted Top-classes: n04604644 worm fence, snake fence, snake-rail fence, Virginia fence"