How to do it...

Let's preprocess the raccoon dataset and build an object localization model:

  1. Our dataset contains images of different sizes. Let's fix the image width and height and initialize a few other parameters:
image_channels = 3
batch_size = 15
image_width_resized = 96
image_height_resized = 96
model_name = "raccoon_1_"
  2. Now, we rescale the coordinates of the bounding box according to the new image dimensions:
labels$x_min_resized = (labels[, 'xmin'] / labels[, 'width'] * image_width_resized) %>% round()
labels$y_min_resized = (labels[, 'ymin'] / labels[, 'height'] * image_height_resized) %>% round()
labels$x_max_resized = (labels[, 'xmax'] / labels[, 'width'] * image_width_resized) %>% round()
labels$y_max_resized = (labels[, 'ymax'] / labels[, 'height'] * image_height_resized) %>% round()

Let's display the resized version of the same example we plotted earlier in the Getting ready section of this recipe:

x <- labels[labels$filename == 'raccoon-1.jpg', ]
im_resized <- resize(im = im, size_x = image_width_resized, size_y = image_height_resized)
plot(im_resized)
rect(xleft = x$x_min_resized, ybottom = x$y_min_resized, xright = x$x_max_resized, ytop = x$y_max_resized, border = "red", lwd = 1)

The following screenshot shows the resized version of the same sample image that we plotted in the Getting ready section of this recipe:

  3. Now, we split the data into training, validation, and test datasets:
X_train <- labels[1:150,]
X_val <- labels[151:200,]
X_test <- labels[201:nrow(labels),]
  4. Let's define a function to calculate a custom metric for our model: intersection over union (IoU). IoU is the area of overlap between two bounding boxes divided by the area of their union, where one box is the actual (ground truth) bounding box and the other is the predicted one. We will sanity-check the formula with a small numeric example right after defining the function:
metric_iou <- function(y_true, y_pred) {
  # Coordinates of the intersection rectangle
  intersection_x_min_resized <- k_maximum(y_true[, 1], y_pred[, 1])
  intersection_y_min_resized <- k_maximum(y_true[, 2], y_pred[, 2])
  intersection_x_max_resized <- k_minimum(y_true[, 3], y_pred[, 3])
  intersection_y_max_resized <- k_minimum(y_true[, 4], y_pred[, 4])

  area_intersection <- (intersection_x_max_resized - intersection_x_min_resized) *
    (intersection_y_max_resized - intersection_y_min_resized)
  area_y <- (y_true[, 3] - y_true[, 1]) * (y_true[, 4] - y_true[, 2])
  area_yhat <- (y_pred[, 3] - y_pred[, 1]) * (y_pred[, 4] - y_pred[, 2])
  area_union <- area_y + area_yhat - area_intersection

  iou <- area_intersection / area_union
  k_mean(iou)
}
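
Before we move on, it can help to sanity-check the IoU formula with plain R on two made-up boxes in (x_min, y_min, x_max, y_max) form; the coordinate values below are purely hypothetical and not taken from the raccoon dataset:

box_true <- c(10, 10, 50, 50)
box_pred <- c(20, 20, 60, 60)

inter_w <- max(min(box_true[3], box_pred[3]) - max(box_true[1], box_pred[1]), 0)
inter_h <- max(min(box_true[4], box_pred[4]) - max(box_true[2], box_pred[2]), 0)
area_intersection <- inter_w * inter_h

area_true <- (box_true[3] - box_true[1]) * (box_true[4] - box_true[2])
area_pred <- (box_pred[3] - box_pred[1]) * (box_pred[4] - box_pred[2])

# IoU = intersection / union = 900 / (1600 + 1600 - 900), which is about 0.39
area_intersection / (area_true + area_pred - area_intersection)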
  5. Next, we define our model and compile it. Let's instantiate a VGG16 model with ImageNet weights. We will use this as a feature extractor for our model:
feature_extractor <- application_vgg16(include_top = FALSE,
weights = "imagenet",
input_shape = c(image_width_resized, image_height_resized,image_channels)
)

Now, we add some layers to VGG16 and build the model:

output <- feature_extractor$output %>%
layer_conv_2d(filters = 4,kernel_size = 3) %>%
layer_reshape(c(4))

model <- keras_model(inputs = feature_extractor$input, outputs = output)
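
Note why this works: with include_top = FALSE and a 96 x 96 x 3 input, VGG16 produces a 3 x 3 x 512 feature map, so an unpadded 3 x 3 convolution with 4 filters collapses it to 1 x 1 x 4, and the reshape turns that into the four bounding box coordinates. If you want to confirm the shapes yourself, a quick optional check is:

feature_extractor$output_shape  # expected to be (NULL, 3, 3, 512)
model$output_shape              # expected to be (NULL, 4)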

Let's have a look at the summary of the model:

summary(model)

The following screenshot shows the description of the model:

Let's freeze the feature extractor part of the network. We will only fine-tune the layers we added on top of it:

freeze_weights(feature_extractor)
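
We can verify that the freeze took effect: after calling freeze_weights(), the VGG16 parameters should be reported as non-trainable, and only the weights of the newly added convolutional layer should remain trainable. A quick, optional check (the trainable_weights attribute is exposed by the underlying Keras model):

summary(model)                    # VGG16 parameters now appear under non-trainable params
length(model$trainable_weights)   # should only count the new conv layer's kernel and bias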

Now, we compile the model using adam as the optimizer, mae as the loss function, and metric_iou, which we defined in step 4, as a custom metric:

model %>% compile(
  optimizer = "adam",
  loss = "mae",
  metrics = list(custom_metric("iou", metric_iou))
)

  6. Now, we create a custom generator function to get batches of image data and the corresponding resized bounding box coordinates on the fly during the training process:
localization_generator <- function(data, target_height, target_width, batch_size) {
  function() {
    # Sample a batch of rows, load and resize each image, and collect the
    # corresponding resized bounding box coordinates
    indexes <- sample(1:nrow(data), batch_size, replace = TRUE)
    y <- array(0, dim = c(length(indexes), 4))
    x <- array(0, dim = c(length(indexes), target_height, target_width, 3))
    for (j in 1:length(indexes)) {
      im_name <- data[indexes[j], "filename"] %>% as.character()
      im <- load.image(file = paste0('data/raccoon_dataset/images/', im_name))
      im <- resize(im = im, size_x = target_width, size_y = target_height)
      im <- im[, , , ]
      x[j, , , ] <- as.array(im)
      y[j, ] <- data[indexes[j], c("x_min_resized", "y_min_resized", "x_max_resized", "y_max_resized")] %>% as.matrix()
    }
    list(x, y)
  }
}


Now, we create a generator for the training data by using the localization_generator() function that we created in the preceding code block:

train_generator = localization_generator(data = X_train,
                                          target_height = image_height_resized,
                                          target_width = image_width_resized,
                                          batch_size = batch_size)

We also create a generator for the validation data in a similar way:

validation_generator = localization_generator(data = X_val,
                                               target_height = image_height_resized,
                                               target_width = image_width_resized,
                                               batch_size = batch_size)
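
Before training, it is worth pulling a single batch from the training generator and confirming that its shapes match what the model expects. This check is only for illustration and is not required for training:

sanity_batch <- train_generator()
dim(sanity_batch[[1]])  # expected: 15 96 96 3 (batch, height, width, channels)
dim(sanity_batch[[2]])  # expected: 15 4 (one set of box coordinates per image)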
  7. Next, we start training the model. We specify the number of epochs that we want to run the training process for:
epoch = 100

Then, we create checkpoints so that the model weights are saved at certain intervals during the training process:

checkpoint_dir <- "checkpoints_raccoon"
dir.create(checkpoint_dir)
filepath <- file.path(checkpoint_dir,
                      paste0(model_name, "weights.{epoch:02d}-{val_loss:.2f}-val_iou{val_iou:.2f}-iou{iou:.2f}.hdf5"))

We will use callbacks to view the internal states and statistics of the model during training. To make sure that our model doesn't overfit the training data, we also use early stopping:

cp_callback <- list(callback_model_checkpoint(mode = "auto",
                                              filepath = filepath,
                                              save_best_only = TRUE,
                                              verbose = 1),
                    callback_early_stopping(patience = 100))

Now, we fit the training data to the model and start the training process:

model %>% fit_generator(
  train_generator,
  validation_data = validation_generator,
  epochs = epoch,
  steps_per_epoch = nrow(X_train) / batch_size,
  validation_steps = nrow(X_val) / batch_size,
  callbacks = cp_callback
)

Let's save the final model:

model %>% save_model_hdf5(paste0(model_name, "obj_dect_raccoon.h5"))
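
As an optional check, we can also gauge the aggregate loss and IoU on the whole test set by reusing the localization_generator() function. This is a sketch that assumes the evaluate_generator() helper from the keras R package (newer versions suggest using evaluate() instead):

test_generator <- localization_generator(data = X_test,
                                         target_height = image_height_resized,
                                         target_width = image_width_resized,
                                         batch_size = batch_size)
model %>% evaluate_generator(test_generator,
                             steps = ceiling(nrow(X_test) / batch_size))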
  8. Now, we apply the same preprocessing to a sample test image as we did to the training images, and then predict the bounding box coordinates.
    First, we load a sample test image:
test <- X_test[1,]
test_img <- load.image(paste0('data/raccoon_dataset/images/', test$filename))

Then, we resize the sample test image:

test_img_resized <- resize(test_img, size_x = image_width_resized, size_y = image_height_resized)
test_img_resized_mat <- test_img_resized[, , , ]

Next, we convert the resized image into an array:

test_img_resized_mat <- as.array(test_img_resized_mat)

Then, we reshape the array to the required dimension:

test_img_resized_mat <- array_reshape(test_img_resized_mat, dim = c(1, image_height_resized, image_width_resized, image_channels))

Here, we predict the coordinates of the bounding box for the test sample:

predicted_cord <- model %>% predict(test_img_resized_mat)
predicted_cord = abs(ceiling(predicted_cord))
predicted_cord

Next, we plot the test image with both the actual bounding box and the predicted bounding box:

plot(test_img_resized)
rect(xleft = test$x_min_resized, ybottom = test$y_min_resized, xright = test$x_max_resized, ytop = test$y_max_resized, border = "red", lwd = 1)
rect(xleft = predicted_cord[1], ybottom = predicted_cord[2], xright = predicted_cord[3], ytop = predicted_cord[4], border = "green", lwd = 1)

The following screenshot shows the actual (red) and predicted (green) bounding boxes:

By training the model for 100 epochs, we achieved an IoU of 0.10.
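
To put that number in context for the single test image we just plotted, we can compute the IoU between the actual and predicted boxes directly in plain R, using the coordinates produced above:

actual <- c(test$x_min_resized, test$y_min_resized, test$x_max_resized, test$y_max_resized)
pred <- as.numeric(predicted_cord)

inter_w <- max(min(actual[3], pred[3]) - max(actual[1], pred[1]), 0)
inter_h <- max(min(actual[4], pred[4]) - max(actual[2], pred[2]), 0)
area_intersection <- inter_w * inter_h
area_union <- (actual[3] - actual[1]) * (actual[4] - actual[2]) +
  (pred[3] - pred[1]) * (pred[4] - pred[2]) - area_intersection
area_intersection / area_union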
