This part of the recipe provides a step-by-step guide to preparing the dataset for the Inception-BN pretrained model.
- Load the dependent packages:
    # Load packages
    require(imager)
    source("download_cifar_data.R")

The download_cifar_data.R script contains functions to download and read the CIFAR-10 dataset.
- Read the downloaded CIFAR-10 dataset:
    # Read dataset and labels
    DATA_PATH <- paste(SOURCE_PATH, "/Chapter 4/data/cifar-10-batches-bin/", sep="")
    labels <- read.table(paste(DATA_PATH, "batches.meta.txt", sep=""))
    cifar_train <- read.cifar.data(filenames = c("data_batch_1.bin", "data_batch_2.bin",
                                                 "data_batch_3.bin", "data_batch_4.bin"))
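The read.cifar.data function lives in download_cifar_data.R and is not shown in this recipe. As a rough sketch of what such a reader has to do, the following base R snippet decodes a single record, assuming the standard CIFAR-10 binary layout: one label byte followed by 1,024 red, 1,024 green, and 1,024 blue bytes. The helper name parse_cifar_record and the +1 label shift (so that aeroplane becomes 1 and automobile becomes 2, matching the filtering step) are assumptions made for illustration, not the book's code.

```r
# Hypothetical helper: decode one 3073-byte CIFAR-10 record from a raw vector.
parse_cifar_record <- function(rec) {
  stopifnot(is.raw(rec), length(rec) == 3073)
  vals <- as.integer(rec)        # bytes become integers in 0..255
  list(
    label = vals[1] + 1L,        # assumption: shift 0-based labels to 1-based
    r = vals[2:1025],            # red channel, 1024 values
    g = vals[1026:2049],         # green channel, 1024 values
    b = vals[2050:3073]          # blue channel, 1024 values
  )
}

# Synthetic record for demonstration only (not real CIFAR data)
rec <- as.raw(c(0, rep(255, 1024), rep(128, 1024), rep(0, 1024)))
img <- parse_cifar_record(rec)
str(img)
```

With a real batch file, a record of this size could be obtained with readBin(con, what = "raw", n = 3073) on a connection opened in binary mode.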
- Filter the dataset for aeroplane and automobile. This is an optional step and is done to reduce complexity later:
    # Filter data for Aeroplane and Automobile with label 1 and 2, respectively
    Classes <- c(1, 2)
    images.rgb.train <- cifar_train$images.rgb
    images.lab.train <- cifar_train$images.lab
    ix <- images.lab.train %in% Classes
    images.rgb.train <- images.rgb.train[ix]
    images.lab.train <- images.lab.train[ix]
    rm(cifar_train)
- Transform to image. This step is required because each CIFAR-10 image is 32 x 32 x 3 pixels but is stored flattened in a 1024 x 3 format:
    # Function to transform to image
    transform.Image <- function(index, images.rgb) {
      # Convert each color layer into a 32 x 32 matrix
      # and combine the layers into a single RGB image
      img <- images.rgb[[index]]
      img.r.mat <- as.cimg(matrix(img$r, ncol=32, byrow = FALSE))
      img.g.mat <- as.cimg(matrix(img$g, ncol=32, byrow = FALSE))
      img.b.mat <- as.cimg(matrix(img$b, ncol=32, byrow = FALSE))
      # Bind the three channels into one image along the color axis
      img.col.mat <- imappend(list(img.r.mat, img.g.mat, img.b.mat), "c")
      return(img.col.mat)
    }
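To see why a column-wise fill (byrow = FALSE) is the right reshape, it helps to try it on a tiny array in base R. The sketch below uses a hypothetical image that is 3 pixels wide and 2 pixels high, stored row by row as CIFAR-10 stores its channels, and relies on the imager convention that the first index of an image is x (column) and the second is y (row). The values and variable names are made up for illustration.

```r
# Hypothetical 3-wide, 2-high single-channel image, stored row by row
width  <- 3
height <- 2
flat <- c(11, 12, 13,   # image row 1: x = 1, 2, 3
          21, 22, 23)   # image row 2: x = 1, 2, 3

# A column-wise fill with nrow = width yields m[x, y]
m <- matrix(flat, nrow = width, byrow = FALSE)

# Stack three channels into a width x height x 3 array
rgb_arr <- array(c(m, m + 100, m + 200), dim = c(width, height, 3))

m[2, 1]         # pixel at x = 2, y = 1
rgb_arr[3, 2, 3] # third channel, pixel at x = 3, y = 2
```

In the recipe, ncol = 32 on a 1,024-element vector implies nrow = 32 as well, so the same indexing applies to each colour layer.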
- The next step involves padding the images with zeros:

    # Function to pad image
    image.padding <- function(x) {
      img_width <- max(dim(x)[1:2])
      img_height <- min(dim(x)[1:2])
      pad.img <- pad(x, nPix = img_width - img_height,
                     axes = ifelse(dim(x)[1] < dim(x)[2], "x", "y"))
      return(pad.img)
    }
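For the 32 x 32 CIFAR-10 images, width and height are equal, so nPix evaluates to 0 and the images should pass through unchanged. The base R sketch below isolates the arithmetic so that the behaviour on non-square inputs is easier to check. The helper pad_plan is hypothetical and is not part of imager; it only reports which axis would be padded and by how many pixels.

```r
# Hypothetical helper: given image dimensions (x, y, ...), report the
# padding amount and axis that image.padding would pass to pad().
pad_plan <- function(dims) {
  long_side  <- max(dims[1:2])
  short_side <- min(dims[1:2])
  list(
    nPix = long_side - short_side,
    axes = if (dims[1] < dims[2]) "x" else "y"
  )
}

square <- pad_plan(c(32, 32, 1, 3))  # CIFAR-10 case: nothing to pad
narrow <- pad_plan(c(20, 32, 1, 3))  # narrower than tall: pad along x
wide   <- pad_plan(c(32, 20, 1, 3))  # wider than tall: pad along y
```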
- Save the image to a specified folder:
    # Save train images
    MAX_IMAGE <- length(images.rgb.train)

    # Write Aeroplane images to aero folder
    sapply(1:MAX_IMAGE, FUN=function(x, images.rgb.train, images.lab.train){
      if(images.lab.train[[x]]==1){
        img <- transform.Image(x, images.rgb.train)
        pad_img <- image.padding(img)
        res_img <- resize(pad_img, size_x = 224, size_y = 224)
        imager::save.image(res_img, paste("train/aero/aero", x, ".jpeg", sep=""))
      }
    }, images.rgb.train=images.rgb.train, images.lab.train=images.lab.train)

    # Write Automobile images to auto folder
    sapply(1:MAX_IMAGE, FUN=function(x, images.rgb.train, images.lab.train){
      if(images.lab.train[[x]]==2){
        img <- transform.Image(x, images.rgb.train)
        pad_img <- image.padding(img)
        res_img <- resize(pad_img, size_x = 224, size_y = 224)
        imager::save.image(res_img, paste("train/auto/auto", x, ".jpeg", sep=""))
      }
    }, images.rgb.train=images.rgb.train, images.lab.train=images.lab.train)
The preceding script saves the aeroplane images into the aero folder and the automobile images in the auto folder.
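The save calls appear to assume that the train/aero and train/auto folders already exist. A minimal sketch for creating the folders and building a file name up front is shown below. It writes under a temporary directory so that it can be run safely, and the names out_root and class_dirs are illustrative only.

```r
# Assumption: output goes under a temporary folder; swap in your own path
out_root   <- file.path(tempdir(), "train")
class_dirs <- c(aero = "aero", auto = "auto")

# Create one folder per class; existing folders are left untouched
for (d in class_dirs) {
  dir.create(file.path(out_root, d), recursive = TRUE, showWarnings = FALSE)
}

# Build the target path for image number 7 of the aeroplane class
target <- file.path(out_root, class_dirs[["aero"]], paste0("aero", 7, ".jpeg"))
print(target)
```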
- Convert to the .rec record format supported by MXNet. This conversion requires the im2rec.py tool that ships with MXNet for Python, as the conversion is not supported in R. However, once MXNet is installed in Python, the tool can be called from R using the system command. The dataset can be split into train and validation lists using the following command:

    system("python ~/mxnet/tools/im2rec.py --list True --recursive True --train-ratio 0.90 cifar_224/pks.lst cifar_224/trainf/")

The preceding script will generate two list files: pks.lst_train.lst and pks.lst_val.lst. The split between train and validation is controlled by the --train-ratio parameter in the preceding script. The number of classes is based on the number of folders in the trainf directory. In this scenario, two classes are picked: automobile and aeroplane.
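To make the effect of a train ratio concrete, the following base R sketch splits a hypothetical list of file names 90/10. It only illustrates the idea of a ratio-based split; it is not how im2rec.py is implemented, and the file names and seed are made up.

```r
set.seed(42)  # arbitrary seed so the shuffle is repeatable

# Hypothetical file list standing in for the images under the class folders
files <- sprintf("aero/aero%d.jpeg", 1:20)
train_ratio <- 0.90

# Shuffle, then cut the list at the train ratio
shuffled    <- sample(files)
n_train     <- round(train_ratio * length(files))
train_files <- shuffled[seq_len(n_train)]
val_files   <- shuffled[-seq_len(n_train)]

length(train_files)
length(val_files)
```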
- Create the .rec files for the training and validation datasets. The paths are quoted because the directory name contains a space:

    # Creating .rec file from training sample list
    system("python ~/mxnet/tools/im2rec.py --num-thread=4 --pass-through=1 '/home/prakash/deep learning/cifar_224/pks.lst_train.lst' '/home/prakash/deep learning/cifar_224/trainf/'")

    # Creating .rec file from validation sample list
    system("python ~/mxnet/tools/im2rec.py --num-thread=4 --pass-through=1 '/home/prakash/deep learning/cifar_224/pks.lst_val.lst' '/home/prakash/deep learning/cifar_224/trainf/'")

The preceding script will create the pks.lst_train.rec and pks.lst_val.rec files, which are used in the next recipe to train the model starting from a pretrained model.
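Because the directory name deep learning contains a space, the paths need quoting before they reach the shell. One way to assemble such a command in R is with shQuote, as in the sketch below. The helper name build_im2rec_cmd is hypothetical, the flags are simply those used in this recipe, and the command is only built as a string here rather than executed.

```r
# Hypothetical helper: assemble an im2rec.py call with shell-quoted paths
build_im2rec_cmd <- function(script, list_file, image_root, threads = 4) {
  paste(
    "python", script,
    paste0("--num-thread=", threads),
    "--pass-through=1",
    shQuote(list_file, type = "sh"),
    shQuote(image_root, type = "sh")
  )
}

base <- "/home/prakash/deep learning/cifar_224"
cmd <- build_im2rec_cmd(
  script     = "~/mxnet/tools/im2rec.py",
  list_file  = file.path(base, "pks.lst_train.lst"),
  image_root = paste0(base, "/trainf/")
)
cat(cmd, "\n")
# system(cmd)  # uncomment to run once MXNet's Python tools are installed
```

The script path is left unquoted on purpose so that the shell can still expand the leading ~.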