Implementing a deep learning network for handwritten digit recognition

The mxnet library offers several functions for defining the layers and activations that make up a deep learning network. The choice of layers, the activation functions applied to them, and the number of neurons in each hidden layer are collectively termed the network architecture. Deciding on an architecture is more of an art than a science: there are no exact rules for finding the ideal one, so the number of layers, the number of neurons in each, and the layer types are largely settled through trial and error, often over several rounds of experiments.

In this section, we'll build a simple deep learning network with three hidden layers. Here is the general architecture of our network:

The input layer is the initial layer in the network. The mx.symbol.Variable MXNet function defines it.
The first hidden layer is a fully-connected layer (also called a dense layer) with 128 neurons. It is defined using the mx.symbol.FullyConnected MXNet function.
A ReLU activation function is applied to the first hidden layer's output. The mx.symbol.Activation function defines it.
The second hidden layer is another dense layer, this time with 64 neurons. It is defined using the mx.symbol.FullyConnected function, just like the first hidden layer.
A ReLU activation function is applied to the second hidden layer's output, again through the mx.symbol.Activation function.
The final hidden layer is another fully-connected layer, but with just ten outputs (equal to the number of classes). It is also defined using the mx.symbol.FullyConnected function.
The output layer should yield a predicted probability for each class, so softmax is applied at the output. The mx.symbol.SoftmaxOutput function configures it.
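To make the shapes in this architecture concrete, here is a minimal sketch of one forward pass written in base R. It does not use MXNet and is not how MXNet implements these layers. The dense helper, the random weights, and the five fake input images are assumptions made purely for illustration; only the layer sizes (784 inputs, 128, 64, and 10) come from the description above.

```r
# Illustrative forward pass in base R; weights are random, nothing is trained.
set.seed(42)

# ReLU keeps positive values and replaces negative values with 0
relu <- function(z) pmax(z, 0)

# Row-wise softmax; subtracting each row's maximum avoids overflow in exp()
softmax <- function(z) {
  e <- exp(z - apply(z, 1, max))
  e / rowSums(e)
}

# Hypothetical helper: a dense layer computes x %*% W + b
dense <- function(x, n_out) {
  w <- matrix(rnorm(ncol(x) * n_out, sd = 0.05), nrow = ncol(x))
  b <- rep(0, n_out)
  sweep(x %*% w, 2, b, "+")
}

x   <- matrix(runif(5 * 784), nrow = 5)  # 5 fake images, 28 x 28 = 784 pixels each
h1  <- relu(dense(x, 128))               # first hidden layer: 128 neurons, then ReLU
h2  <- relu(dense(h1, 64))               # second hidden layer: 64 neurons, then ReLU
out <- softmax(dense(h2, 10))            # ten outputs turned into class probabilities

print(dim(out))      # one row per image, one column per class
print(rowSums(out))  # the probabilities in each row sum to 1
```

Each row of out is a probability distribution over the ten digit classes, which is the kind of output the softmax layer is meant to produce.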

This is not necessarily the best network architecture for the problem; it is simply the network we are going to build to demonstrate the implementation of a deep learning network with MXNet.

Now that we have a blueprint in place, let's start coding, beginning with loading and preparing the MNIST data in the following code block:

# setting the working directory
setwd('/home/sunil/Desktop/book/chapter 19/MNIST')
# function to load image files
load_image_file = function(filename) {
  f = file(filename, 'rb')
  # skip the 4-byte magic number at the start of the file
  readBin(f, 'integer', n = 1, size = 4, endian = 'big')
  # header: number of images, then rows and columns per image
  n    = readBin(f, 'integer', n = 1, size = 4, endian = 'big')
  nrow = readBin(f, 'integer', n = 1, size = 4, endian = 'big')
  ncol = readBin(f, 'integer', n = 1, size = 4, endian = 'big')
  # pixel values, one unsigned byte each
  x = readBin(f, 'integer', n = n * nrow * ncol, size = 1, signed = FALSE)
  close(f)
  # one row per image, one column per pixel
  data.frame(matrix(x, ncol = nrow * ncol, byrow = TRUE))
}
# function to load the label files
load_label_file = function(filename) {
  f = file(filename, 'rb')
  # skip the 4-byte magic number at the start of the file
  readBin(f, 'integer', n = 1, size = 4, endian = 'big')
  # header: number of labels
  n = readBin(f, 'integer', n = 1, size = 4, endian = 'big')
  # labels, one unsigned byte each
  y = readBin(f, 'integer', n = n, size = 1, signed = FALSE)
  close(f)
  y
}
# loading the image files
train = load_image_file("train-images-idx3-ubyte")
test  = load_image_file("t10k-images-idx3-ubyte")
# loading the labels
train.y = load_label_file("train-labels-idx1-ubyte")
test.y  = load_label_file("t10k-labels-idx1-ubyte")
# linearly rescaling the greyscale pixel values from the 0-255 range to 0-1
train.x <- data.matrix(train/255)
test <- data.matrix(test/255)
# verifying the distribution of the digit labels in train dataset
print(table(train.y))
# verifying the distribution of the digit labels in test dataset
print(table(test.y))
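The loaders above rely on the IDX layout of the MNIST files: a 4-byte magic number, then the item count and (for images) the row and column counts as 4-byte big-endian integers, then one unsigned byte per value. As a rough sketch of that layout, the following block writes a tiny synthetic image file to a temporary location and reads it back with the same readBin calls. The two 2 x 2 images and all variable names here are made up for illustration; the real files are not touched.

```r
# Write a tiny synthetic IDX image file, then read it back the same way.
tmp <- tempfile()
con <- file(tmp, "wb")
writeBin(2051L, con, size = 4, endian = "big")  # magic number used by IDX image files
writeBin(2L,    con, size = 4, endian = "big")  # number of images
writeBin(2L,    con, size = 4, endian = "big")  # rows per image
writeBin(2L,    con, size = 4, endian = "big")  # columns per image
# pixel values for both images, one unsigned byte each
writeBin(as.raw(c(0, 255, 128, 64, 10, 20, 30, 40)), con)
close(con)

con   <- file(tmp, "rb")
magic <- readBin(con, "integer", n = 1, size = 4, endian = "big")
n_img <- readBin(con, "integer", n = 1, size = 4, endian = "big")
n_row <- readBin(con, "integer", n = 1, size = 4, endian = "big")
n_col <- readBin(con, "integer", n = 1, size = 4, endian = "big")
px    <- readBin(con, "integer", n = n_img * n_row * n_col, size = 1, signed = FALSE)
close(con)
unlink(tmp)

imgs   <- matrix(px, ncol = n_row * n_col, byrow = TRUE)  # one row per image
scaled <- imgs / 255                                      # same rescaling to the 0-1 range

print(dim(imgs))
print(range(scaled))
```

A similar check on the real data, for example comparing dim(train.x) with the image count and the 28 x 28 pixel size read from the header, can catch a wrong path or a truncated download before training starts.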