Developing a deep learning model architecture

Developing the architecture of a model involves defining several items, such as the type and number of layers in the network, the activation function for each layer, the number of units or neurons in each layer, and the shape of the input data and the output. An example of specifying a simple sequential model architecture using Keras in R is shown in the following code:

# Model architecture
library(keras)

model <- keras_model_sequential()
model %>%
  layer_dense(units = 8, activation = 'relu', input_shape = c(21)) %>%  # hidden layer with 8 units
  layer_dense(units = 3, activation = 'softmax')                        # output layer with 3 units
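
As an optional follow-up, the architecture that has just been defined can be inspected with summary(), which prints each layer along with its output shape and number of parameters:

# Inspect the layers and parameter counts
summary(model)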

Note that a sequential model allows us to develop models layer by layer. As seen from the preceding code, two densely connected layers have been added as part of the sequential model. Two important decisions when choosing a model architecture are the number and type of layers and the type of activation function for each layer. The number and type of layers to use are guided by the nature and complexity of the data. For a fully connected network (also known as a multilayer perceptron), we can use dense layers with the help of the layer_dense function available in Keras.

On the other hand, when working with image data, we are likely to use convolutional layers in the network, using the layer_conv_2d function; a minimal sketch is shown in the code that follows. We will discuss specific model architectures in more detail, with examples, in each chapter.
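
As an illustrative sketch only (the input shape of 28 x 28 x 1 and the layer sizes here are assumptions chosen for this example, not taken from a particular dataset), a convolutional layer can be added in much the same way as a dense layer:

# A small convolutional architecture (assumed shapes, for illustration only)
model_cnn <- keras_model_sequential()
model_cnn %>%
  layer_conv_2d(filters = 16, kernel_size = c(3, 3), activation = 'relu',
                input_shape = c(28, 28, 1)) %>%   # convolutional layer for image input
  layer_flatten() %>%                             # flatten the feature maps into a vector
  layer_dense(units = 3, activation = 'softmax')  # output layer for a 3-class problem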

There are different types of activation functions used in deep learning networks. A rectified linear unit, or relu, is a popular activation function for hidden layers, and it involves a very simple calculation: if the input is negative, it returns zero; otherwise, it returns the input unchanged (that is, relu(x) = max(0, x)). As an example, let's look at the following code:

# RELU function and related plot
x <- rnorm(10000, 2, 10)   # 10,000 draws from a normal distribution (mean = 2, sd = 10)
y <- ifelse(x < 0, 0, x)   # apply relu: negative values become zero
par(mfrow = c(1, 2))       # arrange two plots side by side
hist(x)                    # histogram of the original values
plot(x, y)                 # relu output versus input

The preceding code generates 10,000 random numbers from a normal distribution with a mean of 2 and a standard deviation of 10, and stores the results in x. The negative values are then changed to zero and the result is stored in y. A histogram of x and a scatter plot of x against y are given in the following graphs:

It can be observed from the preceding histogram that x contains both positive and negative values. The scatter plot, based on the original x values and the modified y values obtained after converting negative values to zero, visualizes the impact of the relu activation function: the data points to the left of x = 0 are flat, with a slope of zero, while the data points to the right of x = 0 follow a perfectly linear pattern with a slope of 1.

One of the main advantages of the relu activation function is its simple calculation. When developing deep learning models, this becomes an important factor, as it helps to reduce the computational cost. For many deep learning networks, a rectified linear unit is used as the default activation function for the hidden layers.

Another popular activation function used when developing deep networks is softmax, which is usually used in the output layer of the network. Let's look at the following code to understand it better:

# Softmax function and related plot
x <- runif(1000, 1, 5)        # 1,000 draws from a uniform distribution between 1 and 5
y <- exp(x)/sum(exp(x))       # softmax: each exponential divided by the sum of exponentials
par(mfrow = c(1, 2))          # arrange two plots side by side
hist(x)                       # histogram of the original values
plot(x, y)                    # softmax output versus input

In the preceding code, we take a random sample of 1,000 values from a uniform distribution between 1 and 5. To apply the softmax function, we divide the exponential of each input value in x by the sum of the exponentials of all the values in x. The resulting histogram of the x values and the scatter plot of x against y are shown in the following graphs:

We can observe that the preceding histogram shows an approximately uniform pattern for the x values. The impact of the softmax function can be seen from the scatter plot, where the output values now lie between 0 and 1. This conversion is very useful for interpreting the results in terms of probabilities, as the output values now have the following properties (see the quick check after this list):

  • They lie between 0 and 1
  • They sum to 1
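
As a minimal check, reusing the x and y objects created in the preceding code, both properties can be confirmed directly in R:

# Quick check of the softmax output (assumes x and y from the preceding code)
range(y)   # all values lie between 0 and 1
sum(y)     # the values add up to 1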

This aspect of the softmax activation function, where results can be interpreted in terms of probabilities, makes it a popular choice when developing deep learning classification models. It works well for both image classification and text classification problems.

Apart from these two activation functions, we also make use of others that may be more suitable for a specific deep learning model.

Once the model architecture to be used has been specified, the next step is to compile the model.
