Developing the model architecture

In this section, we will use convolutional and LSTM layers in the same network. The convolutional recurrent network architecture can be captured in the form of a simple flowchart:

Here, we can see that the flowchart contains embedding, convolutional 1D, max pooling, LSTM, and dense layers. Note that the embedding layer is always the first layer in the network and is commonly used for applications involving text data. The main purpose of the embedding layer is to map each unique word (500 unique words in our example) to a dense vector that is smaller in size, whose dimension we will specify using output_dim. In the convolutional layer, we will use the relu activation function, while the LSTM and dense layers will use the tanh and softmax activation functions, respectively.
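
To make this mapping concrete, the following is a minimal sketch of an embedding layer in isolation; the toy_model and toy_input names are hypothetical and serve only to show that each of the 300 integers in a sequence is turned into a 32-dimensional vector:

# Illustrative sketch only; toy_model and toy_input are hypothetical names
library(keras)

toy_model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 500, output_dim = 32, input_length = 300)

# One sequence of 300 word indices (random here, purely for illustration)
toy_input <- matrix(sample(1:499, 300, replace = TRUE), nrow = 1)

dim(predict(toy_model, toy_input))  # 1 300 32: a 32-dimensional vector per word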

We can use the following code to develop the model architecture. This also includes the output of the model summary:

# Model architecture
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 500,
                  output_dim = 32,
                  input_length = 300) %>%
  layer_conv_1d(filters = 32,
                kernel_size = 5,
                padding = "valid",
                activation = "relu",
                strides = 1) %>%
  layer_max_pooling_1d(pool_size = 4) %>%
  layer_lstm(units = 32) %>%
  layer_dense(units = 50, activation = "softmax")

# Model summary
summary(model)
___________________________________________________________________________
Layer (type) Output Shape Param #
===========================================================================
embedding (Embedding) (None, 300, 32) 16000
___________________________________________________________________________
conv1d (Conv1D) (None, 296, 32) 5152
___________________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 74, 32) 0
___________________________________________________________________________
lstm (LSTM) (None, 32) 8320
___________________________________________________________________________
dense (Dense) (None, 50) 1650
===========================================================================
Total params: 31,122
Trainable params: 31,122
Non-trainable params: 0
___________________________________________________________________________
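
As a sanity check, the parameter counts in this summary can be reproduced by hand using the standard Keras formulas:

500 * 32                    # embedding: a 32-dimensional vector per word = 16,000
32 * (5 * 32 + 1)           # conv1d: filters * (kernel_size * channels + bias) = 5,152
4 * ((32 + 32) * 32 + 32)   # lstm: 4 gates * (input + recurrent + bias weights) = 8,320
50 * 32 + 50                # dense: units * inputs + bias = 1,650
16000 + 5152 + 8320 + 1650  # total = 31,122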

From the preceding code, we can make the following observations:

  • We have specified input_dim as 500, which is the number of most frequent words we kept during data preparation.
  • For output_dim, we are using 32, which is the size of the embedding vector. Other values can also be explored, and we will do so later in this chapter when we optimize the model's performance.
  • For input_length, we have specified 300, which is the number of integers in each sequence (a sketch of this preparation step follows this list).
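
The following is a hedged sketch of the kind of preparation these three choices assume; texts is a hypothetical character vector of documents, and the tokenizer calls are illustrative rather than the chapter's actual preparation code:

library(keras)

# texts is assumed to be a character vector of raw documents (hypothetical)
tokenizer <- text_tokenizer(num_words = 500) %>%  # keep the 500 most frequent words
  fit_text_tokenizer(texts)
seqs   <- texts_to_sequences(tokenizer, texts)    # words -> integer indices
trainx <- pad_sequences(seqs, maxlen = 300)       # pad/truncate each sequence to 300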

After the embedding layer, we have added a 1D convolutional layer with 32 filters. In previous chapters, we used 2D convolutional layers when working on image classification problems; in this example, the data consists of sequences, and in such situations a 1D convolutional layer is more appropriate. For this layer, we have specified the following:

  • The length of the 1D convolution window is specified as 5 using kernel_size.
  • We use valid for padding, which means that no padding is applied and the output is shorter than the input (the arithmetic is shown after this list).
  • We have specified the activation function as relu.
  • The strides of the convolution are specified as 1.
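
Because valid padding shrinks the output, the conv1d output length of 296 in the summary follows directly from these settings:

# With "valid" padding: output length = (input_length - kernel_size) / strides + 1
(300 - 5) / 1 + 1  # 296, matching the conv1d output shape in the model summary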

The convolutional layer is followed by a pooling layer. Here are some comments on the pooling layer and the layers that come after it:

  • The convolutional layer helps us extract features, while the pooling layer after the convolutional layer helps us carry out downsampling and detect important features.
  • In this example, we have specified a pool_size of 4, which means that the output length (74) is one-fourth of the input length (296). This can also be seen in the model summary.
  • The next layer is the LSTM with 32 units.
  • The last layer is a dense layer with 50 units for the 50 authors, along with the softmax activation function.
  • The softmax activation function makes the 50 outputs sum to one, which allows them to be interpreted as probabilities for each of the 50 authors (this is demonstrated in the sketch after this list).
  • As we can see from the summary of the model, the total number of parameters in this network is 31,122.
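
The probability interpretation can be checked directly. The dummy input below is random and the model is still untrained, so the point is only that the 50 outputs form a valid probability distribution:

# Hypothetical random sequence of 300 word indices, purely for illustration
dummy <- matrix(sample(1:499, 300, replace = TRUE), nrow = 1)
probs <- predict(model, dummy)

dim(probs)      # 1 50: one output per author
rowSums(probs)  # 1: the softmax outputs sum to one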

Next, we will compile the model, followed by training it.
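
As a preview, a typical compilation step for a 50-class problem might look like the following sketch; the exact loss, optimizer, and metrics used in the next section may differ:

# A plausible compile call for 50-class classification (an assumption, not
# the chapter's confirmed settings)
model %>% compile(loss = "categorical_crossentropy",
                  optimizer = "adam",
                  metrics = "accuracy")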
