LSTM network architecture

We will start with a simple flow chart of the LSTM network architecture, as shown in the following diagram:

The preceding flow chart highlights the layers in the architecture and the activation functions used. The LSTM layer uses the tanh activation function, which is the default for that layer, and the dense layer uses the sigmoid activation function.

Let's have a look at the following code and summary of the model:

# Model architecture
library(keras)

model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 500, output_dim = 32) %>%  # vocabulary of 500 words, 32-dimensional embeddings
  layer_lstm(units = 32) %>%                             # LSTM layer with 32 units (tanh activation by default)
  layer_dense(units = 1, activation = "sigmoid")         # single output unit with sigmoid activation
model
___________________________________________________________________________
Layer (type)                     Output Shape                  Param #     
===========================================================================
embedding (Embedding)            (None, None, 32)              16000       
___________________________________________________________________________
lstm (LSTM)                      (None, 32)                    8320        
___________________________________________________________________________
dense (Dense)                    (None, 1)                     33          
===========================================================================
Total params: 24,353
Trainable params: 24,353
Non-trainable params: 0
___________________________________________________________________________

Compared with the RNN model in the last chapter, the only change in this example is that layer_simple_rnn is replaced with layer_lstm. The embedding layer has a total of 16,000 (500 x 32) parameters. The number of parameters for the LSTM layer is calculated as follows:

= 4 x [units in LSTM layer x (units in LSTM layer + input dimension to the LSTM layer) + units in LSTM layer]

= 4 x [32 x (32 + 32) + 32]

= 8,320

A similar architecture with a simple RNN layer would have only 2,080 parameters; the LSTM layer has four times as many because it maintains four sets of weights (for the input, forget, and output gates plus the cell candidate), which also means longer training times and relatively higher processing costs. The number of parameters for the dense layer is (32 x 1) + 1, which comes to 33. Overall, therefore, there are 16,000 + 8,320 + 33 = 24,353 parameters in this network.
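As a quick sanity check, the same arithmetic can be reproduced directly in R. The following sketch simply restates the calculations above using the layer sizes from the model summary (the variable names are illustrative, not part of the keras API):

# Parameter counts calculated by hand
embedding_params <- 500 * 32                   # input_dim x output_dim = 16,000
lstm_params <- 4 * (32 * (32 + 32) + 32)       # 4 x [units x (units + input dim) + units] = 8,320
dense_params <- 32 * 1 + 1                     # weights + bias = 33
embedding_params + lstm_params + dense_params  # 24,353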
