We will start with a simple flow chart of the LSTM network architecture, as shown in the following screenshot:
The preceding flow chart highlights the layers in the LSTM network architecture and the activation functions used. The LSTM layer uses the tanh activation function, which is the default for that layer, and the dense layer uses the sigmoid activation function.
Let's have a look at the following code and summary of the model:
# Model architecture
model <- keras_model_sequential() %>%
        layer_embedding(input_dim = 500, output_dim = 32) %>%
        layer_lstm(units = 32) %>%
        layer_dense(units = 1, activation = "sigmoid")
model
__________________________________________________________________________
Layer (type) Output Shape Param #
==========================================================================
embedding (Embedding) (None, None, 32) 16000
__________________________________________________________________________
lstm (LSTM) (None, 32) 8320
__________________________________________________________________________
dense (Dense) (None, 1) 33
==========================================================================
Total params: 24,353
Trainable params: 24,353
Non-trainable params: 0
__________________________________________________________________________
Compared with the RNN model from the last chapter, the only change is that we replace layer_simple_rnn with layer_lstm for the LSTM network in this example. The embedding layer has a total of 16,000 (500 x 32) parameters. The following calculation gives the number of parameters for the LSTM layer, where the input size is the embedding layer's output dimension (32):
= 4 x [units in LSTM layer x (units in LSTM layer + input size to LSTM layer) + units in LSTM layer]
= 4 x [32 x (32 + 32) + 32]
= 8,320
A similar architecture using a simple RNN layer instead would have only 2,080 parameters. The four-fold increase in the number of parameters for the LSTM layer leads to longer training times and, hence, relatively higher processing costs. The number of parameters for the dense layer is [(32 x 1) + 1], which comes to 33. Overall, therefore, there are 24,353 parameters in this network.
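To make this arithmetic concrete, the parameter counts from the model summary can be reproduced in plain R. This is a quick sketch using the layer sizes from the code above (embedding input_dim = 500, output_dim = 32, and 32 LSTM units); the variable names are illustrative only:

```r
# Layer sizes taken from the model above
input_dim <- 500   # vocabulary size for the embedding layer
embed_dim <- 32    # embedding output dimension (input size to the LSTM layer)
units     <- 32    # number of LSTM units

# Embedding layer: one embed_dim-length vector per vocabulary entry
embedding_params <- input_dim * embed_dim                      # 500 x 32 = 16000

# LSTM layer: 4 gates, each with recurrent weights, input weights, and a bias
lstm_params <- 4 * (units * (units + embed_dim) + units)       # 4 x 2080 = 8320

# Dense layer: one weight per LSTM unit, plus a bias
dense_params <- units * 1 + 1                                  # 33

total_params <- embedding_params + lstm_params + dense_params  # 24353
```

A simple RNN layer with the same sizes has only `units * (units + embed_dim) + units` parameters (2,080), which is the calculation above without the factor of 4 for the LSTM's four gates.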