We will start with a simple flow chart of the LSTM network architecture, as shown in the following screenshot:
The preceding flow chart highlights the layers in the LSTM network architecture and the activation functions used. The LSTM layer uses the tanh activation function, which is the default for that layer, and the dense layer uses the sigmoid activation function.
Let's have a look at the following code and summary of the model:
# Model architecture
model <- keras_model_sequential() %>%
        layer_embedding(input_dim = 500, output_dim = 32) %>%
        layer_lstm(units = 32) %>%
        layer_dense(units = 1, activation = "sigmoid")
model
__________________________________________________________________________
Layer (type) Output Shape Param #
==========================================================================
embedding (Embedding) (None, None, 32) 16000
__________________________________________________________________________
lstm (LSTM) (None, 32) 8320
__________________________________________________________________________
dense (Dense) (None, 1) 33
==========================================================================
Total params: 24,353
Trainable params: 24,353
Non-trainable params: 0
__________________________________________________________________________
Compared with the RNN model from the last chapter, the only change is that we replace layer_simple_rnn with layer_lstm for the LSTM network in this example. The embedding layer has a total of 16,000 (500 x 32) parameters. The following calculation gives the number of parameters for the LSTM layer, where the input size is the embedding layer's output dimension (32):
= 4 x [units in LSTM layer x (units in LSTM layer + input size to LSTM layer) + units in LSTM layer]
= 4 x [32 x (32 + 32) + 32]
= 8,320
A similar architecture using a simple RNN layer instead would have only 2,080 parameters. The four-fold increase in the number of parameters for the LSTM layer leads to longer training times and, hence, relatively higher processing costs. The number of parameters for the dense layer is [(32 x 1) + 1], which comes to 33. Overall, therefore, there are 24,353 parameters in this network.
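To make this arithmetic concrete, the parameter counts from the model summary can be reproduced in plain R. This is a quick sketch using the layer sizes from the code above (embedding input_dim = 500, output_dim = 32, and 32 LSTM units); the variable names are illustrative only:

```r
# Layer sizes taken from the model above
input_dim <- 500   # vocabulary size for the embedding layer
embed_dim <- 32    # embedding output dimension (input size to the LSTM layer)
units     <- 32    # number of LSTM units

# Embedding layer: one embed_dim-length vector per vocabulary entry
embedding_params <- input_dim * embed_dim                      # 500 x 32 = 16000

# LSTM layer: 4 gates, each with recurrent weights, input weights, and a bias
lstm_params <- 4 * (units * (units + embed_dim) + units)       # 4 x 2080 = 8320

# Dense layer: one weight per LSTM unit, plus a bias
dense_params <- units * 1 + 1                                  # 33

total_params <- embedding_params + lstm_params + dense_params  # 24353
```

A simple RNN layer with the same sizes has only `units * (units + embed_dim) + units` parameters (2,080), which is the calculation above without the factor of 4 for the LSTM's four gates.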