How to do it...

To gain practical intuition of how this theory works, let's look at the same example we worked through while understanding RNNs, but this time using an LSTM.

Note that the data preprocessing steps are common to both examples. Hence, we will reuse the preprocessing part (steps 1 to 4 of the Building an RNN from scratch in Python recipe) and head directly to the model-building part (the code file is available as LSTM_working_details.ipynb on GitHub):

  1. Define the model:
from keras.models import Sequential
from keras.layers import Dense, LSTM
import numpy as np

embed_length = 1
max_length = 2
model = Sequential()
model.add(LSTM(1, activation='tanh', return_sequences=False,
               recurrent_initializer='Zeros',
               recurrent_activation='sigmoid',
               input_shape=(max_length, embed_length), unroll=True))
Note that, in the preceding code, we set the recurrent initializer to zeros and the recurrent activation to sigmoid only to keep this example simple; the purpose is to help you understand what is happening in the backend.
model.add(Dense(5, activation='softmax'))
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
# summarize the model
print(model.summary())

A summary of the model is as follows:
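A representative output (layer names will vary across sessions; the shapes and parameter counts follow from the layer definitions and are explained next) looks like this:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_19 (LSTM)               (None, 1)                 12
_________________________________________________________________
dense_18 (Dense)             (None, 5)                 10
=================================================================
Total params: 22
Trainable params: 22
Non-trainable params: 0
_________________________________________________________________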

The LSTM layer has 12 parameters: each of the four gates (input, forget, cell, and output) has one weight connecting the input to the hidden unit, one recurrent weight connecting the previous hidden state to the hidden unit, and one bias, giving 4 x 3 = 12 parameters in total.

The dense layer has a total of 10 parameters: there are five possible output classes, so there are five weights (one from the single hidden unit to each class) and five biases.
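As a quick sanity check, the same counts can be derived from the layer shapes (a minimal sketch using the standard Keras LSTM parameter formula):

# LSTM: 4 gates x (input weights + recurrent weights + biases)
units = 1
input_dim = 1      # embed_length
lstm_params = 4 * (input_dim * units + units * units + units)
print(lstm_params)   # 12
# Dense: one weight per (hidden unit, class) pair plus one bias per class
dense_params = units * 5 + 5
print(dense_params)  # 10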

  2. Let's fit the model:
model.fit(padded_docs.reshape(2,2,1), np.array(one_hot_encoded_labels), epochs=500)

  3. The order of the weights of this model is as follows:
model.weights
# [<tf.Variable 'lstm_19/kernel:0' shape=(1, 4) dtype=float32_ref>,
#  <tf.Variable 'lstm_19/recurrent_kernel:0' shape=(1, 4) dtype=float32_ref>,
#  <tf.Variable 'lstm_19/bias:0' shape=(4,) dtype=float32_ref>,
#  <tf.Variable 'dense_18/kernel:0' shape=(1, 5) dtype=float32_ref>,
#  <tf.Variable 'dense_18/bias:0' shape=(5,) dtype=float32_ref>]

The weights can be obtained as follows:

model.get_weights()

From the preceding code (model.weights), we can see that the order of weights in the LSTM layer is as follows:

    • Weights of the input (kernel)
    • Weights corresponding to the hidden layer (recurrent_kernel)
    • Bias in the LSTM layer

Similarly, in the dense layer (the layer that connects the hidden layer to the output), the order of weights is as follows:

    • Weight to be multiplied with the hidden layer
    • Bias

Within the LSTM layer, the weights and biases for the four gates appear in the following order (this is not visible in the preceding output, but it is defined in the Keras source code on GitHub); a sketch of how to slice the weights by gate follows this list:

    • Input gate
    • Forget gate
    • Modulation gate (cell gate)
    • Output gate
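As an illustration, here is a minimal sketch (variable names such as W_i and U_i are ours, not from the recipe) of how the trained weights can be split per gate, relying on the i, f, c, o ordering just described:

# Split the LSTM kernel, recurrent kernel, and bias into the four gates.
# Keras concatenates them along the last axis in the order: i, f, c, o.
weights = model.get_weights()
kernel, recurrent_kernel, bias = weights[0], weights[1], weights[2]

units = 1
W_i, W_f, W_c, W_o = [kernel[:, i*units:(i+1)*units] for i in range(4)]
U_i, U_f, U_c, U_o = [recurrent_kernel[:, i*units:(i+1)*units] for i in range(4)]
b_i, b_f, b_c, b_o = [bias[i*units:(i+1)*units] for i in range(4)]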
  4. Calculate the predictions for the input.
Note that we are using the raw encoded input values (1, 2, 3) without converting them into embedding values, only so that we can see how the calculation works. In practice, we would convert the input into embedding values first.

  5. Reshape the input for the predict method so that it matches the data format expected by LSTM (batch size, number of time steps, features per time step):
model.predict(padded_docs[0].reshape(1,2,1))
# array([[0.05610514, 0.11013522, 0.38451442, 0.0529648, 0.39628044]], dtype=float32)

The output of the predict method is provided in the commented line in the preceding code. We can reproduce this value by hand using the extracted weights, as sketched next.
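The following is a minimal sketch (assuming the per-gate slices W_*, U_*, and b_* from the earlier snippet, and the raw inputs in padded_docs[0]) of how the same prediction can be computed manually:

# Manual forward pass through the unrolled LSTM (2 time steps, 1 unit).
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

dense_kernel, dense_bias = weights[3], weights[4]

h, c = np.zeros((1, 1)), np.zeros((1, 1))    # initial hidden and cell states
for t in range(2):                           # max_length = 2 time steps
    x = padded_docs[0][t].reshape(1, 1)      # raw input value at this step
    i = sigmoid(x @ W_i + h @ U_i + b_i)     # input gate
    f = sigmoid(x @ W_f + h @ U_f + b_f)     # forget gate
    g = np.tanh(x @ W_c + h @ U_c + b_c)     # candidate (modulation) value
    o = sigmoid(x @ W_o + h @ U_o + b_o)     # output gate
    c = f * c + i * g                        # new cell state
    h = o * np.tanh(c)                       # new hidden state

# Dense layer followed by softmax on the final hidden state
logits = h @ dense_kernel + dense_bias
probs = np.exp(logits) / np.exp(logits).sum()
print(probs)   # should closely match the model.predict output above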
