Chapter 5 - Improvements to the RNN

  1. A Long Short-Term Memory (LSTM) cell is a variant of the RNN that resolves the vanishing gradient problem by using special structures called gates. Gates keep information in memory for as long as it is required; they learn what information to keep and what information to discard from the memory.
  2. An LSTM cell consists of three gates: the forget gate, the input gate, and the output gate. The forget gate decides what information should be removed from the cell state (memory), the input gate decides what information should be stored in the cell state, and the output gate decides what information should be taken from the cell state to produce the output. A minimal sketch of a single LSTM step is given after this list.
  3. The cell state, also called the internal memory, is where all of this information is stored.
  4. When backpropagating through an LSTM network, we need to update many parameters on every iteration, which increases training time. So we introduce the Gated Recurrent Unit (GRU) cell, which acts as a simplified version of the LSTM cell. Unlike the LSTM, the GRU cell has only two gates (the update gate and the reset gate) and a single hidden state; a corresponding sketch follows this list.
  5. In a bidirectional RNN, we have two separate layers of hidden units, both connected from the input to the output layer. In one layer, the hidden state is propagated from left to right (forward in time), and in the other layer it is propagated from right to left (backward in time).
  6. In a deep RNN, recurrent layers are stacked; each layer computes its hidden state by taking its own previous hidden state and the output of the layer below as input. A short sketch combining both ideas appears after this list.
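The following is a minimal NumPy sketch of a single LSTM step. The weight names, dimensions, and random initialization are illustrative assumptions, not taken from the chapter; the point is only to show how the three gates update the cell state and hidden state.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step: gates decide what to forget, store, and output."""
    W_f, W_i, W_o, W_c, b_f, b_i, b_o, b_c = params
    z = np.concatenate([h_prev, x])          # combined input to every gate
    f = sigmoid(W_f @ z + b_f)               # forget gate: what to erase from the cell state
    i = sigmoid(W_i @ z + b_i)               # input gate: what new information to store
    o = sigmoid(W_o @ z + b_o)               # output gate: what to read out of the cell state
    c_tilde = np.tanh(W_c @ z + b_c)         # candidate cell state
    c = f * c_prev + i * c_tilde             # updated cell state (internal memory)
    h = o * np.tanh(c)                       # new hidden state / output
    return h, c

# Toy dimensions, chosen only for illustration.
input_size, hidden_size = 4, 3
rng = np.random.default_rng(0)
params = [rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.1 for _ in range(4)]
params += [np.zeros(hidden_size) for _ in range(4)]

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x in rng.standard_normal((5, input_size)):   # a sequence of 5 random input vectors
    h, c = lstm_step(x, h, c, params)
print(h)
```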
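For comparison, here is an equally minimal GRU step with the same illustrative sizes. It shows how the update gate and reset gate replace the LSTM's three gates, and how the GRU keeps only one hidden state with no separate cell state. The interpolation convention used in the last line is one common formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, params):
    """One GRU time step: only two gates and a single hidden state."""
    W_z, W_r, W_h, b_z, b_r, b_h = params
    zx = np.concatenate([h_prev, x])
    z = sigmoid(W_z @ zx + b_z)      # update gate: how much of the state to overwrite
    r = sigmoid(W_r @ zx + b_r)      # reset gate: how much of the old state to use
    h_tilde = np.tanh(W_h @ np.concatenate([r * h_prev, x]) + b_h)  # candidate state
    return (1 - z) * h_prev + z * h_tilde    # blend old and candidate states directly

input_size, hidden_size = 4, 3
rng = np.random.default_rng(0)
params = [rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.1 for _ in range(3)]
params += [np.zeros(hidden_size) for _ in range(3)]

h = np.zeros(hidden_size)
for x in rng.standard_normal((5, input_size)):
    h = gru_step(x, h, params)
print(h)
```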
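Points 5 and 6 can both be expressed in a few lines of Keras. The layer sizes, the SimpleRNN cells, and the binary output head below are arbitrary choices for illustration, assuming tf.keras is available; any recurrent cell could be substituted.

```python
import tensorflow as tf

# A bidirectional recurrent layer (forward and backward hidden units) stacked
# under a second recurrent layer, making the network bidirectional and deep.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 16)),        # variable-length sequences of 16-dim vectors
    tf.keras.layers.Bidirectional(
        tf.keras.layers.SimpleRNN(32, return_sequences=True)),
    tf.keras.layers.SimpleRNN(32),           # deep: this layer consumes the layer below's outputs
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.summary()
```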
  1. The encoder learns a representation (embedding) of the given input sentence. Once the encoder has produced this embedding, it sends it to the decoder. The decoder takes this embedding (the thought vector) as input and tries to construct the target sentence.
  2. When the input sentence is long, the context vector does not capture the meaning of the whole sentence, since it is just the hidden state from the final time step. So, with an attention mechanism, instead of taking only the last hidden state as the context vector for the decoder, we take a weighted sum of all the encoder hidden states and use that as the context vector, recomputing the weights at every decoder step. This weighted sum is sketched below.
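The attention-based context vector can be sketched in a few lines of NumPy. The dot-product scoring used here is just one simple choice of alignment score, and all shapes and random values are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
encoder_states = rng.standard_normal((6, 8))   # one 8-dim hidden state per input time step
decoder_state = rng.standard_normal(8)         # current decoder hidden state

scores = encoder_states @ decoder_state        # alignment score for each encoder state (dot-product scoring)
weights = softmax(scores)                      # attention weights sum to 1
context = weights @ encoder_states             # weighted sum of all encoder hidden states

print(weights)        # how much each input position contributes at this decoder step
print(context.shape)  # (8,) -- the context vector fed to the decoder
```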