Differences between LSTM and GRU

There are a few subtle differences between an LSTM and a GRU, although to be perfectly honest, there are more similarities than differences! For starters, a GRU has one fewer gate than an LSTM. As you can see in the following diagram, an LSTM has three gates: an input gate, a forget gate, and an output gate. A GRU, on the other hand, has only two gates: a reset gate and an update gate. The reset gate determines how to combine the new input with the previous memory, and the update gate defines how much of the previous memory to keep around:

LSTM vs GRU
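
To make the contrast concrete, here is a minimal sketch of a single step of each cell in NumPy. The dictionary-of-matrices weight layout and the variable names are our own illustrative assumptions, not any particular library's API:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # Three gates: input (i), forget (f), and output (o),
    # plus the candidate cell content (g).
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])
    c = f * c_prev + i * g   # internal memory cell
    h = o * np.tanh(c)       # exposed hidden state (a second non-linearity)
    return h, c

def gru_step(x, h_prev, W, U, b):
    # Only two gates: update (z) and reset (r); no separate memory cell.
    z = sigmoid(W["z"] @ x + U["z"] @ h_prev + b["z"])
    r = sigmoid(W["r"] @ x + U["r"] @ h_prev + b["r"])
    # The reset gate acts directly on the previous hidden state.
    h_cand = np.tanh(W["h"] @ x + U["h"] @ (r * h_prev) + b["h"])
    # No output gate: the weighted sum itself is the new state/output.
    return z * h_prev + (1.0 - z) * h_cand
```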

Here's another interesting fact: if we set the reset gate to all 1s and the update gate to all 0s, do you know what we have? If you guessed a plain old recurrent neural network, you'd be right!
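
We can verify that claim numerically with a short sketch. Pinning the reset gate to all 1s and the update gate to all 0s by hand (the weights below are random and purely illustrative), the GRU update collapses to the familiar vanilla RNN step:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
x = rng.normal(size=n_in)        # current input
h_prev = rng.normal(size=n_hid)  # previous hidden state
W_h = rng.normal(size=(n_hid, n_in))
U_h = rng.normal(size=(n_hid, n_hid))

r = np.ones(n_hid)   # reset gate pinned to all 1s
z = np.zeros(n_hid)  # update gate pinned to all 0s

# GRU step with the gates pinned
h_cand = np.tanh(W_h @ x + U_h @ (r * h_prev))
h_gru = z * h_prev + (1.0 - z) * h_cand

# Plain RNN step with the same weights
h_rnn = np.tanh(W_h @ x + U_h @ h_prev)

print(np.allclose(h_gru, h_rnn))  # True
```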

Here are the key differences between an LSTM and a GRU:

  • A GRU has two gates, while an LSTM has three (the weight shapes printed after this list make this concrete).
  • GRUs do not have an internal memory cell that is distinct from the exposed hidden state; they lack the output gate that an LSTM uses to control how much of its memory cell is exposed.
  • The roles of the LSTM's input and forget gates are coupled into a single update gate that weighs the old content against the new.
  • The reset gate is applied directly to the previous hidden state.
  • We do not apply a second non-linearity when computing the GRU output.
  • There is no output gate; the weighted sum of the previous hidden state and the candidate state simply becomes the output.
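
If you want to see the gate count reflected in a real library, PyTorch stacks each cell's per-gate weight matrices into a single tensor, so the leading dimension of that tensor tells you how many blocks the cell carries (in both cells, one block is the candidate content rather than a gate):

```python
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16)
gru = nn.GRU(input_size=8, hidden_size=16)

# LSTM: 4 stacked blocks (input gate, forget gate, candidate, output gate)
print(lstm.weight_ih_l0.shape)  # torch.Size([64, 8])  -> 4 * 16 rows

# GRU: 3 stacked blocks (reset gate, update gate, candidate)
print(gru.weight_ih_l0.shape)   # torch.Size([48, 8])  -> 3 * 16 rows
```

Subtract the candidate block from each and you are left with the three gates of the LSTM versus the two of the GRU.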