Long short-term memory

Long short-term memory (LSTM) networks are a specialized form of recurrent neural network. They have the ability to retain long-term memory of things they have encountered in the past. In an LSTM, each neuron is replaced by what is known as a memory unit. This memory unit is activated and deactivated at the appropriate time, and it contains what is known as a recurrent self-connection.

If we step back for a second and look at the back-propagation phase of a regular recurrent network, the gradient signal can end up being multiplied many times by the weight matrix of the synapses between the neurons within the hidden layer. What does this mean exactly? It means that the magnitude of those weights has a strong impact on the learning process, and that impact can be both good and bad.

If the weights are small, they can lead to what is known as vanishing gradients, a scenario where the gradient signal gets so small that learning slows to an unbearable pace or, even worse, comes to a complete stop. On the other hand, if the weights are large, the signal can get so large that learning diverges rather than converges, a scenario known as exploding gradients. Both scenarios are undesirable, but both are handled by a component of the LSTM model known as the memory cell. Let's talk a little about this memory cell now.
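Since both failure modes come from the same mechanism, a tiny numerical sketch can make this concrete. The following plain-Python snippet (the scalar weight, step count, and printed values are illustrative assumptions, not something taken from this chapter) repeatedly multiplies a gradient signal by a recurrent weight, just as back-propagation through time does:

def backpropagated_gradient(weight, steps, initial_gradient=1.0):
    # Stand-in for back-propagation through time: one multiplication
    # by the recurrent weight per time step.
    gradient = initial_gradient
    for _ in range(steps):
        gradient *= weight
    return gradient

print(backpropagated_gradient(weight=0.5, steps=50))  # ~8.9e-16: vanishing gradient
print(backpropagated_gradient(weight=1.5, steps=50))  # ~6.4e+08: exploding gradient

With a weight below 1.0 the signal shrinks toward zero and learning stalls; above 1.0 it blows up and learning diverges, which is exactly the pair of scenarios the memory cell is designed to avoid.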

A memory cell has four different parts. They are:

  • Input gate
  • Self-recurrent connection neuron, with a constant weight of 1.0
  • Forget gate, allowing the cell to remember or forget its previous state
  • Output gate, allowing the memory cell state to have an effect (or no effect) on other neurons

Let's take a look at the following figure, and the short code sketch after it, to try to make it all come together:

Figure: Memory cell
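To complement the figure, here is a minimal sketch of one forward step of a memory cell, written with NumPy purely for illustration; the weight shapes, random initialization, and function names are assumptions rather than code from this chapter. It shows how the input, forget, and output gates are wired around the self-recurrent cell state:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, params):
    # Unpack the weights for each of the four parts listed above.
    Wi, Ui, bi = params["input"]      # input gate: how much new information enters
    Wf, Uf, bf = params["forget"]     # forget gate: keep or drop the previous state
    Wo, Uo, bo = params["output"]     # output gate: expose (or hide) the cell state
    Wc, Uc, bc = params["candidate"]  # candidate values for the new state

    i = sigmoid(Wi @ x + Ui @ h_prev + bi)
    f = sigmoid(Wf @ x + Uf @ h_prev + bf)
    o = sigmoid(Wo @ x + Uo @ h_prev + bo)
    g = np.tanh(Wc @ x + Uc @ h_prev + bc)

    # Self-recurrent connection: the cell state is carried forward and only
    # gated, never repeatedly squashed, which is what protects the gradient
    # from vanishing or exploding.
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

# Illustrative usage with made-up sizes.
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
def make_gate():
    return (rng.standard_normal((n_hidden, n_in)) * 0.1,
            rng.standard_normal((n_hidden, n_hidden)) * 0.1,
            np.zeros(n_hidden))
params = {name: make_gate() for name in ("input", "forget", "output", "candidate")}
h, c = lstm_cell_step(rng.standard_normal(n_in), np.zeros(n_hidden), np.zeros(n_hidden), params)

The key design choice is the cell state update: the previous state is multiplied by the forget gate rather than by a learned recurrent weight, so the signal can flow across many time steps without being driven toward zero or infinity.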