LSTM networks

An LSTM is a variation of the RNN that helps with learning long-term dependencies in text. LSTMs were originally introduced by Hochreiter & Schmidhuber (1997) (link: http://www.bioinf.jku.at/publications/older/2604.pdf), and many researchers have since built on them, producing interesting results in many domains.

This kind of architecture is able to handle the problem of long-term dependencies in text because of its inner structure.

LSTMs are similar to vanilla RNNs in that they have a repeating module over time, but the inner architecture of this repeated module differs from that of a vanilla RNN: it includes additional layers for forgetting and updating information:

Figure 8: The repeating module in a standard RNN containing a single layer (source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)

As mentioned previously, a vanilla RNN has a single neural network layer, but an LSTM has four different layers interacting in a special way. This special kind of interaction is what makes LSTMs work so well in many domains, as we'll see while building our language model example:

Figure 9: The repeating module in an LSTM containing four interacting layers (source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)

For the mathematical details of how the four layers actually interact with each other, you can have a look at this interesting tutorial: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
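To make those four interacting layers concrete, here is a minimal NumPy sketch of a single LSTM time step, following the standard forget/input/candidate/output gate equations described in the tutorial above. The stacked weight layout (one matrix W and bias b covering all four gates) and all the variable names are our own illustration, not code from the book's language model example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps the concatenated [h_prev; x] to the
    four gate pre-activations stacked together; b is the matching bias."""
    z = np.concatenate([h_prev, x]) @ W + b
    hidden = h_prev.shape[0]
    f = sigmoid(z[0 * hidden:1 * hidden])  # forget gate: what to erase from the cell state
    i = sigmoid(z[1 * hidden:2 * hidden])  # input gate: how much new information to write
    g = np.tanh(z[2 * hidden:3 * hidden])  # candidate values to be written
    o = sigmoid(z[3 * hidden:4 * hidden])  # output gate: what to expose as the hidden state
    c = f * c_prev + i * g                 # updated cell state (forget old, add new)
    h = o * np.tanh(c)                     # new hidden state
    return h, c

# Toy usage with random weights (hypothetical sizes, untrained)
rng = np.random.default_rng(0)
input_dim, hidden = 3, 4
W = rng.normal(scale=0.1, size=(hidden + input_dim, 4 * hidden))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x in rng.normal(size=(5, input_dim)):  # a sequence of 5 input vectors
    h, c = lstm_step(x, h, c, W, b)
print(h)
```

The key design point the sketch illustrates is the cell state c: because it is updated additively (scaled by the forget gate rather than repeatedly multiplied by a weight matrix), gradients can flow across many time steps, which is what lets LSTMs capture long-term dependencies that vanilla RNNs struggle with.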
