The LSTM is a variant of the RNN that helps in learning long-term dependencies in text. LSTMs were initially introduced by Hochreiter & Schmidhuber (1997) (link: http://www.bioinf.jku.at/publications/older/2604.pdf), and many researchers have since built on them, producing interesting results in many domains.
This kind of architecture can handle the problem of long-term dependencies in text because of its inner structure. LSTMs are similar to vanilla RNNs in that they have a module repeated over time, but the inner architecture of this repeated module differs from that of vanilla RNNs: it includes extra layers for forgetting and updating information:
As mentioned previously, vanilla RNNs have a single NN layer, but LSTMs have four different layers interacting in a special way. This special kind of interaction is what makes LSTMs work very well in many domains, as we'll see while building our language model example:
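To make the four interacting layers concrete, here is a minimal sketch of a single LSTM time step in NumPy. The function name `lstm_step` and the packed weight layout are illustrative assumptions, not part of the book's language model example; the gate equations themselves follow the standard LSTM formulation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step showing the four interacting layers.

    x: input vector; h_prev, c_prev: previous hidden and cell state.
    W: packed weights of shape (4*H, D+H); b: bias of shape (4*H,).
    Names and layout are illustrative, not from the book's example.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0:H])       # forget gate: what to discard from c_prev
    i = sigmoid(z[H:2*H])     # input gate: what new information to store
    g = np.tanh(z[2*H:3*H])   # candidate values for the cell state
    o = sigmoid(z[3*H:4*H])   # output gate: what to expose as h
    c = f * c_prev + i * g    # update the cell state
    h = o * np.tanh(c)        # new hidden state
    return h, c
```

The forget and input gates decide what to erase from and add to the cell state, while the output gate filters what the rest of the network sees, which is what lets gradients flow over many time steps.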
For more details about the mathematics and how the four layers actually interact with each other, you can have a look at this interesting tutorial: http://colah.github.io/posts/2015-08-Understanding-LSTMs/