RNN and the long-term dependency problem

RNNs are powerful and widely used. However, to perform the present task we often only need to look at recent information rather than information stored long ago. This situation is frequent in NLP, especially in language modeling. Let's look at a common example:

Figure 5: If the gap between the relevant information and the place where it is needed is small, RNNs can learn to use past information

Suppose a language model is trying to predict the next word based on the previous words. As humans, if we try to predict the last word in the sky is blue, we will most likely say blue without needing any further context. In such cases, the gap between the relevant information and the place where it is needed is small, so RNNs can learn to use past information easily.
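To make the setup concrete, here is a minimal sketch of such a next-word predictor built around a simple RNN. This is my own illustration assuming a Keras-style model; the vocabulary size and word indices are toy placeholders, not values from the text. With a short input such as the sky is, the hidden state only has to carry information across a few time steps, which is exactly the easy case described above.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense

vocab_size = 1000   # assumed toy vocabulary size
seq_len = 3         # e.g. the indices of "the sky is" -> predict "blue"

model = Sequential([
    Embedding(vocab_size, 32),                 # map word indices to vectors
    SimpleRNN(64),                             # hidden state carries recent context
    Dense(vocab_size, activation="softmax"),   # distribution over the next word
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

# Dummy training pair: word indices for "the sky is" -> index of "blue"
x = np.array([[12, 45, 7]])
y = np.array([87])
model.fit(x, y, epochs=1, verbose=0)
```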

But consider a longer passage where we need more context: Asif grew up in Bangladesh... He studied in Korea... He speaks fluent Bengali. Here, the most recent information suggests that the next word will probably be the name of a language. However, to narrow down which language, we need the context of Bangladesh from much earlier in the text:

Figure 6: If the gap between the relevant information and the place where it is needed is large, RNNs can't learn to use past information

Here, the gap is much larger, and RNNs become unable to learn to use such distant information: the gradient signal that connects the prediction back to the relevant word has to travel through many time steps and typically shrinks (or blows up) along the way. This is a serious drawback of RNNs. Fortunately, LSTMs come to the rescue.
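To see why the larger gap hurts, here is a minimal sketch of what backpropagation through time does to a gradient that must travel across many steps. It assumes a vanilla tanh RNN, and the recurrent weights and hidden states are random placeholders rather than values from a trained model: at each step the gradient is multiplied by the recurrent Jacobian, and its norm tends to shrink exponentially.

```python
import numpy as np

np.random.seed(0)
hidden_size = 50
# Recurrent weights scaled so the largest singular value is below 1,
# a common situation for a tanh RNN.
W_hh = np.random.randn(hidden_size, hidden_size) * 0.1

grad = np.ones(hidden_size)  # gradient arriving at the last time step
for t in range(1, 51):
    h = np.tanh(np.random.randn(hidden_size))  # stand-in hidden state
    # One BPTT step: multiply by W_hh^T * diag(tanh'(h))
    grad = W_hh.T @ (grad * (1 - h ** 2))
    if t % 10 == 0:
        print(f"step {t:2d}: gradient norm = {np.linalg.norm(grad):.2e}")
```

Running this prints a gradient norm that collapses toward zero after a few dozen steps, which is why the context of Bangladesh, many words back, barely influences the update for predicting Bengali.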
