Why do we use LSTM networks?

We saw in the previous chapter that recurrent neural networks provide decent performance when working with sequence data. One of the key advantages of LSTM networks is that they address the vanishing gradient problem, which makes network training difficult for long sequences of words or integers. Gradients are used to update the RNN parameters, and over a long sequence these gradients become smaller and smaller, to the extent that, effectively, no network training can take place. LSTM networks help to overcome this problem and make it possible to capture long-term dependencies between words or integers that are separated by a large distance within a sequence. For example, consider the following two sentences, where the first sentence is short and the second is relatively long:

  • Sentence-1: I like to eat chocolates.
  • Sentence-2: I like, whenever there is a chance and usually there are many of them, to eat chocolates.

In both sentences, the two important words that capture the main meaning are like and chocolates. In the first sentence, like and chocolates are close together, separated by just two words. In the second sentence, by contrast, the same two words are separated by as many as 14 intervening words. LSTM networks are designed to deal with such long-term dependencies, which occur in longer sentences or longer sequences of integers. In this chapter, we focus on applying LSTM networks to develop a movie review sentiment classification model.
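
To make this concrete, the sketch below shows roughly how such a model might be set up. It is a minimal illustration only, assuming the Keras IMDB movie-review dataset (where each review is already encoded as a sequence of integers) and illustrative hyperparameter values such as the vocabulary size, sequence length, and layer widths; the chapter develops the actual model step by step.

  # A minimal sketch (not the book's exact code): an LSTM sentiment
  # classifier for the Keras IMDB dataset, where each review is a
  # sequence of word indices (integers) and the label is 0 or 1.
  from tensorflow import keras
  from tensorflow.keras import layers

  max_words = 10000   # assumed vocabulary size
  max_len = 200       # assumed maximum review length (in words)

  # Load reviews as integer sequences and pad them to a fixed length.
  (x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(
      num_words=max_words)
  x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_len)
  x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_len)

  # Embedding -> LSTM -> sigmoid output for binary sentiment.
  model = keras.Sequential([
      layers.Embedding(max_words, 32),
      layers.LSTM(32),    # the LSTM layer captures long-term dependencies
      layers.Dense(1, activation="sigmoid"),
  ])
  model.compile(optimizer="adam", loss="binary_crossentropy",
                metrics=["accuracy"])
  model.fit(x_train, y_train, epochs=3, batch_size=128, validation_split=0.2)

The LSTM layer replaces the simple recurrent layer of the previous chapter; its gating mechanism is what allows information about a word such as like to persist across many intervening time steps until a related word such as chocolates is reached.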
