Introducing RNNs

The Sun rises in the ____.

If we were asked to predict the blank in the preceding sentence, we would probably say east. Why would we predict east? Because we have read the whole sentence, understood the context, and concluded that east is an appropriate word to complete the sentence.

If we use a feedforward neural network to predict the blank, it will not predict the right word. This is because in feedforward networks, each input is independent of the others: the network makes predictions based only on the current input and does not remember previous inputs.

Thus, the input to the network will just be the word preceding the blank, which is the word the. With this word alone as input, our network cannot predict the correct word, because it does not know the context of the sentence; that is, it has not seen the earlier words it needs in order to understand the sentence and predict an appropriate next word.

This is where RNNs come in. They predict the output based not only on the current input, but also on the previous hidden state. Why do they have to predict the output based on the current input and the previous hidden state? Why can't they just use the current input and the previous input?

This is because the previous input will only store information about the previous word, while the previous hidden state will capture the contextual information about all the words in the sentence that the network has seen so far. Basically, the previous hidden state acts like a memory and it captures the context of the sentence. With this context and the current input, we can predict the relevant word.
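
To make this idea concrete, here is a minimal sketch of a single recurrent step in NumPy. The weight matrices (W_xh, W_hh), the bias (b_h), and the sizes chosen are illustrative assumptions for this sketch, not the definition from any particular library:

```python
import numpy as np

hidden_size = 8  # size of the hidden state (an arbitrary choice for this sketch)
input_size = 5   # size of each word vector (matches the toy vocabulary used later)

# Randomly initialized weights; in a trained network, these are learned
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.01, size=(hidden_size, input_size))   # input-to-hidden weights
W_hh = rng.normal(scale=0.01, size=(hidden_size, hidden_size))  # hidden-to-hidden weights
b_h = np.zeros(hidden_size)                                     # hidden bias

def rnn_step(x_t, h_prev):
    """One recurrent step: the new hidden state depends on both the
    current input word (x_t) and the previous hidden state (h_prev)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
```

Notice that the previous input word never appears in the computation; everything the network remembers about earlier words must flow through h_prev.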

For instance, let's take the same sentence, The sun rises in the ____. As shown in the following figure, we first pass the word the as input along with an initial hidden state, h_0, and the network produces a new hidden state, h_1. Then we pass the next word, sun, as input, but along with it, we also pass the previous hidden state, h_1. So, every time we pass an input word, we also pass the previous hidden state as input.

In the final step, we pass the word the, along with the previous hidden state, h_4, which captures the contextual information about the sequence of words the network has seen so far. Thus, h_4 acts as the memory and stores information about all the previous words the network has seen. With h_4 and the current input word (the), we can predict the relevant next word.
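
Continuing the sketch above (and reusing rnn_step and the sizes defined there), unrolling the same step over our sentence shows how the hidden state is threaded through every word. The toy vocabulary and one-hot encoding here are assumptions made purely for illustration:

```python
# A toy vocabulary and one-hot word vectors (illustrative assumptions)
vocab = ['the', 'sun', 'rises', 'in', 'east']
word_to_vec = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

sentence = ['the', 'sun', 'rises', 'in', 'the']

h = np.zeros(hidden_size)  # the initial hidden state, h_0
for word in sentence:
    # Each step consumes the current word and the previous hidden state;
    # after the last step, h plays the role of h_5 in the text
    h = rnn_step(word_to_vec[word], h)

# h now summarizes the whole sentence seen so far. An output layer on top
# of h would score each vocabulary word as the next word; after training,
# 'east' should receive the highest score.
```

Note that nothing except the single vector h is carried from one step to the next; that vector is the network's entire memory of the sentence.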

In a nutshell, an RNN uses the previous hidden state as its memory; this memory captures and stores the contextual information (the inputs) that the network has seen so far.

RNNs are widely applied to use cases that involve sequential data, such as time series, text, audio, speech, video, and weather data. They are used extensively in various natural language processing (NLP) tasks, such as language translation, sentiment analysis, and text generation.
