Recurrent neural network architectures

Given your background with the deep learning architectures covered so far, you will see why RNNs are special. The previous architectures we have learned about are not flexible in terms of their input or output: they accept a fixed-size input, such as a vector or an image, and produce another fixed-size output. RNN architectures are different because they enable you to feed a sequence as input and get another sequence as output, or to have a sequence on the input side only or the output side only, as shown in Figure 1. This kind of flexibility is very useful for applications such as language modeling and sentiment analysis:

Figure 1: Flexibility of RNNs in terms of shape of input or output (http://karpathy.github.io/2015/05/21/rnn-effectiveness/)
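If the difference between fixed-size and sequence inputs/outputs is not obvious, the following shape-only sketch shows what the patterns from Figure 1 look like as arrays. The dimensions and variable names here are assumed purely for illustration and are not taken from the book's code:

```python
import numpy as np

# Toy dimensions, assumed only for illustration.
feature_dim, seq_len, num_classes = 8, 5, 3

# Fixed-size input and fixed-size output (what FNNs/CNNs expect), e.g. classification.
x_fixed = np.zeros(feature_dim)                   # shape (8,)
y_fixed = np.zeros(num_classes)                   # shape (3,)

# Sequence input, fixed output ("many to one"), e.g. sentiment analysis.
x_sequence = np.zeros((seq_len, feature_dim))     # shape (5, 8): one vector per time step
y_single = np.zeros(num_classes)                  # shape (3,)

# Sequence input, sequence output ("many to many"), e.g. machine translation.
y_sequence = np.zeros((seq_len, num_classes))     # shape (5, 3): one output per time step

print(x_fixed.shape, x_sequence.shape, y_sequence.shape)
```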

The intuition behind this set of architectures is to mimic the way humans process information. In any typical conversation, your understanding of someone's words depends heavily on what they said previously, and you might even be able to predict what they are going to say next based on what they have just said.

The same idea applies to RNNs. For example, imagine you want to translate a specific word in a sentence. You can't use a traditional FNN for that, because it won't be able to use the translation of the previous words as input alongside the current word we want to translate, and this may result in an incorrect translation because of the lack of contextual information around this word.

RNNs do preserve information about the past, and they have a kind of loop that allows the previously learned information to be used for the current prediction at any given point:

Figure 2: An RNN architecture with a loop to persist information from past steps (source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)

In Figure 2, we have a neural network called A that receives an input xt and produces an output ht. It also receives information from past steps with the help of this loop.

This loop might seem unclear at first, but if we look at the unrolled version of Figure 2, you will find that it is very simple and intuitive, and that the RNN is nothing but repeated copies of the same network (which could be a normal FNN), as shown in Figure 3:

Figure 3: An unrolled version of the recurrent neural network architecture (source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
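To make the loop concrete, here is a minimal NumPy sketch of a vanilla RNN cell unrolled over a short sequence. The variable names, dimensions, and the tanh update are assumed for illustration and are not taken from the book's later code; the key point is that the very same weights are reused at every time step, exactly as the unrolled view in Figure 3 suggests:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One pass through the loop in Figure 2: combine the current input with
    the previous hidden state to produce the new hidden state h_t."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Assumed toy dimensions; the weights below are shared across all time steps.
input_dim, hidden_dim, seq_len = 8, 16, 5
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

x_sequence = rng.normal(size=(seq_len, input_dim))   # the inputs x1 ... x5
h_t = np.zeros(hidden_dim)                           # initial hidden state

hidden_states = []
for x_t in x_sequence:               # "unrolling" the loop over the sequence
    h_t = rnn_step(x_t, h_t, W_xh, W_hh, b_h)
    hidden_states.append(h_t)

print(len(hidden_states), hidden_states[-1].shape)   # 5 (16,)
```

In this sketch, the last hidden state summarizes the whole sequence (useful for many-to-one tasks such as sentiment analysis), while keeping one hidden state per step gives a sequence output (useful for many-to-many tasks such as language modeling).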

This intuitive architecture of RNNs and their flexibility in terms of input/output shape make them a good fit for interesting sequence-based learning tasks such as machine translation, language modeling, sentiment analysis, image captioning, and more.
