Recurrent neural networks

All the networks we saw earlier feed data forward from one layer to the next, without any loop. Recurrent networks loop back on themselves, so the new value of a node's output depends not only on its input but also on the node's past internal state. This can be summed up in the following picture:

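To make the loop concrete, here is a minimal sketch of such a recurrent update in NumPy (the sizes and random weights are hypothetical, for illustration only, and this is not the code we will use later): the new state is computed from the current input together with the previous state, which is carried over from one step to the next.

import numpy as np

# Hypothetical dimensions and random weights, for illustration only
input_dim, hidden_dim = 3, 4
rng = np.random.default_rng(0)
W_x = rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden weights
W_h = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights (the loop)
b = np.zeros(hidden_dim)

def step(x_t, h_prev):
    # The new state depends on the current input AND the previous state
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

h = np.zeros(hidden_dim)                      # initial internal state
for x_t in rng.normal(size=(5, input_dim)):   # a short sequence of five inputs
    h = step(x_t, h)                          # the state is carried from step to step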
Theoretically, these networks can be trained, but it is a hard task, especially in text prediction, where a new word may depend on other words that are long gone (think of clouds up in the sky, where the predicted word sky depends on clouds, several words in the past).

More information on this problem can be found by looking up "vanishing gradient in recurrent neural networks" on your favorite search engine.
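As a rough intuition, backpropagation through time multiplies one factor per time step, and if each factor is smaller than one, the gradient coming from distant words all but disappears. The per-step factor below is a made-up number, purely for illustration:

# Backpropagating through many time steps multiplies many per-step factors;
# if each factor is below 1, the contribution of distant inputs shrinks fast.
grad = 1.0
per_step_factor = 0.9   # hypothetical magnitude of the per-step derivative
for t in range(50):
    grad *= per_step_factor
print(grad)             # about 0.005: the signal from 50 steps back has nearly vanished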

As such, other architectures were developed that don't suffer from these problems. The main one is called LSTM (Long Short-Term Memory). The seemingly paradoxical name reflects how it works: the cell maintains both a long-term memory and a short-term one. Indeed, it has two internal states, as can be seen in the following schema:

Image adapted from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

The new internal state is a nonlinear mix of the current input and the previous internal state. There are evolutions of this design, but it is good enough for our applications here.
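As a sketch of what such a cell computes, the following follows the standard LSTM equations from the page linked above, with hypothetical sizes and random weights (it is not the code we will use later). The two internal states are the cell state c, which carries the long-term memory, and the hidden state h, which is the short-term output:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes and random weights, for illustration only
input_dim, hidden_dim = 3, 4
rng = np.random.default_rng(0)
W = {g: rng.normal(size=(hidden_dim, hidden_dim + input_dim)) for g in "fico"}
b = {g: np.zeros(hidden_dim) for g in "fico"}

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W["f"] @ z + b["f"])        # forget gate: how much of the old cell state to keep
    i = sigmoid(W["i"] @ z + b["i"])        # input gate: how much new information to write
    c_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate values computed from the current input
    c = f * c_prev + i * c_tilde            # new cell state (the long-term memory)
    o = sigmoid(W["o"] @ z + b["o"])        # output gate
    h = o * np.tanh(c)                      # new hidden state (the short-term output)
    return h, c

h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h, c = lstm_step(x_t, h, c)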

If we compare this to hidden Markov models or to the filters used in finance (AR(n) or more complex), this one is nonlinear. Just as with the convolution layers, the LSTM layers will extract features from the input signal, and then the dense layers will make the final decision (classification, in our examples).
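As an example of this division of labor, here is a minimal Keras sketch (assuming TensorFlow is installed; the sequence length, feature count, layer sizes, and number of classes are placeholders, not values from this chapter):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100, 8)),           # 100 time steps, 8 features per step (placeholders)
    tf.keras.layers.LSTM(32),                         # the LSTM layer extracts features from the sequence
    tf.keras.layers.Dense(10, activation="softmax"),  # the dense layer makes the final classification
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()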
