Recurrent Neural Network

Within the set of Artificial Neural Networks (ANN), there are several variants based on the number of hidden layers and the data flow. One such variant is the RNN, where the connections between neurons can form a cycle. Unlike feed-forward networks, RNNs can use internal memory for their processing. RNNs are a class of ANNs that feature connections between hidden layers that are propagated through time in order to learn sequences. RNN use cases include the following fields:

  • Stock market predictions
  • Image captioning
  • Weather forecast
  • Time-series-based forecasts
  • Language translation
  • Speech recognition
  • Handwriting recognition
  • Audio or video processing
  • Robotics action sequencing

The networks we have studied so far (feed-forward networks) are based on input data that is fed to the network and converted into output. If it is a supervised learning algorithm, the output is a label that identifies the input. Basically, these algorithms map raw data to specific categories by recognizing patterns.
Recurrent networks, on the other hand, take as their input not only the current data fed to the network but also what they have experienced over time.

The decision made by a recurrent network at a specific instant affects the decision it will reach immediately afterwards. So, recurrent networks have two sources of input, the present and the recent past, which combine to determine how they respond to new data, just as people do in everyday life.

Recurrent networks are distinguished from feed-forward networks by the feedback loop tied to their past decisions: their outputs are fed back in as inputs a moment later. This feature can be emphasized by saying that recurrent networks have memory. Adding memory to neural networks has a purpose: there is information in the sequence itself, and recurrent networks use it to perform tasks that feed-forward networks cannot.

An RNN is a class of neural networks in which the connections between neurons form a directed cycle. A typical RNN is represented in the following figure:

Here, the output of one instance is taken as input for the next instance of the same neuron. The way the data is kept in memory and flows across different time periods makes RNNs powerful and successful.

Within the RNN family, there are further variants that differ in how data flows backwards through the network:

  • Fully recurrent
  • Recursive
  • Hopfield
  • Elman and Jordan networks
  • Neural history compressor
  • LSTM
  • Gated Recurrent Unit (GRU)
  • Bidirectional
  • Recurrent MLP

Recurrent networks are designed to recognize patterns in sequences of data and are helpful in prediction and forecasting. They can work on text, images, speech, and time series data. RNNs are among the most powerful ANNs and resemble the biological brain in combining memory with processing power. Recurrent networks take input from the current input (as a feed-forward network does) and from the output that was calculated previously:

To understand this better, we consider the RNN as a network of neural networks, and the cyclic nature is unfolded in the following manner. The state of a neuron h is considered at different time periods (t-1, t, t+1, and so on) until convergence or the total number of epochs is reached.
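
In code, the only thing that distinguishes one unfolded copy from the next is the state carried over from the previous step. The following base R snippet is a minimal sketch, assuming the conventional tanh state update of a vanilla RNN; the sizes, weights, and toy inputs are illustrative and not taken from the text. It shows the same weights being reused at every time period while the state h is passed along:

  # Minimal sketch of the state update applied at each unfolded time step:
  # the new state h_t mixes the current input x_t with the previous state.
  set.seed(1)
  hidden <- 3
  Wx <- matrix(rnorm(hidden), hidden, 1)        # input-to-hidden weights
  Wh <- matrix(rnorm(hidden * hidden), hidden)  # hidden-to-hidden (recurrent) weights
  b  <- rep(0, hidden)

  h_prev <- rep(0, hidden)                      # state at time t-1 (starts empty)
  for (x_t in c(0.5, -0.2, 0.9)) {              # a toy input sequence
    h_t <- tanh(Wx %*% x_t + Wh %*% h_prev + b) # same weights reused at every step
    h_prev <- as.vector(h_t)                    # the state is carried on to t+1
    print(round(h_prev, 3))
  }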

The vanilla RNN was the first recurrent ANN model to be introduced. A vanilla RNN is shown in the following figure:

Other variants, such as GRU and LSTM networks, are more widespread given how simple they are to implement in practice, and they have demonstrated remarkable performance in a wide range of applications involving sequences, such as language modeling, speech recognition, image captioning, and automatic translation.

RNNs can be implemented in R through the following packages; a brief usage sketch follows the list:

  • rnn
  • MxNetR
  • TensorFlow for R
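
To get a feel for one of these options, here is a minimal sketch following the rnn package's binary-addition example. The argument names (learningrate, hidden_dim, numepochs) and the helpers int2bin(), bin2int(), trainr(), and predictr() reflect that package's interface as I recall it; treat the exact names and settings as assumptions and check the package help before relying on them.

  # Sketch with the rnn package: learn to add two 8-bit numbers,
  # one bit per time step (based on the package's binary-addition example).
  library(rnn)

  set.seed(1)
  a <- sample(0:127, 5000, replace = TRUE)
  b <- sample(0:127, 5000, replace = TRUE)

  X1 <- int2bin(a, length = 8)          # encode the operands bit by bit
  X2 <- int2bin(b, length = 8)
  Y  <- int2bin(a + b, length = 8)      # the target sum, also bit by bit

  # stack the two input sequences: samples x time steps x input dimensions
  X <- array(c(X1, X2), dim = c(dim(X1), 2))

  # illustrative training settings; verify argument names against ?trainr
  model <- trainr(Y = Y, X = X,
                  learningrate = 0.1,
                  hidden_dim   = 10,
                  numepochs    = 5)

  pred <- predictr(model, X)            # predictions are per-bit values
  head(bin2int(round(pred)))            # decode a few predicted sums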

RNNs are mainly used for sequence modeling. The inputs and outputs are treated as vectors (arrays of numbers). For a deeper understanding of RNNs, I advise you to go through the character-level language model example by Andrej Karpathy.

These features make an RNN resemble an ANN with memory, and that memory brings it closer to the human brain. With memory, machines can build on what they have already seen and learn from their "memory." RNNs are basically ANNs with loops that allow information to persist in the network; the looping allows information to be passed from state t to state t+1.

As seen in the preceding diagram, RNNs can be thought of as multiple copies of the same ANN, with the output of one passed on as input to the next. Because the information is persisted, the RNN can follow how the patterns change and predict the value at t+1. This is particularly useful for analyzing time-series-based problems.

No specific labeling is required; the input values themselves form the time series variable, and the RNN can learn the pattern and make the prediction, as sketched below.
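
As a quick illustration of that point, the following base R sketch builds training inputs and targets directly from an unlabeled series: the target for each window is simply the next value of the same series. The toy series and window length are arbitrary illustrative choices, not values from the text.

  series <- sin(seq(0, 10, by = 0.1))   # any univariate time series
  window <- 20

  # each row of X holds `window` consecutive values; Y holds the next value
  starts <- 1:(length(series) - window)
  X <- t(sapply(starts, function(i) series[i:(i + window - 1)]))  # inputs
  Y <- series[starts + window]                                    # targets

  dim(X)    # one training window per row
  head(Y)   # the value the RNN would learn to predict for each row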

The internal state of the RNN is updated at every time step of the learning process. The feed-forward mechanism in an RNN is similar to that of an ANN; however, the error correction during backpropagation runs across time steps, following a procedure called Backpropagation Through Time (BPTT).

Backpropagation Through Time follows this pseudocode; a from-scratch sketch of the same steps appears after the list:

  1. Unfold the RNN so that it contains n feed-forward networks, one per time step.
  2. Initialize the weights w to random values.
  3. Repeat steps 4 to 7 until the stopping criterion is met or the required number of epochs has been run:
  4. Set the input of each unfolded network to the corresponding value xi.
  5. Forward-propagate the inputs over the whole unfolded network.
  6. Back-propagate the error over the unfolded network.
  7. Update all the weights in the network.
  8. Average the weight copies to obtain the final weights of the folded network.
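
The following base R code is a from-scratch sketch of these steps for a tiny vanilla RNN that learns to predict the next value of a sine wave. The sizes, learning rate, squared-error loss, and toy data are illustrative assumptions rather than part of the pseudocode, but the loop structure mirrors the steps above: unfold, forward-propagate, back-propagate the error through time, and update the shared weights.

  set.seed(42)
  n_steps <- 25       # length of the unfolded sequence
  hidden  <- 8        # number of hidden units
  lr      <- 0.01     # learning rate
  epochs  <- 300

  t_grid <- seq(0, 2 * pi, length.out = n_steps + 1)
  x <- sin(t_grid[1:n_steps])          # input:  x_t = sin(t)
  y <- sin(t_grid[2:(n_steps + 1)])    # target: the next value of the series

  # Step 2: initialize the weights to small random values
  Wx <- matrix(rnorm(hidden, sd = 0.1), hidden, 1)   # input  -> hidden
  Wh <- matrix(rnorm(hidden^2, sd = 0.1), hidden)    # hidden -> hidden
  Wy <- matrix(rnorm(hidden, sd = 0.1), 1, hidden)   # hidden -> output
  bh <- rep(0, hidden); by <- 0

  for (epoch in 1:epochs) {            # Step 3: repeat for a number of epochs
    # Steps 1, 4, 5: unfold over time and forward-propagate
    h <- matrix(0, hidden, n_steps + 1)               # h[, 1] is the initial state
    yhat <- numeric(n_steps)
    for (s in 1:n_steps) {
      h[, s + 1] <- tanh(Wx %*% x[s] + Wh %*% h[, s] + bh)
      yhat[s] <- Wy %*% h[, s + 1] + by
    }

    # Step 6: back-propagate the error over the unfolded network
    dWx <- matrix(0, hidden, 1); dWh <- matrix(0, hidden, hidden)
    dWy <- matrix(0, 1, hidden); dbh <- rep(0, hidden); dby <- 0
    dh_next <- rep(0, hidden)
    for (s in n_steps:1) {
      dy  <- yhat[s] - y[s]                           # squared-error gradient
      dWy <- dWy + dy * t(h[, s + 1, drop = FALSE])
      dby <- dby + dy
      dh  <- as.vector(t(Wy) * dy) + dh_next          # gradient flowing into h_t
      dz  <- dh * (1 - h[, s + 1]^2)                  # back through tanh
      dWx <- dWx + dz %*% t(x[s])
      dWh <- dWh + dz %*% t(h[, s])
      dbh <- dbh + dz
      dh_next <- as.vector(t(Wh) %*% dz)              # passed on to time step t-1
    }

    # Steps 7-8: one update to the shared (tied) weights of the folded network
    Wx <- Wx - lr * dWx; Wh <- Wh - lr * dWh; Wy <- Wy - lr * dWy
    bh <- bh - lr * dbh; by <- by - lr * dby
  }
  cat("final MSE:", mean((yhat - y)^2), "\n")

Because the weights are shared across all unfolded copies, the gradient contributions from every time step are summed before a single update to the shared weights; this plays the role of step 8's averaging back into the folded network.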