Forward propagation in RNNs

Let's look at how an RNN uses forward propagation to predict the output; but before we jump right in, let's get familiar with the notation:

The preceding figure illustrates the following:

  • $U$ represents the input to hidden layer weight matrix
  • $W$ represents the hidden to hidden layer weight matrix
  • $V$ represents the hidden to output layer weight matrix

The hidden state at a time step can be computed as follows:

$$h_t = \tanh(U x_t + W h_{t-1})$$

That is, the hidden state at a time step, $t$, is tanh([input to hidden layer weight x input] + [hidden to hidden layer weight x previous hidden state]).

The output at a time step can be computed as follows:

$$\hat{y}_t = \mathrm{softmax}(V h_t)$$

That is, the output at a time step, $t$, is softmax(hidden to output layer weight x hidden state at time $t$).
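
To make these two equations concrete, here is a minimal NumPy sketch of a single time step. The dimensions, the random seed, and the one-hot input vector are illustrative assumptions, not values taken from the figure:

import numpy as np

# Illustrative sizes (assumptions): a 4-dimensional input and a 3-dimensional hidden state
input_dim, hidden_dim, output_dim = 4, 3, 4

rng = np.random.default_rng(0)
U = rng.uniform(-1, 1, (hidden_dim, input_dim))   # input to hidden weights
W = rng.uniform(-1, 1, (hidden_dim, hidden_dim))  # hidden to hidden weights
V = rng.uniform(-1, 1, (output_dim, hidden_dim))  # hidden to output weights

x_t = np.array([1.0, 0.0, 0.0, 0.0])  # current input (here a one-hot vector)
h_prev = np.zeros(hidden_dim)         # previous hidden state

h_t = np.tanh(U.dot(x_t) + W.dot(h_prev))        # h_t = tanh(U x_t + W h_{t-1})
scores = V.dot(h_t)
y_hat_t = np.exp(scores) / np.exp(scores).sum()  # yhat_t = softmax(V h_t)
print(h_t, y_hat_t)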

We can also represent RNNs as shown in the following figure. As you can see, the hidden layer is represented by an RNN block, which implies that our network is an RNN, and previous hidden states are used in predicting the output:

The following diagram shows how forward propagation works in an unrolled version of an RNN:

We initialize the initial hidden state with random values. As you can see in the preceding figure, the output, $\hat{y}_1$, is predicted based on the current input, $x_1$, and the previous hidden state, which is an initial hidden state, $h_0$, using the following formula:

$$h_1 = \tanh(U x_1 + W h_0)$$
$$\hat{y}_1 = \mathrm{softmax}(V h_1)$$

Similarly, look at how the next output, $\hat{y}_2$, is computed. It takes the current input, $x_2$, and the previous hidden state, $h_1$:

$$h_2 = \tanh(U x_2 + W h_1)$$
$$\hat{y}_2 = \mathrm{softmax}(V h_2)$$

Thus, in forward propagation, an RNN predicts the output using the current input and the previous hidden state.

To make this clearer, let's look at how to implement forward propagation in an RNN to predict the output:

  1. Initialize all the weights, $U$, $W$, and $V$, by randomly drawing from a uniform distribution:
import numpy as np

# input_dim and hidden_dim are assumed to be defined (for example, the vocabulary size and the hidden layer size)
# U: input to hidden weights, shape (hidden_dim, input_dim)
U = np.random.uniform(-np.sqrt(1.0 / input_dim), np.sqrt(1.0 / input_dim), (hidden_dim, input_dim))

# W: hidden to hidden weights, shape (hidden_dim, hidden_dim)
W = np.random.uniform(-np.sqrt(1.0 / hidden_dim), np.sqrt(1.0 / hidden_dim), (hidden_dim, hidden_dim))

# V: hidden to output weights, shape (output_dim, hidden_dim); here the output dimension equals input_dim
V = np.random.uniform(-np.sqrt(1.0 / hidden_dim), np.sqrt(1.0 / hidden_dim), (input_dim, hidden_dim))
  2. Define the number of time steps, which will be the length of our input sequence, $x$:
num_time_steps = len(x)
  3. Define a matrix to hold the hidden state at every time step:
hidden_state = np.zeros((num_time_steps + 1, hidden_dim))
  4. Initialize the initial hidden state, $h_0$, with zeros; it is stored in the last row of hidden_state so that hidden_state[t - 1] picks it up at the first time step:
hidden_state[-1] = np.zeros(hidden_dim)
  5. Initialize the output:
YHat = np.zeros((num_time_steps, output_dim))

  6. For every time step, we perform the following:
for t in np.arange(num_time_steps):

    # h_t = tanh(U x_t + W h_{t-1}); x[t] is a token index, so U x_t reduces to column x[t] of U
    hidden_state[t] = np.tanh(U[:, x[t]] + W.dot(hidden_state[t - 1]))

    # yhat_t = softmax(V h_t)
    YHat[t] = softmax(V.dot(hidden_state[t]))
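
The loop calls a softmax function that is not defined in the snippet. A minimal, numerically stable sketch (an assumption, not code from the text), using the NumPy already imported above, could look like this:

def softmax(z):
    # shift by the maximum value for numerical stability before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

With softmax defined before the loop runs, each row of YHat holds a probability distribution over the output for the corresponding time step.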