Forward propagation in RNNs

Let's look at how an RNN uses forward propagation to predict the output; but before we jump right in, let's get familiar with the notation:

The preceding figure illustrates the following:

  • $U$ represents the input to hidden layer weight matrix
  • $W$ represents the hidden to hidden layer weight matrix
  • $V$ represents the hidden to output layer weight matrix

The hidden state at a time step can be computed as follows:

$$h_t = \tanh(U x_t + W h_{t-1})$$

That is, the hidden state at a time step, $t$, is tanh([input to hidden layer weight x input] + [hidden to hidden layer weight x previous hidden state]).

The output at a time step can be computed as follows:

$$\hat{y}_t = \mathrm{softmax}(V h_t)$$

That is, the output at a time step, $t$, is softmax(hidden to output layer weight x hidden state at time $t$).
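
To make these two equations concrete, here is a minimal NumPy sketch of a single time step. The dimensions, the random seed, and the one-hot input vector are illustrative assumptions, not values taken from the figure:

import numpy as np

# Illustrative sizes (assumptions): a 4-dimensional input and a 3-dimensional hidden state
input_dim, hidden_dim, output_dim = 4, 3, 4

rng = np.random.default_rng(0)
U = rng.uniform(-1, 1, (hidden_dim, input_dim))   # input to hidden weights
W = rng.uniform(-1, 1, (hidden_dim, hidden_dim))  # hidden to hidden weights
V = rng.uniform(-1, 1, (output_dim, hidden_dim))  # hidden to output weights

x_t = np.array([1.0, 0.0, 0.0, 0.0])  # current input (here a one-hot vector)
h_prev = np.zeros(hidden_dim)         # previous hidden state

h_t = np.tanh(U.dot(x_t) + W.dot(h_prev))        # h_t = tanh(U x_t + W h_{t-1})
scores = V.dot(h_t)
y_hat_t = np.exp(scores) / np.exp(scores).sum()  # yhat_t = softmax(V h_t)
print(h_t, y_hat_t)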

We can also represent RNNs as shown in the following figure. As you can see, the hidden layer is represented by an RNN block, which implies that our network is an RNN, and previous hidden states are used in predicting the output:

The following diagram shows how forward propagation works in an unrolled version of an RNN:

We initialize the initial hidden state with random values. As you can see in the preceding figure, the output, $\hat{y}_1$, is predicted based on the current input, $x_1$, and the previous hidden state, which is an initial hidden state, $h_0$, using the following formula:

$$h_1 = \tanh(U x_1 + W h_0)$$
$$\hat{y}_1 = \mathrm{softmax}(V h_1)$$

Similarly, look at how the next output, $\hat{y}_2$, is computed. It takes the current input, $x_2$, and the previous hidden state, $h_1$:

$$h_2 = \tanh(U x_2 + W h_1)$$
$$\hat{y}_2 = \mathrm{softmax}(V h_2)$$

Thus, in forward propagation, an RNN predicts the output using the current input and the previous hidden state.

To make this clearer, let's look at how to implement forward propagation in an RNN to predict the output:

  1. Initialize all the weights, $U$, $W$, and $V$, by randomly drawing from a uniform distribution:
import numpy as np

# input_dim and hidden_dim are assumed to be defined (for example, the vocabulary size and the hidden layer size)
# U: input to hidden weights, shape (hidden_dim, input_dim)
U = np.random.uniform(-np.sqrt(1.0 / input_dim), np.sqrt(1.0 / input_dim), (hidden_dim, input_dim))

# W: hidden to hidden weights, shape (hidden_dim, hidden_dim)
W = np.random.uniform(-np.sqrt(1.0 / hidden_dim), np.sqrt(1.0 / hidden_dim), (hidden_dim, hidden_dim))

# V: hidden to output weights, shape (output_dim, hidden_dim); here the output dimension equals input_dim
V = np.random.uniform(-np.sqrt(1.0 / hidden_dim), np.sqrt(1.0 / hidden_dim), (input_dim, hidden_dim))
  2. Define the number of time steps, which will be the length of our input sequence, $x$:
num_time_steps = len(x)
  3. Define a matrix to hold the hidden state at every time step:
hidden_state = np.zeros((num_time_steps + 1, hidden_dim))
  4. Initialize the initial hidden state, $h_0$, with zeros; it is stored in the last row of hidden_state so that hidden_state[t - 1] picks it up at the first time step:
hidden_state[-1] = np.zeros(hidden_dim)
  5. Initialize the output:
YHat = np.zeros((num_time_steps, output_dim))

  6. For every time step, we perform the following:
for t in np.arange(num_time_steps):

    # h_t = tanh(U x_t + W h_{t-1}); x[t] is a token index, so U x_t reduces to column x[t] of U
    hidden_state[t] = np.tanh(U[:, x[t]] + W.dot(hidden_state[t - 1]))

    # yhat_t = softmax(V h_t)
    YHat[t] = softmax(V.dot(hidden_state[t]))
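
The loop calls a softmax function that is not defined in the snippet. A minimal, numerically stable sketch (an assumption, not code from the text), using the NumPy already imported above, could look like this:

def softmax(z):
    # shift by the maximum value for numerical stability before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

With softmax defined before the loop runs, each row of YHat holds a probability distribution over the output for the corresponding time step.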