Validating the output

Now that the model is fit, let's gain an understanding of how an RNN works by working backward: we will extract the weights of the model and feed the input forward through those weights in NumPy to reproduce the predicted value (the code file is available as Building_a_Recurrent_Neural_Network_from_scratch_in_Python.ipynb on GitHub).
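
For reference, the weight shapes inspected in the next step are consistent with a model along the following lines. This is a minimal sketch, not necessarily the exact definition used when the model was built: one SimpleRNN unit with tanh activation over sequences of two scalar time steps, followed by a five-unit softmax dense layer.

# Illustrative sketch of the assumed architecture (not the exact
# earlier definition): input of 2 time steps x 1 feature, a single
# SimpleRNN unit, and a 5-unit softmax output layer.
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

model = Sequential()
model.add(SimpleRNN(1, activation='tanh', input_shape=(2, 1)))
model.add(Dense(5, activation='softmax'))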

  1. Inspect the weights:
model.weights
[<tf.Variable 'simple_rnn_2/kernel:0' shape=(1, 1) dtype=float32_ref>,
<tf.Variable 'simple_rnn_2/recurrent_kernel:0' shape=(1, 1) dtype=float32_ref>,
<tf.Variable 'simple_rnn_2/bias:0' shape=(1,) dtype=float32_ref>,
<tf.Variable 'dense_2/kernel:0' shape=(1, 5) dtype=float32_ref>,
<tf.Variable 'dense_2/bias:0' shape=(5,) dtype=float32_ref>]

The preceding output gives us an intuition of the order in which the weights are presented.

In the preceding example, kernel represents the input weights, and recurrent_kernel represents the connection of the hidden layer from one time step to the next.

Note that a SimpleRNN has weights that connect the input to the hidden layer, and also weights that connect the previous time step's hidden layer to the current time step's hidden layer.

The kernel and bias in the dense_2 layer belong to the layer that connects the hidden layer value to the final output:

  2. Extract the weights:
model.get_weights()

The preceding line of code gives us the computed values of each of the weights.
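
For readability, the five arrays can also be unpacked into named variables. The names below are our own; the steps that follow index model.get_weights() directly, but the mapping is the same:

# Unpack the five weight arrays returned by model.get_weights().
# The order matches model.weights above: RNN kernel, RNN recurrent
# kernel, RNN bias, dense kernel, dense bias.
kernel, recurrent_kernel, rnn_bias, dense_kernel, dense_bias = model.get_weights()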

  3. Pass the input through the first time step. The input is as follows:
padded_docs[0]
#array([3, 1], dtype=int32)

In the preceding code, the first time step has a value of 3 and the second time step has a value of 1. We'll initialize the value at the first time step as follows:

input_t0 = 3
  4. The value at the first time step is multiplied by the weight connecting the input to the hidden layer, and then the bias value is added:
input_t0_kernel_bias = input_t0*model.get_weights()[0] + model.get_weights()[2]
  5. The hidden layer value at this time step is calculated by passing the preceding output through the tanh activation (as that is the activation we specified when we defined the model):
hidden_layer0_value = np.tanh(input_t0_kernel_bias)
  6. Calculate the hidden layer value at time step 2, where the input has a value of 1 (recall that padded_docs[0] is [3, 1]):
input_t1 = 1
  7. The output value when the input at the second time step is passed through the weight and bias is as follows:
input_t1_kernel_bias = input_t1*model.get_weights()[0] + model.get_weights()[2]

Note that the weights that multiply the input remain the same, regardless of the time step being considered.

  8. The calculation of the hidden layer value at any time step t is performed as follows:

hidden(t) = Φ(input(t) × kernel + hidden(t-1) × recurrent_kernel + bias)

Here, Φ is the activation that is applied (in general, the tanh activation is used).

The calculation from the input layer to the hidden layer consists of two components:

    • Matrix multiplication of the input layer value and the kernel weights
    • Matrix multiplication of the previous time step's hidden layer value and the recurrent weights

The final hidden layer value at a given time step is the sum of the preceding two matrix multiplications (plus the bias), passed through the tanh activation function. The recurrent component for the second time step is calculated as follows:

input_t1_recurrent = hidden_layer0_value*model.get_weights()[1]

The total value before passing through the tanh activation is as follows:

total_input_t1 = input_t1_kernel_bias + input_t1_recurrent

The hidden layer value at the second time step is calculated by passing the preceding total through the tanh activation, as follows:

output_t1 = np.tanh(total_input_t1)
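
The two time-step calculations above can be folded into a single reusable step function. The following is a minimal NumPy sketch (the rnn_step name is our own):

import numpy as np

def rnn_step(x_t, h_prev, weights):
    # One SimpleRNN step: h_t = tanh(x_t*kernel + h_prev*recurrent_kernel + bias)
    kernel, recurrent_kernel, bias = weights[0], weights[1], weights[2]
    return np.tanh(x_t * kernel + h_prev * recurrent_kernel + bias)

h = np.zeros((1, 1)) # the initial hidden state is zero
for x_t in padded_docs[0]: # the two time steps, [3, 1]
    h = rnn_step(x_t, h, model.get_weights())
# h now matches output_t1 from the step-by-step calculation above
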
  9. Pass the hidden layer output from the final time step through the dense layer, which connects the hidden layer to the output layer:
final_output = output_t1*model.get_weights()[3] + model.get_weights()[4]

Note that the fourth and fifth outputs of the model.get_weights() method (the dense layer's kernel and bias) correspond to the connection from the hidden layer to the output layer.

  10. Pass the preceding output through the softmax activation (as defined in the model) to obtain the final output:
np.exp(final_output)/np.sum(np.exp(final_output))
# array([[0.3684635, 0.33566403, 0.61344165, 0.378485, 0.40699497]], dtype=float32)

Notice that the output obtained by manually passing the input forward through the network is the same as the output given by the model.predict function.
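
As a quick sanity check, the manual result can be compared against model.predict directly. The reshape to (1, 2, 1) assumes a batch of one sequence with two time steps of one feature each:

# Compare the manual forward pass with the model's own prediction.
manual_output = np.exp(final_output)/np.sum(np.exp(final_output))
model_output = model.predict(padded_docs[0].reshape(1, 2, 1))
print(np.allclose(manual_output, model_output, atol=1e-5))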
