RNN output

Next up, we need to create the output layer, which is responsible for reading the outputs of the individual LSTM cells and passing them through a fully connected layer. This layer applies a softmax to produce a probability distribution over which character is likely to come next after the input one.

As you know, we have generated input batches for the network of size N × M characters, where N is the number of sequences in the batch and M is the number of steps per sequence. We have also used L hidden units in the hidden layer while creating the model. Based on the batch size and the number of hidden units, the output of the network will be a 3D tensor of size N × M × L; that's because we call the LSTM cell M times, once for each sequence step, and each call produces an output of size L. Finally, we do this for each of the N sequences in the batch.
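
To make these shapes concrete, here is a minimal sketch, assuming TensorFlow 1.x; the sizes N, M, L and the vocabulary size of 83 are example values chosen for illustration, not taken from the text:

import tensorflow as tf

N, M, L = 100, 50, 512                            # example sizes
inputs = tf.placeholder(tf.float32, [N, M, 83])   # one-hot encoded characters
cell = tf.nn.rnn_cell.BasicLSTMCell(L)
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

print(outputs.get_shape())  # (100, 50, 512) -> N x M x L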

So we pass this N × M × L output to a fully connected layer (the same layer, with the same weights, is applied at every step), but before doing so, we reshape the output to a 2D tensor of shape (M * N) × L. This reshaping makes the output easier to operate on, because each row now holds the L outputs of the LSTM cell for a single step of a single sequence.
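
To verify the arithmetic, here is a tiny sketch (the sizes are again example values) showing that the reshape turns the N × M × L tensor into M * N rows of L values:

import tensorflow as tf

N, M, L = 4, 5, 8                           # example sizes
lstm_output = tf.zeros([N, M, L])           # stands in for the LSTM output
rows = tf.reshape(lstm_output, [-1, L])     # one row per step of each sequence
print(rows.get_shape())                     # (20, 8) == (M * N) x L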

After getting the new shape, we can connect it to the fully connected layer with the softmax by doing a matrix multiplication with the weights. The weights created in the LSTM cells and the weights we are about to create here have the same default names, and TensorFlow would raise an error in such a case. To avoid this error, we can wrap the weight and bias variables created here in a variable scope using the TensorFlow function tf.variable_scope().

Having explained the shape of the output and how we are going to reshape it, let's go ahead and code the build_model_output function:

def build_model_output(output, input_size, output_size):

    # Reshape the model output into a bunch of rows, one row for each step in each sequence
    sequence_output = tf.concat(output, axis=1)
    reshaped_output = tf.reshape(sequence_output, [-1, input_size])

    # Connect the RNN outputs to a softmax layer
    with tf.variable_scope('softmax'):
        softmax_w = tf.Variable(tf.truncated_normal((input_size, output_size), stddev=0.1))
        softmax_b = tf.Variable(tf.zeros(output_size))

    # Since the output is a set of rows of LSTM cell outputs, the logits will be a set
    # of rows of logit outputs, one for each step and sequence
    logits = tf.matmul(reshaped_output, softmax_w) + softmax_b

    # Use softmax to get the probabilities for the predicted characters
    model_out = tf.nn.softmax(logits, name='predictions')

    return model_out, logits
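
As a quick sanity check, here is a hedged usage sketch (the sizes and the placeholder standing in for the LSTM output are assumptions for illustration): feeding an N × M × L tensor through the function should produce logits and predictions with M * N rows, one per character position:

# Hypothetical usage, assuming TF 1.x; the sizes below are illustrative only.
num_seqs, num_steps, lstm_size, num_classes = 10, 50, 512, 83

lstm_output = tf.placeholder(tf.float32, [num_seqs, num_steps, lstm_size])
predictions, logits = build_model_output(lstm_output, lstm_size, num_classes)

print(logits.get_shape())       # (500, 83) -> (num_seqs * num_steps) x num_classes
print(predictions.get_shape())  # (500, 83), each row sums to 1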