Decoder network

Like the encoder, the decoder network will use an RNN with GRU cells. We will again create nlyrs GRU layers with dropout, using the tf.contrib.rnn.DropoutWrapper wrapper class, and we will use the BahdanauAttention mechanism to incorporate attention over the output of the encoder:

import tensorflow as tf
from tensorflow.python.layers.core import Dense
from tensorflow.contrib.seq2seq import (BahdanauAttention, AttentionWrapper,
                                        TrainingHelper, BasicDecoder,
                                        GreedyEmbeddingHelper, dynamic_decode)

def decoding_layer(dec_emb_op, embs, enc_op, enc_st, v_size, txt_len,
                   summ_len, mx_summ_len, rnsize, word2int, dprob,
                   batch_size, nlyrs):
    # Create a GRU cell with input dropout for each decoder layer, each
    # under its own variable scope (the cell built in the last iteration
    # is the one wired into the attention wrapper below)
    for l in range(nlyrs):
        with tf.variable_scope('dec_rnn_layer_{}'.format(l)):
            gru = tf.contrib.rnn.GRUCell(rnsize)
            cell_dec = tf.contrib.rnn.DropoutWrapper(gru,
                                                     input_keep_prob=dprob)
    # Dense projection from the decoder outputs to vocabulary logits
    out_l = Dense(v_size,
                  kernel_initializer=tf.truncated_normal_initializer(
                      mean=0.0, stddev=0.1))
    # Bahdanau attention over the encoder outputs
    attention = BahdanauAttention(rnsize, enc_op, txt_len,
                                  normalize=False,
                                  name='BahdanauAttention')
    cell_dec = AttentionWrapper(cell_dec, attention, rnsize)
    # Initialize the attention state with the encoder's final state
    attn_zstate = cell_dec.zero_state(batch_size, tf.float32)
    attn_zstate = attn_zstate.clone(cell_state=enc_st[0])
    with tf.variable_scope("decoding_layer"):
        tr_dec_op = trng_dec_layer(dec_emb_op,
                                   summ_len,
                                   cell_dec,
                                   attn_zstate,
                                   out_l,
                                   v_size,
                                   mx_summ_len)
    # Reuse the same decoder variables for inference
    with tf.variable_scope("decoding_layer", reuse=True):
        inf_dec_op = infr_dec_layer(embs,
                                    word2int[TOKEN_GO],
                                    word2int[TOKEN_EOS],
                                    cell_dec,
                                    attn_zstate,
                                    out_l,
                                    mx_summ_len,
                                    batch_size)
    return tr_dec_op, inf_dec_op
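
To see how this function is wired into the rest of the graph, here is a hypothetical call; the variable names (emb_dec_input, embeddings, encoder_output, and so on) and the hyperparameter values are illustrative assumptions, not definitions from this chapter:

train_dec_op, infer_dec_op = decoding_layer(
    dec_emb_op=emb_dec_input,      # embedded decoder inputs (ground-truth summaries)
    embs=embeddings,               # embedding matrix, also used at inference time
    enc_op=encoder_output,         # encoder outputs, attended over by Bahdanau attention
    enc_st=encoder_state,          # final encoder state, used to initialize the decoder
    v_size=vocab_size,
    txt_len=text_lengths,          # lengths of the input texts
    summ_len=summary_lengths,      # lengths of the target summaries
    mx_summ_len=max_summary_length,
    rnsize=256,                    # assumed RNN size
    word2int=word2int,
    dprob=keep_probability,        # dropout keep probability
    batch_size=64,                 # assumed batch size
    nlyrs=2)                       # assumed number of GRU layers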

For attention and sequence-to-sequence generation, we will use classes from the tf.contrib.seq2seq library in TensorFlow. Now, we can look at how decoding is performed during training:

def trng_dec_layer(dec_emb_inp, summ_len, cell_dec, st_init, lyr_op,
                   v_size, max_summ_len):
    # Feed the embedded ground-truth summaries as decoder inputs
    helper = TrainingHelper(inputs=dec_emb_inp,
                            sequence_length=summ_len,
                            time_major=False)
    dec = BasicDecoder(cell_dec, helper, st_init, lyr_op)
    # dynamic_decode returns (outputs, final_state, final_sequence_lengths);
    # the decoder outputs hold the logits in their rnn_output field
    logits, _, _ = dynamic_decode(dec,
                                  output_time_major=False,
                                  impute_finished=True,
                                  maximum_iterations=max_summ_len)
    return logits

We use the tf.contrib.seq2seq.TrainingHelper class to feed the ground-truth summaries to the decoder as input during training. The attention state is also passed to the decoder, through the initial_state argument (st_init). Finally, the tf.contrib.seq2seq.dynamic_decode function performs the decoding on this input.
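
The infr_dec_layer function called in decoding_layer is not shown in this section. A minimal sketch of what it might look like, assuming the imports shown earlier and that tf.contrib.seq2seq.GreedyEmbeddingHelper is used to feed each predicted token back as the next decoder input (the actual implementation may differ):

def infr_dec_layer(embs, start_token, end_token, cell_dec, st_init,
                   lyr_op, max_summ_len, batch_size):
    # Tile the <GO> token ID across the batch as the first decoder input
    start_tokens = tf.tile(tf.constant([start_token], dtype=tf.int32),
                           [batch_size], name='start_tokens')
    # Embed the previously predicted token and feed it back at each step,
    # stopping once <EOS> is emitted
    helper = GreedyEmbeddingHelper(embs, start_tokens, end_token)
    dec = BasicDecoder(cell_dec, helper, st_init, lyr_op)
    logits, _, _ = dynamic_decode(dec,
                                  output_time_major=False,
                                  impute_finished=True,
                                  maximum_iterations=max_summ_len)
    return logits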
