Decoder network

The decoder network is also created with Gated Recurrent Unit (GRU) cells. The decoding_layer function takes the output of the encoder and the word embeddings of the English text as input. It produces an output projection vector of a size equal to the vocabulary size of the English text:

def decoding_layer(decoding_embed_inp, embeddings, encoding_op, encoding_st, v_size,
                   fr_len, en_len, max_en_len, rnn_cell_size, word2int, dropout_prob,
                   batch_size, n_layers):
    for l in range(n_layers):
        with tf.variable_scope('dec_rnn_layer_{}'.format(l)):
            gru = tf.contrib.rnn.GRUCell(rnn_cell_size)
            decoding_cell = tf.contrib.rnn.DropoutWrapper(gru,
                                                          input_keep_prob=dropout_prob)
    out_l = Dense(v_size,
                  kernel_initializer=tf.truncated_normal_initializer(mean=0.0,
                                                                     stddev=0.1))
    attention = BahdanauAttention(rnn_cell_size, encoding_op, fr_len,
                                  normalize=False, name='BahdanauAttention')
    decoding_cell = AttentionWrapper(decoding_cell, attention, rnn_cell_size)
    attention_zero_state = decoding_cell.zero_state(batch_size, tf.float32)
    attention_zero_state = attention_zero_state.clone(cell_state=encoding_st[0])
    with tf.variable_scope("decoding_layer"):
        logits_tr = training_decoding_layer(decoding_embed_inp, en_len, decoding_cell,
                                            attention_zero_state, out_l, v_size,
                                            max_en_len)
    with tf.variable_scope("decoding_layer", reuse=True):
        logits_inf = inference_decoding_layer(embeddings, word2int[TOKEN_GO],
                                              word2int[TOKEN_EOS], decoding_cell,
                                              attention_zero_state, out_l,
                                              max_en_len, batch_size)
    return logits_tr, logits_inf
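For reference, the additive score that BahdanauAttention computes between the decoder state and each encoder output takes the following standard form (the notation here is ours: s_{t-1} is the previous decoder state, h_i is the i-th encoder output, and v_a, W_1, W_2 are learned parameters):

```latex
% Additive (Bahdanau) attention score, alignment weights, and context vector:
e_{t,i} = v_a^{\top} \tanh\left(W_1 s_{t-1} + W_2 h_i\right), \qquad
\alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_{j} \exp(e_{t,j})}, \qquad
c_t = \sum_{i} \alpha_{t,i} h_i
```

AttentionWrapper feeds the context vector c_t back into the decoder cell at each step, which is why the wrapped cell's state must be initialized from the encoder state via clone.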

We also include dropout in the decoder using DropoutWrapper. The Dense layer projects the decoder output to the vocabulary size, while BahdanauAttention, wrapped around the cell with AttentionWrapper, computes the attention between the encoder outputs and the decoder state. Note that we use different decoding mechanisms for training and inference:

def training_decoding_layer(decoding_embed_input, en_len, decoding_cell, initial_state,
                            op_layer, v_size, max_en_len):
    helper = TrainingHelper(inputs=decoding_embed_input, sequence_length=en_len,
                            time_major=False)
    dec = BasicDecoder(decoding_cell, helper, initial_state, op_layer)
    logits, _, _ = dynamic_decode(dec, output_time_major=False, impute_finished=True,
                                  maximum_iterations=max_en_len)
    return logits

During training, we use the standard TrainingHelper from the TensorFlow seq2seq package, whereas during inference we use GreedyEmbeddingHelper:

def inference_decoding_layer(embeddings, start_token, end_token, decoding_cell,
                             initial_state, op_layer, max_en_len, batch_size):
    start_tokens = tf.tile(tf.constant([start_token], dtype=tf.int32),
                           [batch_size], name='start_tokens')
    inf_helper = GreedyEmbeddingHelper(embeddings, start_tokens, end_token)
    inf_decoder = BasicDecoder(decoding_cell, inf_helper, initial_state, op_layer)
    inf_logits, _, _ = dynamic_decode(inf_decoder, output_time_major=False,
                                      impute_finished=True,
                                      maximum_iterations=max_en_len)
    return inf_logits

GreedyEmbeddingHelper selects the word with the maximum probability in the projection vector output by the decoder at each step, and feeds its embedding back in as the next input.
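Conceptually, the greedy step reduces to an argmax over the projection vector at each decoding step. The following minimal NumPy sketch illustrates this with a toy, made-up five-word vocabulary and illustrative logit values:

```python
import numpy as np

# Hypothetical id-to-word mapping for a toy 5-word vocabulary.
int2word = {0: '<GO>', 1: '<EOS>', 2: 'the', 3: 'cat', 4: 'sat'}

# Hypothetical projection vector (logits) produced by the decoder
# at a single decoding step, one score per vocabulary word.
logits = np.array([0.1, 0.3, 2.5, 0.7, 1.2])

# Greedy selection: pick the word with the maximum score.
best_id = int(np.argmax(logits))
best_word = int2word[best_id]
print(best_word)  # 'the'
```

In the real graph this selection happens inside GreedyEmbeddingHelper, which then looks up the chosen word's embedding and passes it to the decoder cell as the input for the next time step.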
