Training

We will now train our network on the French sentences and their corresponding English translations. Before that, let's look at the function that generates the training batches:

def get_batches(en_text, fr_text, batch_size):
    for batch_idx in range(0, len(fr_text) // batch_size):
        start_idx = batch_idx * batch_size
        en_batch = en_text[start_idx:start_idx + batch_size]
        fr_batch = fr_text[start_idx:start_idx + batch_size]
        pad_en_batch = np.array(pad_sentences(en_batch, en_word2int))
        pad_fr_batch = np.array(pad_sentences(fr_batch, fr_word2int))
        pad_en_lens = []
        for en_b in pad_en_batch:
            pad_en_lens.append(len(en_b))
        pad_fr_lens = []
        for fr_b in pad_fr_batch:
            pad_fr_lens.append(len(fr_b))
        yield pad_en_batch, pad_fr_batch, pad_en_lens, pad_fr_lens
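
The pad_sentences helper used above is defined elsewhere in the notebook. A minimal sketch of what it could look like, assuming each sentence is a list of word indices and the word2int dictionary contains a '<PAD>' entry:

# Hypothetical sketch of pad_sentences: pads every sentence in the batch
# with the '<PAD>' token index up to the length of the longest sentence.
def pad_sentences(sentence_batch, word2int):
    max_len = max(len(sentence) for sentence in sentence_batch)
    return [sentence + [word2int['<PAD>']] * (max_len - len(sentence))
            for sentence in sentence_batch]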

The get_batches function yields batches of French and English sentences of size batch_size. It also pads each sentence with the padding token, so that every sentence in a batch has the same length as the longest sentence in that batch. We will now look at the training loop:

min_learning_rate = 0.0006
display_step = 20
stop_early_count = 0
stop_early_max_count = 3
per_epoch = 3
update_loss = 0
batch_loss = 0
summary_update_loss = []
en_train = en_filtered[0:30000]
fr_train = fr_filtered[0:30000]

update_check = (len(fr_train) // batch_size // per_epoch) - 1
checkpoint = logs_path + 'best_so_far_model.ckpt'
with tf.Session(graph=train_graph) as sess:
    tf_summary_writer = tf.summary.FileWriter(logs_path, graph=train_graph)
    merged_summary_op = tf.summary.merge_all()
    sess.run(tf.global_variables_initializer())
    for epoch_i in range(1, epochs + 1):
        update_loss = 0
        batch_loss = 0
        for batch_i, (en_batch, fr_batch, en_text_len, fr_text_len) in enumerate(
                get_batches(en_train, fr_train, batch_size)):
            before = time.time()
            _, loss, summary = sess.run([train_op, tr_cost, merged_summary_op],
                                        {input_data: fr_batch,
                                         targets: en_batch,
                                         learning_rate: lr,
                                         en_len: en_text_len,
                                         fr_len: fr_text_len,
                                         dropout_probs: dr_prob})
            batch_loss += loss
            update_loss += loss
            after = time.time()
            batch_time = after - before
            tf_summary_writer.add_summary(summary, epoch_i * batch_size + batch_i)
            if batch_i % display_step == 0 and batch_i > 0:
                print('** Epoch {:>3}/{} Batch {:>4}/{} - '
                      'Batch Loss: {:>6.3f}, seconds: {:>4.2f}'.format(
                          epoch_i, epochs, batch_i,
                          len(fr_filtered) // batch_size,
                          batch_loss / display_step,
                          batch_time * display_step))
                batch_loss = 0
            if batch_i % update_check == 0 and batch_i > 0:
                print("Average loss:", round(update_loss / update_check, 3))
                summary_update_loss.append(update_loss)
                # Save the model whenever the loss improves on the best seen so far
                if update_loss <= min(summary_update_loss):
                    print('Saving model')
                    stop_early_count = 0
                    saver = tf.train.Saver()
                    saver.save(sess, checkpoint)
                else:
                    print("No Improvement.")
                    stop_early_count += 1
                    if stop_early_count == stop_early_max_count:
                        break
                update_loss = 0
        if stop_early_count == stop_early_max_count:
            print("Stopping Training.")
            break

Output:

** Epoch 5/20 Batch 440/3131 - Batch Loss: 1.038, seconds: 170.97
** Epoch 5/20 Batch 460/3131 - Batch Loss: 1.154, seconds: 147.05
Average loss: 1.139
Saving model

The main part of the code is the training loop, where we fetch the batches, feed them to the network, keep track of the loss, and save the model whenever the loss improves. If the loss fails to improve for stop_early_max_count consecutive checks, training terminates early. We find that the average loss drops from around 6.49 to around 1.139.

Note that this value may change for each run. Refer to the notebook for the complete output.
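
Once training finishes, the best checkpoint can be restored in a fresh session for evaluation or translation. A minimal sketch, assuming the same train_graph and checkpoint path defined above:

# Sketch: restoring the best checkpoint saved during training,
# assuming train_graph and checkpoint are the objects defined earlier.
with tf.Session(graph=train_graph) as sess:
    saver = tf.train.Saver()
    saver.restore(sess, checkpoint)
    # The restored session can now be used to run inference on new sentences.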