Training

We will now train our network on the French sentences and their corresponding English translations. Before that, let's look at the function that generates the training batches:

def get_batches(en_text, fr_text, batch_size):
    for batch_idx in range(0, len(fr_text) // batch_size):
        start_idx = batch_idx * batch_size
        en_batch = en_text[start_idx:start_idx + batch_size]
        fr_batch = fr_text[start_idx:start_idx + batch_size]
        pad_en_batch = np.array(pad_sentences(en_batch, en_word2int))
        pad_fr_batch = np.array(pad_sentences(fr_batch, fr_word2int))
        pad_en_lens = []
        for en_b in pad_en_batch:
            pad_en_lens.append(len(en_b))
        pad_fr_lens = []
        for fr_b in pad_fr_batch:
            pad_fr_lens.append(len(fr_b))
        yield pad_en_batch, pad_fr_batch, pad_en_lens, pad_fr_lens
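
The pad_sentences helper used above is defined elsewhere in the notebook. A minimal sketch of what it could look like, assuming each sentence is a list of word indices and the word2int dictionary contains a '<PAD>' entry:

# Hypothetical sketch of pad_sentences: pads every sentence in the batch
# with the '<PAD>' token index up to the length of the longest sentence.
def pad_sentences(sentence_batch, word2int):
    max_len = max(len(sentence) for sentence in sentence_batch)
    return [sentence + [word2int['<PAD>']] * (max_len - len(sentence))
            for sentence in sentence_batch]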

The get_batches function yields batches of French and English sentences of size batch_size. It also pads each sentence with the padding token, so that every sentence in a batch has the same length as the longest sentence in that batch. We will now look at the training loop:

min_learning_rate = 0.0006
display_step = 20
stop_early_count = 0
stop_early_max_count = 3
per_epoch = 3
update_loss = 0
batch_loss = 0
summary_update_loss = []
en_train = en_filtered[0:30000]
fr_train = fr_filtered[0:30000]

update_check = (len(fr_train) // batch_size // per_epoch) - 1
checkpoint = logs_path + 'best_so_far_model.ckpt'
with tf.Session(graph=train_graph) as sess:
    tf_summary_writer = tf.summary.FileWriter(logs_path, graph=train_graph)
    merged_summary_op = tf.summary.merge_all()
    sess.run(tf.global_variables_initializer())
    for epoch_i in range(1, epochs + 1):
        update_loss = 0
        batch_loss = 0
        for batch_i, (en_batch, fr_batch, en_text_len, fr_text_len) in enumerate(
                get_batches(en_train, fr_train, batch_size)):
            before = time.time()
            _, loss, summary = sess.run([train_op, tr_cost, merged_summary_op],
                                        {input_data: fr_batch,
                                         targets: en_batch,
                                         learning_rate: lr,
                                         en_len: en_text_len,
                                         fr_len: fr_text_len,
                                         dropout_probs: dr_prob})
            batch_loss += loss
            update_loss += loss
            after = time.time()
            batch_time = after - before
            tf_summary_writer.add_summary(summary, epoch_i * batch_size + batch_i)
            if batch_i % display_step == 0 and batch_i > 0:
                print('** Epoch {:>3}/{} Batch {:>4}/{} - '
                      'Batch Loss: {:>6.3f}, seconds: {:>4.2f}'.format(
                          epoch_i, epochs, batch_i,
                          len(fr_filtered) // batch_size,
                          batch_loss / display_step,
                          batch_time * display_step))
                batch_loss = 0
            if batch_i % update_check == 0 and batch_i > 0:
                print("Average loss:", round(update_loss / update_check, 3))
                summary_update_loss.append(update_loss)
                # Save the model whenever the loss improves on the best seen so far
                if update_loss <= min(summary_update_loss):
                    print('Saving model')
                    stop_early_count = 0
                    saver = tf.train.Saver()
                    saver.save(sess, checkpoint)
                else:
                    print("No Improvement.")
                    stop_early_count += 1
                    if stop_early_count == stop_early_max_count:
                        break
                update_loss = 0
        if stop_early_count == stop_early_max_count:
            print("Stopping Training.")
            break

Output:

** Epoch 5/20 Batch 440/3131 - Batch Loss: 1.038, seconds: 170.97
** Epoch 5/20 Batch 460/3131 - Batch Loss: 1.154, seconds: 147.05
Average loss: 1.139
Saving model

The main part of the code is the training loop, where we fetch the batches, feed them to the network, keep track of the loss, and save the model whenever the loss improves. If the loss fails to improve for stop_early_max_count consecutive checks, training terminates early. We find that the average loss drops from around 6.49 to around 1.139.

Note that this value may change for each run. Refer to the notebook for the complete output.
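
Once training finishes, the best checkpoint can be restored in a fresh session for evaluation or translation. A minimal sketch, assuming the same train_graph and checkpoint path defined above:

# Sketch: restoring the best checkpoint saved during training,
# assuming train_graph and checkpoint are the objects defined earlier.
with tf.Session(graph=train_graph) as sess:
    saver = tf.train.Saver()
    saver.restore(sess, checkpoint)
    # The restored session can now be used to run inference on new sentences.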