Training

Next, a function is created to train the model. After the necessary imports, placeholders are created for the question pairs and their labels. The output of the model built in the preceding function is passed through softmax cross-entropy as the loss function, and the model weights are optimized using the Adam optimizer, as follows:

import math

import numpy as np
import tensorflow as tf

def train(train_x1, train_x2, train_y, val_x1, val_x2, val_y, max_sent_len, char_map, epochs=2, batch_size=1024, num_classes=2):
    with tf.name_scope('Placeholders'):
        x1_pls = tf.placeholder(tf.int32, shape=[None, max_sent_len])
        x2_pls = tf.placeholder(tf.int32, shape=[None, max_sent_len])
        y_pls = tf.placeholder(tf.int64, [None])
        keep_prob = tf.placeholder(tf.float32)  # dropout keep probability

Next, the model is instantiated to compute the logits. The loss is computed between the logits and the one-hot encoding of the labels, and is minimized using the Adam optimizer with a learning rate of 0.001. The correct predictions and the accuracy are then calculated, and a saver is created for checkpointing, as follows:

    predict = model(x1_pls, x2_pls, char_map, keep_prob)
    with tf.name_scope('loss'):
        mean_loss = tf.losses.softmax_cross_entropy(logits=predict, onehot_labels=tf.one_hot(y_pls, num_classes))
    with tf.name_scope('optimizer'):
        optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
        train_step = optimizer.minimize(mean_loss)
    with tf.name_scope('accuracy'):
        correct_prediction = tf.equal(tf.argmax(predict, 1), y_pls)
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    saver = tf.train.Saver()
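
To make the loss concrete, the following is a minimal NumPy sketch (not part of the model code) of what softmax cross-entropy computes for a single question pair with num_classes=2: the logits are squashed into a probability distribution, and the loss is the negative log of the probability assigned to the true class:

import numpy as np

# Hypothetical logits for one question pair: scores for [not duplicate, duplicate].
logits = np.array([1.2, -0.4])
label = 0  # integer label, as held in y_pls

# Softmax turns the logits into class probabilities.
probs = np.exp(logits) / np.sum(np.exp(logits))

# One-hot encoding of the label, as produced by tf.one_hot(y_pls, 2).
onehot = np.eye(2)[label]

# Cross-entropy: negative log probability of the true class.
loss = -np.sum(onehot * np.log(probs))
print(probs, loss)  # probs ≈ [0.832 0.168], loss ≈ 0.184

The closer the softmax probability of the true class is to 1, the smaller the loss, which is exactly what the Adam optimizer drives down during training.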

A session is then opened and all of the weights are initialized. An index array is prepared so that the encoded training data can be shuffled in every epoch before being fed through the model; the validation data is batched through the same graph later:

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        train_indices = np.arange(train_x1.shape[0])
        variables = [mean_loss, correct_prediction, train_step]

Next, the epochs are iterated over; at the start of each epoch, the training indices are shuffled so that each epoch sees the batches in a different order, providing more robust training:

        iter_cnt = 0
        for e in range(epochs):
            np.random.shuffle(train_indices)
            losses = []
            correct = 0

Next, the data is iterated over in batches; each batch of indices is sliced out of the shuffled index array, as follows:

            for i in range(int(math.ceil(train_x1.shape[0] / batch_size))):
                start_idx = (i * batch_size) % train_x1.shape[0]
                idx = train_indices[start_idx:start_idx + batch_size]

Next, the feed dictionary is assembled and passed to the session, as shown here:

                feed_dict = {x1_pls: train_x1[idx, :],
                             x2_pls: train_x2[idx, :],
                             y_pls: train_y[idx],
                             keep_prob: 0.95}
                actual_batch_size = train_y[idx].shape[0]

                loss, corr, _ = sess.run(variables, feed_dict=feed_dict)

Next, using the computed loss and the per-example correctness, the running totals are updated; the minibatch loss and accuracy are printed every 10 iterations, and the overall loss and accuracy are reported at the end of each epoch:

                corr = np.array(corr).astype(np.float32)
                losses.append(loss * actual_batch_size)
                correct += np.sum(corr)
                if iter_cnt % 10 == 0:
                    print("Minibatch {0}: with training loss = {1:.3g} and accuracy of {2:.2g}"
                          .format(iter_cnt, loss, np.sum(corr) / actual_batch_size))
                iter_cnt += 1
            total_correct = correct / train_x1.shape[0]
            total_loss = np.sum(losses) / train_x1.shape[0]
            print("Epoch {2}, Overall loss = {0:.5g} and accuracy of {1:.3g}"
                  .format(total_loss, total_correct, e + 1))

Every five epochs, the validation data is fed in batches and the validation loss and accuracy are accumulated, as shown here. The dropout keep probability is set to 1, and the optimizer step is excluded, so no weights are updated on the validation data:

            if (e + 1) % 5 == 0:
                val_losses = []
                val_correct = 0
                for i in range(int(math.ceil(val_x1.shape[0] / batch_size))):
                    start_idx = (i * batch_size) % val_x1.shape[0]

                    feed_dict = {x1_pls: val_x1[start_idx:start_idx + batch_size, :],
                                 x2_pls: val_x2[start_idx:start_idx + batch_size, :],
                                 y_pls: val_y[start_idx:start_idx + batch_size],
                                 keep_prob: 1.0}
                    actual_batch_size = val_y[start_idx:start_idx + batch_size].shape[0]
                    # Run only the loss and predictions; including train_step here
                    # would update the weights on the validation data.
                    loss, corr = sess.run([mean_loss, correct_prediction], feed_dict=feed_dict)
                    corr = np.array(corr).astype(np.float32)
                    val_losses.append(loss * actual_batch_size)
                    val_correct += np.sum(corr)

Next, the overall validation loss and accuracy are calculated and printed, and the model is checkpointed every 10 epochs:

                total_correct = val_correct / val_x1.shape[0]
                total_loss = np.sum(val_losses) / val_x1.shape[0]
                print("Validation Epoch {2}, Overall loss = {0:.5g} and accuracy of {1:.3g}"
                      .format(total_loss, total_correct, e + 1))
            if (e + 1) % 10 == 0:
                save_path = saver.save(sess, './model_{}.ckpt'.format(e))
                print("Model saved in path: {}".format(save_path))

The model is saved, and it can be restored for inference, as shown in the next section. 
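
Before moving on, here is a hypothetical invocation of the train function; the array names and the max_sent_len value are assumptions carried over from the preprocessing steps, not fixed by the code above:

# Hypothetical call; assumes train_x1/train_x2 and val_x1/val_x2 are int32
# arrays of shape (num_pairs, max_sent_len) produced by the encoding step,
# train_y/val_y hold the 0/1 duplicate labels, and char_map is the
# character vocabulary built earlier.
train(train_x1, train_x2, train_y,
      val_x1, val_x2, val_y,
      max_sent_len=200, char_map=char_map,
      epochs=10, batch_size=1024, num_classes=2)

With epochs=10, the validation pass (every five epochs) and the checkpoint save (every ten epochs) each trigger at least once; the default of epochs=2 would skip both.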
