Model training

Now, we'll define a loop that runs for num_iterations steps. On each iteration, it runs the training operation, feeding in values from input_values_train and target_values_train through feed_dict.

To measure accuracy, it tests the model against the unseen data in input_values_test:

for i in range(num_iterations + 1):
    sess.run(train, feed_dict={input_values: input_values_train, output_values: target_values_train})
    if i % 100 == 0:
        print('Training Step:' + str(i) +
              ' Accuracy = ' + str(sess.run(model_accuracy, feed_dict={input_values: input_values_test, output_values: target_values_test})) +
              ' Loss = ' + str(sess.run(model_cross_entropy, {input_values: input_values_train, output_values: target_values_train})))

Output:
Training Step:0 Accuracy = 0.5988 Loss = 2.1881988
Training Step:100 Accuracy = 0.8647 Loss = 0.58029664
Training Step:200 Accuracy = 0.879 Loss = 0.45982164
Training Step:300 Accuracy = 0.8866 Loss = 0.40857208
Training Step:400 Accuracy = 0.8904 Loss = 0.37808096
Training Step:500 Accuracy = 0.8943 Loss = 0.35697535
Training Step:600 Accuracy = 0.8974 Loss = 0.34104997
Training Step:700 Accuracy = 0.8984 Loss = 0.32834956
Training Step:800 Accuracy = 0.9 Loss = 0.31782663
Training Step:900 Accuracy = 0.9005 Loss = 0.30886236
Training Step:1000 Accuracy = 0.9009 Loss = 0.3010645
Training Step:1100 Accuracy = 0.9023 Loss = 0.29417014
Training Step:1200 Accuracy = 0.9029 Loss = 0.28799513
Training Step:1300 Accuracy = 0.9033 Loss = 0.28240603
Training Step:1400 Accuracy = 0.9039 Loss = 0.27730304
Training Step:1500 Accuracy = 0.9048 Loss = 0.27260992
Training Step:1600 Accuracy = 0.9057 Loss = 0.26826677
Training Step:1700 Accuracy = 0.9062 Loss = 0.2642261
Training Step:1800 Accuracy = 0.9061 Loss = 0.26044932
Training Step:1900 Accuracy = 0.9063 Loss = 0.25690478
Training Step:2000 Accuracy = 0.9066 Loss = 0.2535662
Training Step:2100 Accuracy = 0.9072 Loss = 0.25041154
Training Step:2200 Accuracy = 0.9073 Loss = 0.24742197
Training Step:2300 Accuracy = 0.9071 Loss = 0.24458146
Training Step:2400 Accuracy = 0.9066 Loss = 0.24187621
Training Step:2500 Accuracy = 0.9067 Loss = 0.23929419

Notice how the loss was still decreasing near the end while our test accuracy slightly went down! This shows that we can keep minimizing the loss, and hence improving the fit, over our training data, yet this stops helping on the test data used for measuring accuracy. This is known as overfitting (failing to generalize). With the default settings, we got an accuracy of about 91%. If we wanted to cheat our way to 94% accuracy, we could have set the number of test examples to 100. This shows how not having enough test examples can give you a biased sense of accuracy.

Keep in mind that this is a very crude way to evaluate our classifier's performance; we did it this way deliberately for the sake of learning and experimentation. Ideally, when training with large datasets, you train on small batches of training data at a time, not on all of it at once.
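As a sketch of that mini-batch idea, here is a minimal, framework-agnostic batch iterator. The batch_iterator helper, the batch size of 100, and the toy arrays are illustrative stand-ins, not part of the model built above:

```python
import numpy as np

def batch_iterator(inputs, targets, batch_size, shuffle=True):
    """Yield (inputs, targets) mini-batches covering the whole dataset once."""
    indices = np.arange(len(inputs))
    if shuffle:
        np.random.shuffle(indices)
    for start in range(0, len(inputs), batch_size):
        batch = indices[start:start + batch_size]
        yield inputs[batch], targets[batch]

# Toy data standing in for input_values_train / target_values_train
X = np.zeros((550, 784))
y = np.zeros((550, 10))

num_batches = sum(1 for _ in batch_iterator(X, y, batch_size=100))
print(num_batches)  # 6: five batches of 100 samples and one of 50
```

Inside the training loop above, each sess.run(train, ...) call would then be fed one such batch per step instead of the full training set.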

This is the interesting part. Now that we have calculated our weight cheat sheet, we can visualize it with the following code:

learned_weights = sess.run(weights)  # fetch the trained weight matrix once
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.title(i)
    plt.imshow(learned_weights[:, i].reshape([28, 28]), cmap=plt.get_cmap('seismic'))
    frame = plt.gca()
    frame.axes.get_xaxis().set_visible(False)
    frame.axes.get_yaxis().set_visible(False)
Figure 15: Visualization of our weights from 0-9

The preceding figure shows the model weights for digits 0-9, which are the most important part of our classifier. All of this machine learning work is done to figure out what the optimal weights are. Once they have been calculated according to an optimization criterion, you have the cheat sheet and can easily find your answers using the learned weights.

The learned model makes its prediction by comparing how similar or different an input digit is to the red and blue weights. The deeper the red, the stronger the positive evidence for that class; white is neutral, and blue indicates negative evidence.
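The mechanics behind that comparison can be sketched in plain NumPy: each class's score is the dot product of the flattened image with that class's weight column, plus a bias, and the prediction is the highest-scoring class. The random arrays here are toy stand-ins for the trained parameters, not the actual learned values:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(784, 10))   # stand-in for the learned weight matrix
b = np.zeros(10)                 # stand-in for the learned biases
x = rng.normal(size=(784,))      # stand-in for one flattened 28x28 image

scores = x @ W + b               # one score per digit class
prediction = scores.argmax()     # class whose weight "template" matches best
```

Pixels that land on strongly positive (red) weights push that class's score up; pixels on negative (blue) weights push it down.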

Now, let's use the cheat sheet and see how our model performs on it:

input_values_train, target_values_train = train_size(1)
visualize_digit(0)

Output:
Total Training Images in Dataset = (55000, 784)
############################################
input_values_train Samples Loaded = (1, 784)
target_values_train Samples Loaded = (1, 10)
[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]

Let's look at our softmax predictor:

answer = sess.run(softmax_layer, feed_dict={input_values: input_values_train})
print(answer)

The preceding code gives us a 10-dimensional vector, with one probability per digit class:

[[2.1248012e-05 1.1646927e-05 8.9631692e-02 1.9201526e-02 8.2086492e-04
1.2516821e-05 3.8538201e-05 8.5374612e-01 6.9188857e-03 2.9596921e-02]]

We can use the argmax function to find out the most probable digit to be the correct classification for our input image:

answer.argmax()

Output:
7

Our network produced the correct classification: the one-hot label above has a 1 at index 7, and the model predicted 7.

Let's use our knowledge to define a helper function that can select a random image from the dataset and test the model against it:

def display_result(ind):

    # Loading a training sample
    input_values_train = mnist_dataset.train.images[ind, :].reshape(1, 784)
    target_values_train = mnist_dataset.train.labels[ind, :]

    # Getting the label as an integer instead of a one-hot encoded vector
    label = target_values_train.argmax()

    # Getting the prediction as an integer
    prediction = sess.run(softmax_layer, feed_dict={input_values: input_values_train}).argmax()
    plt.title('Prediction: %d Label: %d' % (prediction, label))
    plt.imshow(input_values_train.reshape([28, 28]), cmap=plt.get_cmap('gray_r'))
    plt.show()

And now try it out:

display_result(ran.randint(0, 54999))

The output is a plot of the randomly selected digit, with its predicted and true labels in the title. We've got a correct classification again!
