Data analysis

Let's start implementing our classifier by importing the packages that this implementation requires:

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import random as ran

Next, we define some helper functions that let us take a subset of the original dataset that we downloaded:

# Define some helper functions
# to assign the size of training and test data we will take from MNIST dataset
def train_size(size):
    print('Total Training Images in Dataset = ' + str(mnist_dataset.train.images.shape))
    print('############################################')
    input_values_train = mnist_dataset.train.images[:size, :]
    print('input_values_train Samples Loaded = ' + str(input_values_train.shape))
    target_values_train = mnist_dataset.train.labels[:size, :]
    print('target_values_train Samples Loaded = ' + str(target_values_train.shape))
    return input_values_train, target_values_train

def test_size(size):
    print('Total Test Samples in MNIST Dataset = ' + str(mnist_dataset.test.images.shape))
    print('############################################')
    input_values_test = mnist_dataset.test.images[:size, :]
    print('input_values_test Samples Loaded = ' + str(input_values_test.shape))
    target_values_test = mnist_dataset.test.labels[:size, :]
    print('target_values_test Samples Loaded = ' + str(target_values_test.shape))
    return input_values_test, target_values_test
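
The slicing that both helpers rely on can be tried without the real dataset. The following is a minimal sketch that uses small random arrays as stand-ins for the MNIST images and labels; the names fake_images, fake_labels, and take_first are hypothetical and are not part of the original code.

```python
import numpy as np

# Hypothetical stand-ins for mnist_dataset.train.images / .labels:
# 100 flattened 28x28 images and 100 one-hot labels over 10 classes.
rng = np.random.default_rng(0)
fake_images = rng.random((100, 784))
fake_labels = np.eye(10)[rng.integers(0, 10, size=100)]

def take_first(images, labels, size):
    """Return the first `size` rows of both arrays, keeping them aligned."""
    return images[:size, :], labels[:size, :]

subset_images, subset_labels = take_first(fake_images, fake_labels, 25)
print(subset_images.shape)  # (25, 784)
print(subset_labels.shape)  # (25, 10)

# Asking for more rows than exist does not raise an error;
# NumPy slicing simply returns everything that is available.
all_images, all_labels = take_first(fake_images, fake_labels, 1000)
print(all_images.shape)  # (100, 784)
```

Because images and labels are sliced with the same size, row i of the subset images still corresponds to row i of the subset labels.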

We also define two helper functions for visualization: one displays a specific digit from the dataset, and the other displays a flattened version of a subset of images:

# Define a couple of helper functions for digit images visualization
def visualize_digit(ind):
    print(target_values_train[ind])
    target = target_values_train[ind].argmax(axis=0)
    true_image = input_values_train[ind].reshape([28, 28])
    plt.title('Sample: %d Label: %d' % (ind, target))
    plt.imshow(true_image, cmap=plt.get_cmap('gray_r'))
    plt.show()

def visualize_mult_imgs_flat(start, stop):
    imgs = input_values_train[start].reshape([1, 784])
    for i in range(start + 1, stop):
        imgs = np.concatenate((imgs, input_values_train[i].reshape([1, 784])))
    plt.imshow(imgs, cmap=plt.get_cmap('gray_r'))
    plt.show()

Now, let's get down to business and start working with the dataset. First, we define how many training and test examples we would like to load from the original dataset. For now, we will load all the training data, but we will change this value later on to save resources:

input_values_train, target_values_train = train_size(55000)

Output:
Total Training Images in Dataset = (55000, 784)
############################################
input_values_train Samples Loaded = (55000, 784)
target_values_train Samples Loaded = (55000, 10)

We now have a training set of 55,000 samples of handwritten digits. Each sample is a 28 by 28 pixel image flattened into a 784-dimensional vector. We also have the corresponding labels in one-hot encoding format.
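
To make the flattening concrete, here is a minimal sketch of the round trip between a 28 by 28 image and a 784-dimensional vector. It assumes the images were flattened in row-major order, which is NumPy's default and is consistent with the reshape([28,28]) call in visualize_digit; the image itself is a hypothetical counter pattern, not MNIST data.

```python
import numpy as np

# A hypothetical 28x28 image whose pixel values are just a running counter,
# so it is easy to see where each pixel ends up after flattening.
image = np.arange(28 * 28).reshape(28, 28)

# Flatten row by row into a 784-dimensional vector.
flat = image.reshape(784)
print(flat.shape)  # (784,)

# The pixel at row r, column c lands at index r * 28 + c.
r, c = 3, 5
print(flat[r * 28 + c] == image[r, c])  # True

# Reshaping back recovers the original image exactly.
restored = flat.reshape(28, 28)
print(np.array_equal(restored, image))  # True
```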

The target_values_train data are the associated labels for all the input_values_train samples. In the following example, the array represents a 7 in one-hot encoding format:

Figure 11: One hot encoding for the digit 7
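
The same encoding can be reproduced in a few lines of NumPy. This is a small sketch, assuming 10 classes (the digits 0 to 9) as in the label shape shown above; the variable names are illustrative only.

```python
import numpy as np

num_classes = 10
digit = 7

# Build a one-hot vector: all zeros except a 1 at the index of the digit.
one_hot = np.zeros(num_classes)
one_hot[digit] = 1.0
print(one_hot)  # [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]

# argmax returns the position of the 1, which recovers the digit.
print(int(one_hot.argmax(axis=0)))  # 7
```

This is the same argmax call that visualize_digit uses to turn a one-hot label back into a digit for the plot title.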

Let's visualize a random image from the dataset to see what it looks like, using the helper function that we defined earlier:

# random.randint includes both endpoints, so the upper bound is the last valid index
visualize_digit(ran.randint(0, input_values_train.shape[0] - 1))

Output:
Figure 12: Output digit of the visualize_digit function

We can also visualize a batch of flattened images using the second helper function. Each row of the plot is one image, and each value in the flattened vector represents a pixel intensity, so visualizing the pixels will look like this:

visualize_mult_imgs_flat(0,400)
Figure 13: First 400 training examples
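
The array that visualize_mult_imgs_flat passes to plt.imshow has one row per image and 784 columns. As a rough sketch, the following checks this shape on random stand-in data and shows that a plain slice produces the same array as the concatenation loop; fake_train and stack_with_loop are hypothetical names used only for this illustration.

```python
import numpy as np

# Hypothetical stand-in for input_values_train: 50 flattened images.
rng = np.random.default_rng(0)
fake_train = rng.random((50, 784))

def stack_with_loop(data, start, stop):
    """Mirror the loop in visualize_mult_imgs_flat, without plotting."""
    imgs = data[start].reshape([1, 784])
    for i in range(start + 1, stop):
        imgs = np.concatenate((imgs, data[i].reshape([1, 784])))
    return imgs

looped = stack_with_loop(fake_train, 0, 40)
sliced = fake_train[0:40]

print(looped.shape)  # (40, 784)
print(np.array_equal(looped, sliced))  # True
```

Since np.concatenate copies the accumulated array on every iteration, the slice is likely the cheaper option when many images are displayed.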