Building a feedforward neural network to recognize handwritten digits, version one

In this section, we will use the knowledge we gained from the last two chapters to tackle a problem involving unstructured data: image classification. The idea is to dive into a Computer Vision task using our current setup and the basics of neural networks that we are already familiar with. We have seen that feedforward neural networks can make predictions on structured data; now let's apply them to images to classify handwritten digits.

To solve this task, we are going to leverage the MNIST database of handwritten digits. MNIST stands for Modified National Institute of Standards and Technology. It is a large database that is commonly used for training, testing, and benchmarking image-related tasks in Computer Vision.

The MNIST digits dataset contains 60,000 images of handwritten digits for training the model and 10,000 images of handwritten digits for testing it.

From here on out, we will be using Jupyter Notebook to understand and execute this task, so please start Jupyter and create a new Python notebook if you have not already done so.

Once you have your notebook ready, the first thing to do, as always, is to import all the necessary modules for the task at hand:

  1. Import numpy and set the seed for reproducibility:
import numpy as np
np.random.seed(42)
  2. Load the Keras dependencies and the built-in MNIST digits dataset:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
  3. Load the data into the training and test sets, respectively:
(X_train, y_train), (X_test, y_test) = mnist.load_data()
  4. Check the number of training images, along with the size of each image. In this case, the size of each image is 28 x 28 pixels:
X_train.shape
(60000, 28, 28)
  5. Check the dependent variable; in this case, there are 60,000 cases, each with the correct label:
y_train.shape
(60000,)
  6. Check the labels for the first 99 training samples:
y_train[0:99]
array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4, 3, 5, 3, 6, 1, 7, 2, 8, 6, 9, 4, 0, 9,
1, 1, 2, 4, 3, 2, 7, 3, 8, 6, 9, 0, 5, 6, 0, 7, 6, 1, 8, 7, 9, 3, 9,
8, 5, 9, 3, 3, 0, 7, 4, 9, 8, 0, 9, 4, 1, 4, 4, 6, 0, 4, 5, 6, 1, 0,
0, 1, 7, 1, 6, 3, 0, 2, 1, 1, 7, 9, 0, 2, 6, 7, 8, 3, 9, 0, 4, 6, 7,
4, 6, 8, 0, 7, 8, 3], dtype=uint8)
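These labels are easier to appreciate next to the actual images. The following is a minimal sketch (an addition to the original steps, assuming matplotlib is installed in your environment) that renders the first training image along with its label; at this point, X_train[0] is still a 28 x 28 array and y_train[0] is still the integer 5:
import matplotlib.pyplot as plt
# Render the first training image in grayscale and show its label
plt.imshow(X_train[0], cmap='gray')
plt.title('Label: {}'.format(y_train[0]))
plt.show()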
  7. Check the number of test images, along with the size of each image. In this case, the size of each image is 28 x 28 pixels:
X_test.shape
(10000, 28, 28)
  8. Check a sample from the test data, which is basically a 2D array of size 28 x 28 (the output shown here is truncated):
X_test[0]
array([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       ...
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 121, 254, 207,
         18,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       ...
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0]], dtype=uint8)
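Before normalizing the data in a later step, it can help to confirm the range of the raw pixel values. The following quick check is an illustrative addition to the original steps; the expected output, 0 and 255, reflects 8-bit grayscale images:
# Raw pixels are unsigned 8-bit integers in the range 0-255
print(X_train.min(), X_train.max())   # expected: 0 255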
  9. Check the dependent variable; in this case, there are 10,000 cases, each with the correct label:
y_test.shape
(10000,)
  10. The correct label for the first test sample shown previously is as follows:
y_test[0]
7
  11. Now, we need to pre-process the data by converting each 28 x 28 2D array into a normalized 1D array of 784 elements:
X_train = X_train.reshape(60000, 784).astype('float32')
X_test = X_test.reshape(10000, 784).astype('float32')
X_train /= 255
X_test /= 255
  12. Check the first sample of the pre-processed dataset:
X_test[0]
array([ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
...
0. , 0. , 0. , 0. , 0. ,
0.47450981, 0.99607843, 0.99607843, 0.85882354, 0.15686275,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0.47450981, 0.99607843,
0.81176472, 0.07058824, 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ], dtype=float32)
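As a quick sanity check (not part of the original steps), we can confirm that the reshaping and normalization worked as intended:
# Each image should now be a flat vector of 784 floats scaled to [0, 1]
print(X_train.shape, X_test.shape)    # (60000, 784) (10000, 784)
print(X_train.min(), X_train.max())   # 0.0 1.0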
  13. The next step is to one-hot encode the labels; in other words, we need to convert the labels (zero to nine) from single numeric values into categorical vectors:
n_classes = 10
y_train = keras.utils.to_categorical(y_train, n_classes)
y_test = keras.utils.to_categorical(y_test, n_classes)
  14. View the first one-hot encoded label. In this case, the digit is seven:
y_test[0]
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.])
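If needed, the integer label can always be recovered from a one-hot vector; this small sketch (an addition to the original steps) uses numpy's argmax, which returns the index of the 1 in the vector:
# The position of the 1 in the one-hot vector is the original digit
print(np.argmax(y_test[0]))   # 7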
  15. Now, we can design our simple feedforward neural network: a hidden layer of 64 neurons with the sigmoid activation function, which receives the 784-element input vectors, followed by an output layer with the softmax function, which performs the classification by producing a probability for each label:
model = Sequential()
model.add(Dense(64, activation='sigmoid', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))
  16. We can look at the structure of the network we just designed using the summary() function. It is a simple network with a hidden layer of 64 neurons and an output layer of 10 neurons; the output layer has 10 neurons because we have 10 class labels to predict/classify (zero to nine):
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 64)                50240
_________________________________________________________________
dense_2 (Dense)              (None, 10)                650
=================================================================
Total params: 50,890
Trainable params: 50,890
Non-trainable params: 0
_________________________________________________________________
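The parameter counts in the summary can be reproduced by hand: each Dense layer has one weight per input-output connection plus one bias per neuron. The following quick check is an illustrative addition, not part of the original steps:
# hidden layer: 784 inputs x 64 neurons, plus 64 biases
print(784 * 64 + 64)   # 50240
# output layer: 64 inputs x 10 neurons, plus 10 biases
print(64 * 10 + 10)    # 650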

  17. Next, we need to configure the model with an optimizer, a cost function, and a metric. Here, the optimizer being used is Stochastic Gradient Descent (SGD) with a learning rate of 0.01, the loss function is the Mean Squared Error (MSE), and the metric used to measure the correctness of the model is accuracy, that is, the fraction of samples that are classified correctly:
model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.01), metrics=['accuracy'])
  18. Now, we are ready to train the model. We want it to use 128 samples per weight update, indicated by batch_size, and to make 200 complete passes over the training data, indicated by epochs. We also pass the test set as validation data, while verbose controls how much output is printed to the console:
model.fit(X_train, y_train, batch_size=128, epochs=200,
          verbose=1, validation_data=(X_test, y_test))
  19. Observe the training output; the model trains on 60,000 samples and validates on 10,000 samples at the end of every epoch (the output shown here is truncated):
Epoch 1/200
60000/60000 [==============================] - 1s - loss: 0.0915 - acc: 0.0895 - val_loss: 0.0911 - val_acc: 0.0955
Epoch 2/200
.
.
.
60000/60000 [==============================] - 1s - loss: 0.0908 - acc: 0.8579 - val_loss: 0.0274 - val_acc: 0.8649
Epoch 199/200
60000/60000 [==============================] - 1s - loss: 0.0283 - acc: 0.8585 - val_loss: 0.0273 - val_acc: 0.8656
Epoch 200/200
60000/60000 [==============================] - 1s - loss: 0.0282 - acc: 0.8587 - val_loss: 0.0272 - val_acc: 0.8658
<keras.callbacks.History at 0x7f308e68be48>
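The fit() call returns a History object (the last line of the preceding output). If you store it in a variable, you can plot the learning curves. The following is a minimal sketch, not part of the original steps, which re-runs the training and assumes matplotlib is available; this older Keras version logs the metrics under the keys 'acc' and 'val_acc' (newer versions use 'accuracy' and 'val_accuracy'):
import matplotlib.pyplot as plt

# Store the History object returned by fit() (note: this re-runs the training)
history = model.fit(X_train, y_train, batch_size=128, epochs=200,
                    verbose=1, validation_data=(X_test, y_test))

# Plot training versus validation accuracy across epochs
plt.plot(history.history['acc'], label='training accuracy')
plt.plot(history.history['val_acc'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()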
  20. Finally, we can evaluate the model to see how well it predicts on the test dataset:
model.evaluate(X_test,y_test)
9472/10000 [===========================>..] - ETA: 0s
[0.027176343995332718, 0.86580000000000001]

This can be interpreted as a test loss (MSE) of about 0.027 and an accuracy of about 0.866, which means the model predicted the correct label roughly 86.6% of the time on the test dataset.
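As a final illustrative step (an addition to the original walkthrough), we can ask the trained model to classify a single test image; predict() returns the ten class probabilities, and argmax picks the most likely digit:
# Predict the class probabilities for the first test image,
# which we saw earlier is a handwritten 7
probabilities = model.predict(X_test[0:1])
print(np.argmax(probabilities[0]))   # should print 7 if the model classifies it correctly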
