Convolutional neural networks

Less than a decade ago, neural networks were not the best tool for image processing. The reason, other than the lack of data and computing power, is that researchers were using dense layers. Stacking several dense layers that connect thousands of pixels to, say, a thousand hidden units leads to a non-convex cost function with millions of parameters to optimize.

The curse of dimensionality was thus very much an issue: even the biggest databases may not have been enough. But let's go back to the introduction. Machine learning is not just about training a model; it's also about feature processing. In image processing, people used lots of different tools to extract features from an image, but one tool common to all these preprocessing workflows was filtering.

So now, let's go back to neural networks. What if we could feed these filters inside a neural network? The issue then would be to know which filters would be the best ones. This is where convolutional networks come in: the convolutional layers create features and then the dense layers do their job (classification, regression, and so on).

Instead of having millions of coefficients as for a dense layer, we create an image of output pixels, with a fixed number of units for each pixel. Each of those units has a fixed number of weights, and these weights are the same for all the pixels in the output image. When we move from one pixel to another in the output image, we also move the connections in the same way (perhaps with a stride) in the input image.
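To make the weight sharing concrete, here is a minimal pure-Python sketch of a single filter sliding over an image without padding (strictly speaking a cross-correlation, which is what deep learning libraries usually compute). The function name conv2d_valid, the toy image, and the choice of a Sobel kernel are illustrative assumptions, not part of the network we build in this section.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide one kernel over a 2D image (no padding) and return the output map."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = (len(image) - k_rows) // stride + 1
    out_cols = (len(image[0]) - k_cols) // stride + 1
    output = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            # The same kernel weights are reused for every output pixel;
            # only the window position in the input changes.
            acc = 0.0
            for di in range(k_rows):
                for dj in range(k_cols):
                    acc += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


# Toy 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Sobel kernel that responds to horizontal intensity changes.
sobel_x = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]
feature_map = conv2d_valid(image, sobel_x)
print(feature_map)  # [[4.0, 4.0], [4.0, 4.0]]
```

The nine weights of the kernel are the only parameters involved, however large the input image is. A convolutional layer learns such weights instead of having them fixed by hand.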

So a conv2d layer has a weight kernel of dimension [kernel_size_1, kernel_size_2, input_channels, filters], which is very small if you consider the number of weights of a dense layer! For a single-channel input, that is on the order of a thousand numbers, instead of the hundreds of thousands, or even millions, of a dense layer. The kernel is trainable, and it's even possible to see which features are relevant for the question we asked by looking at these weights. We should be able to see things as simple as a gradient filter, a Sobel filter, or perhaps a filter that looks at curves.
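The difference in size can be checked with a little arithmetic. The sketch below counts weights and biases for a dense layer and for a convolutional layer; the helper names are hypothetical, and the sizes (a 28x28 single-channel image, 1,000 hidden units, 64 filters of size 5x5) are chosen to match the orders of magnitude discussed here.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus biases of a 2D convolutional layer."""
    return kernel_h * kernel_w * in_channels * filters + filters


# A 28x28 grayscale image connected to 1,000 hidden units.
dense_count = dense_params(28 * 28, 1000)
# 64 filters of size 5x5 applied to the same single-channel image.
conv_count = conv2d_params(5, 5, 1, 64)

print(dense_count)  # 785000
print(conv_count)   # 1664
```

The convolutional count does not depend on the image size at all, whereas the dense count grows with every extra pixel.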

Now that we have all these pieces, we can put them together. We will use the handwritten-digits dataset again and classify these images into 10 classes (one for each digit). We will also save the trained model and restore it with the two methods we saw earlier.

Let's start with some simple imports and some hyperparameters:

import tensorflow as tf
import numpy as np
from sklearn.model_selection import train_test_split
n_epochs = 10
learning_rate = 0.0002
batch_size = 128
export_dir = "data/classifier-mnist"
image_shape = [28,28,1]
step = 1000
dim_W3 = 1024
dim_W2 = 128
dim_W1 = 64
dropout_rate = 0.1

We will train the neural network for 10 epochs (so we will pass through the full training dataset 10 times), with a learning rate of 0.0002 and a batch size of 128 (so we will train the model with 128 images at a time). The network uses 64 convolutional filters, then 128 filters, and finally 1,024 nodes in the last hidden layer, before the 10 output nodes that give us our classification result. Finally, the 1,024-node layer is followed by a dropout step with a rate of 0.1, meaning that during training each of its outputs is randomly set to 0 with a probability of 0.1 (about 102 nodes on average):

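# Note: fetch_mldata was removed in scikit-learn 0.22;
# fetch_openml('mnist_784') is its replacement in recent versions.
from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original')
mnist.data.shape = (-1, 28, 28)
# Scale the pixels to [0, 1] and add a trailing channel dimension
mnist.data = mnist.data.astype(np.float32).reshape([-1, 28, 28, 1]) / 255.
mnist.num_examples = len(mnist.data)
mnist.labels = mnist.target.astype(np.int64)
# 1/7 of the 70,000 images (10,000) are held out for testing
X_train, X_test, y_train, y_test = train_test_split(mnist.data, mnist.labels, test_size=(1. / 7.))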

We now get the data, reshape it, change its type, and keep 60,000 images for training and 10,000 for testing purposes. The labels are int64 because that is the type we will use in our accuracy check. We don't have to transform the labels into a one-hot array, because TensorFlow already has a cost function that handles integer labels. No need to add more processing than required!

Why a four-dimensional array? The first dimension, -1, is the number of images; it is inferred from the data and becomes our dynamic batch dimension. The next two dimensions are the height and width of our image. The final one is the number of input channels, here just 1.

Let's create our Convolutional Neural Network class:

class CNN():
    def __init__(
        self,
        image_shape=(28,28,1),
        dim_W3=1024,
        dim_W2=128,
        dim_W1=64,
        classes=10
    ):

        self.image_shape = image_shape

        self.dim_W3 = dim_W3
        self.dim_W2 = dim_W2
        self.dim_W1 = dim_W1
        self.classes = classes

We create a class for our CNN, and we save locally a few of the parameters we set earlier:

    def create_conv2d(self, input, filters, kernel_size, name):
        layer = tf.layers.conv2d(
            inputs=input,
            filters=filters,
            kernel_size=kernel_size,
            activation=tf.nn.leaky_relu,
            name="Conv2d_" + name,
            padding="same")
        return layer

This method creates a convolutional layer with the parameters we saw earlier, filters and kernel_size. We set the output activation to be a leaky ReLU, because it gives nice results for these cases.

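The padding parameter can be same or valid. With same, the input is zero-padded so that the output keeps the spatial size of the input (for a stride of 1). The second option relates to the convolution equation: when we don't want to have partial convolutions (on the edges of the image), this is the option we want to use.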
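The effect of the two padding modes accepted by tf.layers.conv2d ("same" and "valid") on the output size can be sketched with the rule TensorFlow documents for them. The helper name conv_output_size is hypothetical, and the sizes are those of our 28x28 images with a 5x5 kernel.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, padding="same"):
    """Output length along one spatial axis for the two padding modes."""
    if padding == "same":
        # Zero-padding is added so that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    if padding == "valid":
        # Only windows that fit entirely inside the input are kept.
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'same' or 'valid'")


same_size = conv_output_size(28, kernel_size=5, padding="same")
valid_size = conv_output_size(28, kernel_size=5, padding="valid")
print(same_size, valid_size)  # 28 24
```

With "valid", each 5x5 convolution would remove four pixels per axis, so the reshape before the dense layers would need a different size than with "same".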

    def create_maxpool(self, input, name):
        layer = tf.layers.max_pooling2d(
            inputs=input,
            pool_size=[2,2],
            strides=2,
            name="MaxPool_" + name)
        return layer

The max-pooling layer is also very straightforward. We take the maximum over each 2x2 pixel window, and the output size will be the original image size divided by 2 in both directions (hence the stride equal to 2):

    def create_dropout(self, input, name, is_training):
        layer = tf.layers.dropout(
            inputs=input,
            rate=dropout_rate,
            name="DropOut_" + name,
            training=is_training)
        return layer
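As a rough illustration of what a dropout rate of 0.1 means, the following pure-Python sketch applies inverted dropout to 1,024 activations. It is only a sketch: the function name and the fixed seed are assumptions, and the scaling of the kept values by 1 / (1 - rate) mirrors what tf.layers.dropout does during training.

```python
import random


def dropout(values, rate, training, rng):
    """Inverted dropout: zero each value with probability `rate` while training."""
    if not training:
        # At test time the layer is the identity.
        return list(values)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 1024

dropped = dropout(activations, rate=0.1, training=True, rng=rng)
n_zeroed = sum(1 for v in dropped if v == 0.0)
print(n_zeroed)  # close to 102 on average, but it varies from call to call

unchanged = dropout(activations, rate=0.1, training=False, rng=rng)
```

The number of zeroed outputs is random rather than fixed, and the layer does nothing when training is False.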

The dropout layer that we are introducing in this example has an additional parameter, a placeholder named is_training. It will be important to deactivate this layer when we test the model (or when we use the model after training):

    def create_dense(self, input, units, name, is_training):
        layer = tf.layers.dense(
            inputs=input,
            units=units,
            name="Dense" + name,
        )
        layer = tf.layers.batch_normalization(
            inputs=layer,
            momentum=0,
            epsilon=1e-8,
            training=is_training,
            name="BatchNorm_" + name,
        )
        # The "LeakyRELU_" prefix matches the tensor name used when restoring
        layer = tf.nn.leaky_relu(layer, name="LeakyRELU_" + name)
        return layer
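As a rough illustration of what the batch normalization step computes, here is a pure-Python sketch for a single feature. It leaves out the learned scale and offset that the TensorFlow layer also applies, and the two function names are hypothetical.

```python
import math


def batch_norm(batch, epsilon=1e-8):
    """Normalize one feature across a batch to zero mean and unit variance."""
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / math.sqrt(variance + epsilon) for x in batch]


def update_moving_average(moving, batch_value, momentum):
    """Running statistic kept for inference; momentum=0 keeps only the last batch."""
    return momentum * moving + (1.0 - momentum) * batch_value


normalized = batch_norm([2.0, 4.0, 6.0, 8.0])
print(normalized)
```

The momentum argument only affects the running mean and variance used when training is False. It is unrelated to the momentum of an optimizer.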

Our dense layer is more complicated than a regular one. We added a batch_normalization step before the activation, which normalizes the layer's outputs using the mean and variance computed over the current batch. Its momentum option controls the moving averages of these statistics that are used at test time; with a momentum of 0, they simply track the last batch seen during training:

    def discriminate(self, image, training):
        # Two convolution + max-pooling blocks: 64 filters, then 128
        h1 = self.create_conv2d(image, self.dim_W1, 5, "Layer1")
        h1 = self.create_maxpool(h1, "Layer1")

        h2 = self.create_conv2d(h1, self.dim_W2, 5, "Layer2")
        h2 = self.create_maxpool(h2, "Layer2")
        # Two 2x2 poolings reduce the 28x28 image to 7x7
        h2 = tf.reshape(h2, (-1, self.dim_W2 * 7 * 7))

        # Dense layer with 1,024 nodes, followed by dropout
        h3 = self.create_dense(h2, self.dim_W3, "Layer3", training)
        h3 = self.create_dropout(h3, "Layer3", training)

        h4 = self.create_dense(h3, self.classes, "Layer4", training)
        return h4
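The 7 * 7 in the reshape can be verified by following the feature-map shape through the network. The sketch below assumes "same"-padded convolutions with a stride of 1 and 2x2 max pooling with a stride of 2, with 64 and then 128 filters as described in the text; trace_shapes is a hypothetical helper.

```python
def trace_shapes(height, width, conv_filters):
    """Follow the feature-map shape through conv + 2x2 max-pool blocks."""
    shapes = [(height, width, 1)]
    for filters in conv_filters:
        # A 'same'-padded convolution with stride 1 keeps height and width.
        shapes.append((height, width, filters))
        # A 2x2 max pool with stride 2 halves both spatial dimensions.
        height, width = height // 2, width // 2
        shapes.append((height, width, filters))
    return shapes


shapes = trace_shapes(28, 28, conv_filters=[64, 128])
for shape in shapes:
    print(shape)

final_h, final_w, final_c = shapes[-1]
flattened = final_h * final_w * final_c
print(flattened)  # 6272 values per image feed the first dense layer
```

If the image size or the number of pooling steps changes, the size passed to tf.reshape has to change accordingly.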

Now that we have all the individual blocks for our network, we can put them together and create our model:

    def build_model(self):
        image = tf.placeholder(tf.float32,
            [None] + list(self.image_shape), name="image")
        Y = tf.placeholder(tf.int64, [None], name="label")
        training = tf.placeholder(tf.bool, name="is_training")

        # Despite its name, this tensor holds the logits of the last layer
        probabilities = self.discriminate(image, training)
        cost = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(labels=Y,
                logits=probabilities))
        accuracy = tf.reduce_mean(
            tf.cast(tf.equal(tf.argmax(probabilities, axis=1), Y),
                tf.float32), name="accuracy")

        return image, Y, cost, accuracy, probabilities, training

We add the scaffolding: a placeholder for the input image, one for the labels, and one for the training flag. We then use the sparse_softmax_cross_entropy_with_logits cost function, which takes as arguments an array of integer labels and a tensor of unnormalized scores (the output of a dense layer) named logits; this is why the tensor we call probabilities actually holds logits. This function fits problems where exactly one label is active at a time (so it's very good for classification, but not for image annotation, for instance).
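For a single sample, the cost and the prediction can be sketched in pure Python as follows. This is an illustration of the mathematics under the usual definition of softmax cross-entropy, not TensorFlow's implementation; the function names and the example logits are assumptions.

```python
import math


def sparse_softmax_cross_entropy(label, logits):
    """Cross-entropy between an integer class label and unnormalized logits."""
    # Subtract the max logit for numerical stability (the result is unchanged).
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[label]


def predicted_class(logits):
    """Index of the largest logit, the pure-Python counterpart of argmax."""
    return max(range(len(logits)), key=lambda i: logits[i])


logits = [2.0, 0.5, -1.0]
loss_right = sparse_softmax_cross_entropy(0, logits)  # label has the largest logit
loss_wrong = sparse_softmax_cross_entropy(2, logits)  # label has the smallest logit
print(loss_right, loss_wrong)
print(predicted_class(logits))  # 0
```

The label is passed as a plain integer, which is why no one-hot encoding is needed. The accuracy is then the fraction of samples for which the predicted class equals the label.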

It is now time to use this new class:

cnn_model = CNN(
    image_shape=image_shape,
    dim_W1=dim_W1,
    dim_W2=dim_W2,
    dim_W3=dim_W3,
)
(image_tf, Y_tf, cost_tf, accuracy_tf, output_tf,
    training_tf) = cnn_model.build_model()
train_step = tf.train.AdamOptimizer(learning_rate,
    beta1=0.5).minimize(cost_tf)

saver = tf.train.Saver(max_to_keep=10)
builder = tf.saved_model.builder.SavedModelBuilder(export_dir)

We use it to also instantiate our optimizer (here Adam), and we take the opportunity to build our model serializers:

# (train, test) accuracy pairs per epoch, used for the plot further down
accuracy_vec = []
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(n_epochs):
        permut = np.random.permutation(len(X_train))

        print("epoch: %i" % epoch)
        for j in range(0, len(X_train), batch_size):
            if j % step == 0:
                print(" batch: %i" % j)

            batch = permut[j:j+batch_size]
            Xs = X_train[batch]
            Ys = y_train[batch]

            sess.run(train_step,
                feed_dict={
                    training_tf: True,
                    Y_tf: Ys,
                    image_tf: Xs
                })
            if j % step == 0:
                temp_cost, temp_prec = sess.run([cost_tf, accuracy_tf],
                    feed_dict={
                        training_tf: False,
                        Y_tf: Ys,
                        image_tf: Xs
                    })
                print(" cost: %f prec: %f" % (temp_cost, temp_prec))
        saver.save(sess, './classifier', global_step=epoch)
    saver.save(sess, './classifier-final')
    builder.add_meta_graph_and_variables(sess,
        [tf.saved_model.tag_constants.TRAINING])
builder.save()

Epoch #-1
train accuracy = 0.068963
test accuracy = 0.071796
Result for the 10 first training images: [0 8 9 9 7 6 3 5 1 3]
Reference for the 10 first training images: [9 8 4 4 9 8 1 8 2 5]
epoch: 0
batch: 0
cost: 1.319493
prec: 0.687500
batch: 16000
cost: 0.452003
prec: 1.000000
batch: 32000
cost: 0.383446
prec: 1.000000
batch: 48000
cost: 0.392471
prec: 0.992188
Epoch #0
train accuracy = 0.991166
test accuracy = 0.986650
#...
Epoch #9
train accuracy = 0.999833
test accuracy = 0.991693
Result for the 10 first training images: [9 8 4 4 9 3 1 8 2 5]
Reference for the 10 first training images: [9 8 4 4 9 8 1 8 2 5]

This is the usual pattern we followed for our previous examples; we just added the savers to store intermediate states of the model. Note that the builder requires a final save() call after the end of the session.
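The mini-batch bookkeeping of this loop can be sketched without TensorFlow. The following pure-Python version reproduces the shuffling and the logging condition with the values used here (60,000 training images, a batch size of 128, and a step of 1,000); minibatch_indices is a hypothetical helper, not part of the original code.

```python
import random


def minibatch_indices(n_samples, batch_size, rng):
    """Yield (start offset, shuffled index batch) pairs covering one epoch."""
    permutation = list(range(n_samples))
    rng.shuffle(permutation)
    for start in range(0, n_samples, batch_size):
        yield start, permutation[start:start + batch_size]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
logged_offsets = []
seen = []
for start, batch in minibatch_indices(60000, 128, rng):
    seen.extend(batch)
    if start % 1000 == 0:
        # Offsets are multiples of 128, so this fires only every 16,000 samples.
        logged_offsets.append(start)

print(logged_offsets)  # [0, 16000, 32000, 48000]
```

This explains why the log above reports batches 0, 16000, 32000, and 48000 only: those are the offsets that are multiples of both 128 and 1,000. Every training image is still visited exactly once per epoch.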

Without any training, the accuracy of the algorithm is around 1 in 10, which is what a random network will achieve. After 10 epochs, we get an accuracy close to 1 for training and testing. Let's see how the training and the test errors evolve with the epochs:

from matplotlib import pyplot as plt

accuracy = np.array(accuracy_vec)
plt.semilogy(1 - accuracy[:,0], 'k-', label="train")
plt.semilogy(1 - accuracy[:,1], 'r-', label="test")
plt.title('Classification error per Epoch')
plt.xlabel('Epoch')
plt.ylabel('Classification error')
plt.legend()
plt.show()

Refer to the following graph:

Obviously, more training epochs lower the training error, but after a handful of epochs, the test error (the generalization power) doesn't evolve. This means that there is no point in trying to throw more time at it. But perhaps changing a few parameters may help? Or a different activation function?

As we saved the trained network, we can restore it with our two methods:

tf.reset_default_graph()
new_saver = tf.train.import_meta_graph("classifier-final.meta")

with tf.Session() as sess:
    new_saver.restore(sess, tf.train.latest_checkpoint('./'))

    graph = tf.get_default_graph()
    training_tf = graph.get_tensor_by_name('is_training:0')
    Y_tf = graph.get_tensor_by_name('label:0')
    image_tf = graph.get_tensor_by_name('image:0')
    accuracy_tf = graph.get_tensor_by_name('accuracy:0')
    output_tf = graph.get_tensor_by_name('LeakyRELU_Layer4/Maximum:0')
    show_train(sess, 0) # Function defined in the support notebook

INFO:tensorflow:Restoring parameters from ./classifier-final
Epoch #0
train accuracy = 0.999833
test accuracy = 0.991693
Result for the 10 first training images: [9 8 4 4 9 3 1 8 2 5]
Reference for the 10 first training images: [9 8 4 4 9 8 1 8 2 5]

And the second method:

tf.reset_default_graph()
with tf.Session() as sess:
    tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.TRAINING], export_dir)
    graph = tf.get_default_graph()
    training_tf = graph.get_tensor_by_name('is_training:0')
    Y_tf = graph.get_tensor_by_name('label:0')
    image_tf = graph.get_tensor_by_name('image:0')
    accuracy_tf = graph.get_tensor_by_name('accuracy:0')
    output_tf = graph.get_tensor_by_name('LeakyRELU_Layer4/Maximum:0')
    show_train(sess, 0)

INFO:tensorflow:Restoring parameters from b'data/classifier-mnist/variables/variables'
Epoch #0
train accuracy = 0.999833
test accuracy = 0.991693
Result for the 10 first training images: [9 8 4 4 9 3 1 8 2 5]
Reference for the 10 first training images: [9 8 4 4 9 8 1 8 2 5]

They both return the same training and testing accuracy as the one we got after our training, so we are good to reuse the model for additional classifications.

It is now time to tackle another type of network, the recurrent neural networks.
