LSTM for image processing

Let's imagine we want to perform handwriting recognition. At each time step, we get a new column of pixel data. Is it the end of a letter? If so, which one? Is it the end of a word? Is it punctuation? All these questions can be answered with a recurrent network.

For our test example, we will go back to our 10-digit MNIST dataset and use LSTMs instead of convolutional layers.

We use hyperparameters similar to the ones from our CNN example:

import tensorflow as tf
from tensorflow.contrib import rnn

# each row of 28 pixels is one input vector
n_input = 28
# unrolled through 28 time steps (our images are (28, 28))
time_steps = 28

# hidden LSTM units
num_units = 128

# learning rate for the Adam optimizer
learning_rate = 0.001
n_classes = 10
batch_size = 128

n_epochs = 10
step = 100

Setting up the training and testing data is almost identical to our CNN example, except for the way we reshape the images:

import numpy as np

from sklearn.datasets import fetch_mldata
from sklearn.model_selection import train_test_split

mnist = fetch_mldata('MNIST original')
# scale the pixels to [0, 1] and reshape each image into
# time_steps rows of n_input pixels
mnist.data = mnist.data.astype(np.float32).reshape(
    [-1, time_steps, n_input]) / 255.
mnist.num_examples = len(mnist.data)
mnist.labels = mnist.target.astype(np.int8)

# keep 10,000 of the 70,000 images for testing
X_train, X_test, y_train, y_test = train_test_split(
    mnist.data, mnist.labels, test_size=(1. / 7.))
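
Note that fetch_mldata relies on the now defunct mldata.org service and has been removed from recent versions of scikit-learn. If the call above fails, here is a minimal sketch of the same setup with fetch_openml, assuming scikit-learn 0.22 or later ('mnist_784' is the OpenML copy of MNIST):

from sklearn.datasets import fetch_openml

# as_frame=False returns plain numpy arrays instead of a DataFrame
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
data = mnist.data.astype(np.float32).reshape(
    [-1, time_steps, n_input]) / 255.
# OpenML stores the labels as strings, so convert them to integers
labels = mnist.target.astype(np.int8)

X_train, X_test, y_train, y_test = train_test_split(
    data, labels, test_size=(1. / 7.))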

Let's quickly set up our network and its scaffolding:

x = tf.placeholder(tf.float32, [None, time_steps, n_input])
y = tf.placeholder(tf.int64, [None])

# processing the input tensor from [batch_size, time_steps, n_input]
# to a list of "time_steps" tensors of shape [batch_size, n_input]
inputs = tf.unstack(x, time_steps, 1)

lstm_layer = rnn.BasicLSTMCell(num_units, forget_bias=1.0)
outputs, _ = rnn.static_rnn(lstm_layer, inputs, dtype=tf.float32)

# classify from the LSTM output at the last time step
prediction = tf.layers.dense(inputs=outputs[-1], units=n_classes)

loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=prediction, labels=y))
opt = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

correct_prediction = tf.equal(tf.argmax(prediction, 1), y)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
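
As an aside, static_rnn unrolls the network by adding one copy of the cell per time step to the graph, which is why we unstack the input first. A minimal sketch of the same wiring with tf.nn.dynamic_rnn, which consumes the [batch_size, time_steps, n_input] tensor directly and would replace the unstack/static_rnn lines above (the dyn_* names are just illustrative), looks like this:

# dynamic_rnn loops over the time dimension at run time,
# so no unstacking is needed
dyn_outputs, _ = tf.nn.dynamic_rnn(lstm_layer, x, dtype=tf.float32)
# dyn_outputs has shape [batch_size, time_steps, num_units];
# keep only the output of the last time step for classification
dyn_prediction = tf.layers.dense(
    inputs=dyn_outputs[:, -1, :], units=n_classes)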

We are now ready to train:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(n_epochs):
        permut = np.random.permutation(len(X_train))
        print("epoch: %i" % epoch)
        for j in range(0, len(X_train), batch_size):
            # j advances in steps of batch_size, so this condition
            # fires every lcm(step, batch_size) = 3200 samples
            if j % step == 0:
                print(" batch: %i" % j)

            batch = permut[j:j + batch_size]
            Xs = X_train[batch]
            Ys = y_train[batch]

            sess.run(opt, feed_dict={x: Xs, y: Ys})

            if j % step == 0:
                acc = sess.run(accuracy, feed_dict={x: Xs, y: Ys})
                los = sess.run(loss, feed_dict={x: Xs, y: Ys})
                print(" accuracy %f" % acc)
                print(" loss %f" % los)
                print("")

This outputs something like the following:
epoch: 0
batch: 0
accuracy 0.195312
loss 2.275624

batch: 3200
accuracy 0.484375
loss 1.514501

...

batch: 54400
accuracy 0.992188
loss 0.022468

batch: 57600
accuracy 1.000000
loss 0.007411

We get quite a high accuracy on the training batches here as well, but we will leave it to the reader to check the accuracy on the test samples; a possible starting point is sketched below.
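
Here is a minimal sketch of that check. It assumes it runs inside the training session above (before the with block exits) and simply reuses batch_size for the evaluation batches; since the last batch may be smaller, averaging the per-batch accuracies gives a close approximation rather than the exact figure:

# evaluate on the held-out test set in batches
accs = []
for j in range(0, len(X_test), batch_size):
    Xs = X_test[j:j + batch_size]
    Ys = y_test[j:j + batch_size]
    accs.append(sess.run(accuracy, feed_dict={x: Xs, y: Ys}))
print("test accuracy %f" % np.mean(accs))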
