LSTM for image processing

Let's imagine we want to perform handwriting recognition. At each time step, we get a new column of pixel data. Is it the end of a letter? If so, which one? Is it the end of a word? Is it punctuation? All these questions can be answered with a recurrent network.

For our test example, we will go back to our 10-digit MNIST dataset and use LSTMs instead of convolutional layers.

We use hyperparameters similar to the ones from our CNN example:

import tensorflow as tf
from tensorflow.contrib import rnn

# each row of 28 pixels is one input vector
n_input = 28
# unrolled through 28 time steps (our images are (28, 28))
time_steps = 28

# hidden LSTM units
num_units = 128

# learning rate for the Adam optimizer
learning_rate = 0.001
n_classes = 10
batch_size = 128

n_epochs = 10
step = 100

Setting up the training and testing data is almost identical to our CNN example, except for the way we reshape the images:

import numpy as np

from sklearn.datasets import fetch_mldata
from sklearn.model_selection import train_test_split

mnist = fetch_mldata('MNIST original')
# scale the pixels to [0, 1] and reshape each image into
# time_steps rows of n_input pixels
mnist.data = mnist.data.astype(np.float32).reshape(
    [-1, time_steps, n_input]) / 255.
mnist.num_examples = len(mnist.data)
mnist.labels = mnist.target.astype(np.int8)

# keep 10,000 of the 70,000 images for testing
X_train, X_test, y_train, y_test = train_test_split(
    mnist.data, mnist.labels, test_size=(1. / 7.))
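
Note that fetch_mldata relies on the now defunct mldata.org service and has been removed from recent versions of scikit-learn. If the call above fails, here is a minimal sketch of the same setup with fetch_openml, assuming scikit-learn 0.22 or later ('mnist_784' is the OpenML copy of MNIST):

from sklearn.datasets import fetch_openml

# as_frame=False returns plain numpy arrays instead of a DataFrame
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
data = mnist.data.astype(np.float32).reshape(
    [-1, time_steps, n_input]) / 255.
# OpenML stores the labels as strings, so convert them to integers
labels = mnist.target.astype(np.int8)

X_train, X_test, y_train, y_test = train_test_split(
    data, labels, test_size=(1. / 7.))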

Let's quickly set up our network and its scaffolding:

x = tf.placeholder(tf.float32, [None, time_steps, n_input])
y = tf.placeholder(tf.int64, [None])

# processing the input tensor from [batch_size, time_steps, n_input]
# to a list of "time_steps" tensors of shape [batch_size, n_input]
inputs = tf.unstack(x, time_steps, 1)

lstm_layer = rnn.BasicLSTMCell(num_units, forget_bias=1.0)
outputs, _ = rnn.static_rnn(lstm_layer, inputs, dtype=tf.float32)

# classify from the LSTM output at the last time step
prediction = tf.layers.dense(inputs=outputs[-1], units=n_classes)

loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=prediction, labels=y))
opt = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

correct_prediction = tf.equal(tf.argmax(prediction, 1), y)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
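
As an aside, static_rnn unrolls the network by adding one copy of the cell per time step to the graph, which is why we unstack the input first. A minimal sketch of the same wiring with tf.nn.dynamic_rnn, which consumes the [batch_size, time_steps, n_input] tensor directly and would replace the unstack/static_rnn lines above (the dyn_* names are just illustrative), looks like this:

# dynamic_rnn loops over the time dimension at run time,
# so no unstacking is needed
dyn_outputs, _ = tf.nn.dynamic_rnn(lstm_layer, x, dtype=tf.float32)
# dyn_outputs has shape [batch_size, time_steps, num_units];
# keep only the output of the last time step for classification
dyn_prediction = tf.layers.dense(
    inputs=dyn_outputs[:, -1, :], units=n_classes)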

We are now ready to train:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(n_epochs):
        permut = np.random.permutation(len(X_train))
        print("epoch: %i" % epoch)
        for j in range(0, len(X_train), batch_size):
            # j advances in steps of batch_size, so this condition
            # fires every lcm(step, batch_size) = 3200 samples
            if j % step == 0:
                print(" batch: %i" % j)

            batch = permut[j:j + batch_size]
            Xs = X_train[batch]
            Ys = y_train[batch]

            sess.run(opt, feed_dict={x: Xs, y: Ys})

            if j % step == 0:
                acc = sess.run(accuracy, feed_dict={x: Xs, y: Ys})
                los = sess.run(loss, feed_dict={x: Xs, y: Ys})
                print(" accuracy %f" % acc)
                print(" loss %f" % los)
                print("")

This outputs something like the following:
epoch: 0
batch: 0
accuracy 0.195312
loss 2.275624

batch: 3200
accuracy 0.484375
loss 1.514501

...

batch: 54400
accuracy 0.992188
loss 0.022468

batch: 57600
accuracy 1.000000
loss 0.007411

We get quite a high accuracy on the training batches here as well, but we will leave it to the reader to check the accuracy on the test samples; a possible starting point is sketched below.
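
Here is a minimal sketch of that check. It assumes it runs inside the training session above (before the with block exits) and simply reuses batch_size for the evaluation batches; since the last batch may be smaller, averaging the per-batch accuracies gives a close approximation rather than the exact figure:

# evaluate on the held-out test set in batches
accs = []
for j in range(0, len(X_test), batch_size):
    Xs = X_test[j:j + batch_size]
    Ys = y_test[j:j + batch_size]
    accs.append(sess.run(accuracy, feed_dict={x: Xs, y: Ys}))
print("test accuracy %f" % np.mean(accs))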
