LSTM for predicting text

Our first test for LSTM will be text prediction. Our network will learn the next word in a phrase, just as we did when we memorized a poem at school. Here we train it only on a small fable, but if such a network were trained on a writer's full texts, with more capacity (that is, with more and probably larger layers), it could learn their style and write like that writer.

Let's store our fable:

text="""A slave named Androcles once escaped from his master and fled to the forest. As he was wandering about there he came upon a Lion lying down moaning and groaning. At first he turned to flee, but finding that the Lion did not pursue him, he turned back and went up to him.

As he came near, the Lion put out his paw, which was all swollen and bleeding, and Androcles found that a huge thorn had got into it, and was causing all the pain. He pulled out the thorn and bound up the paw of the Lion, who was soon able to rise and lick the hand of Androcles like a dog. Then the Lion took Androcles to his cave, and every day used to bring him meat from which to live.

But shortly afterwards both Androcles and the Lion were captured, and the slave was sentenced to be thrown to the Lion, after the latter had been kept without food for several days. The Emperor and all his Court came to see the spectacle, and Androcles was led out into the middle of the arena. Soon the Lion was let loose from his den, and rushed bounding and roaring towards his victim.

But as soon as he came near to Androcles he recognised his friend, and fawned upon him, and licked his hands like a friendly dog. The Emperor, surprised at this, summoned Androcles to him, who told him the whole story. Whereupon the slave was pardoned and freed, and the Lion let loose to his native forest."""

We can get rid of the punctuation and tokenize it:

training_data = text.lower().replace(",", "").replace(".", "").split()
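
To get a quick feel for the data, we can inspect the first few tokens and count how many unique words we have (an illustrative check, not part of the original listing):

print(training_data[:6])
# ['a', 'slave', 'named', 'androcles', 'once', 'escaped']
print(len(training_data), "tokens in total,", len(set(training_data)), "unique")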

We can now convert the tokens (or words) to integers by indexing the words, creating a mapping between words and integers and vice versa (also referred to as the Bag of Words). We can also save some time by transforming our token array, which represents the text, into an array of integers, each integer being the index of the corresponding word in the mapping:

def build_dataset(words):
    count = list(set(words))
    dictionary = dict()
    for word in count:
        dictionary[word] = len(dictionary)
    reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys()))
    return dictionary, reverse_dictionary

dictionary, reverse_dictionary = build_dataset(training_data)
training_data_args = [dictionary[word] for word in training_data]
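
As a quick sanity check (illustrative, not part of the original listing), the two dictionaries invert each other, and training_data_args lines up with the tokens:

word = training_data[0]                     # 'a'
index = dictionary[word]
assert reverse_dictionary[index] == word    # the index maps back to the word
print(training_data[:4], training_data_args[:4])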

The main RNN layer we need is not part of core TensorFlow but of the contrib package. Creating it requires more than one line, but the code is self-explanatory. We end up with a dense layer with as many output nodes as there are tokens in the vocabulary:

import tensorflow as tf
tf.reset_default_graph()
from tensorflow.contrib import rnn

def RNN(x):
    # Generate a n_input-element sequence of inputs
    # (eg. [had] [a] [general] -> [20] [6] [33])
    x = tf.split(x, n_input, 1)

    # 1-layer LSTM with n_hidden units.
    rnn_cell = rnn.BasicLSTMCell(n_hidden)

    # generate prediction
    outputs, states = rnn.static_rnn(rnn_cell, x, dtype=tf.float32)

    # there are n_input outputs but we only want the last output
    return tf.layers.dense(inputs=outputs[-1], units=vocab_size)

# n_input and vocab_size are defined with the hyperparameters below;
# define them first if you run the code step by step
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.int64, [None])
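
To make the tf.split call inside RNN more concrete, here is a small self-contained sketch (the demo_* names are introduced only for this illustration): it cuts a [batch, n_input] tensor into n_input tensors of shape [batch, 1], which is the list format rnn.static_rnn expects.

import numpy as np

demo_x = tf.placeholder(tf.float32, [None, 3])   # a [batch, 3] input
demo_pieces = tf.split(demo_x, 3, 1)             # 3 tensors of shape [batch, 1]

with tf.Session() as demo_session:
    pieces = demo_session.run(demo_pieces,
                              feed_dict={demo_x: np.array([[20., 6., 33.]])})
    print(len(pieces), pieces[0].shape)          # 3 (1, 1)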

Let's add our hyperparameters:

import random
import numpy as np

vocab_size = len(dictionary)

# Parameters
learning_rate = 0.001
training_iters = 50000
display_step = 1000
# number of inputs (past words that we use)
n_input = 3
# number of units in the RNN cell
n_hidden = 512

We are ready to create the network and our optimization cost function. Note that sparse_softmax_cross_entropy_with_logits takes the target word directly as an integer class index, so we don't need to one-hot encode the labels:

pred = RNN(x)

cost = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits=pred,
                                                   labels=y))
optimizer = tf.train.RMSPropOptimizer(
    learning_rate=learning_rate).minimize(cost)

correct_pred = tf.equal(tf.argmax(pred, 1), y)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

Let's go for the training:

with tf.Session() as session:
    session.run(tf.global_variables_initializer())

    step = 0
    offset = random.randint(0, n_input+1)
    end_offset = n_input + 1
    acc_total = 0
    loss_total = 0

    while step < training_iters:
        # Batch with just one sample. Add some randomness to the selection process.
        if offset > (len(training_data)-end_offset):
            offset = random.randint(0, n_input+1)

        symbols_in_keys = [ [training_data_args[i]]
                            for i in range(offset, offset+n_input) ]
        symbols_in_keys = np.reshape(np.array(symbols_in_keys),
                                     [1, n_input])

        symbols_out_onehot = [training_data_args[offset+n_input]]

        _, acc, loss, onehot_pred = session.run(
            [optimizer, accuracy, cost, pred],
            feed_dict={x: symbols_in_keys, y: symbols_out_onehot})
        loss_total += loss
        acc_total += acc
        if (step+1) % display_step == 0:
            print(("Iter= %i , Average Loss= %.6f," +
                   " Average Accuracy= %.2f%%") %
                  (step+1, loss_total/display_step,
                   100*acc_total/display_step))
            acc_total = 0
            loss_total = 0
            symbols_in = [training_data[i]
                          for i in range(offset, offset + n_input)]
            symbols_out = training_data[offset + n_input]
            symbols_out_pred = reverse_dictionary[
                np.argmax(onehot_pred, axis=1)[0]]
            print("%s - [%s] vs [%s]" %
                  (symbols_in, symbols_out, symbols_out_pred))
        step += 1
        offset += (n_input+1)

Iter= 1000 , Average Loss= 4.034577, Average Accuracy= 11.50%
['shortly', 'afterwards', 'both'] - [androcles] vs [to]
Iter= 2000 , Average Loss= 3.143990, Average Accuracy= 21.10%
['he', 'came', 'upon'] - [a] vs [he]
Iter= 3000 , Average Loss= 2.145266, Average Accuracy= 44.10%
['and', 'the', 'slave'] - [was] vs [was]

Iter= 48000 , Average Loss= 0.442764, Average Accuracy= 87.90%
['causing', 'all', 'the'] - [pain] vs [pain]
Iter= 49000 , Average Loss= 0.507615, Average Accuracy= 85.20%
['recognised', 'his', 'friend'] - [and] vs [and]
Iter= 50000 , Average Loss= 0.427877, Average Accuracy= 87.10%
['of', 'androcles', 'like'] - [a] vs [a]

The accuracy is far from perfect, but for such a small trial it's already quite interesting. By evaluating the pred tensor inside the previous session, we can ask the network to generate a new word from three input words:

symbols_in_keys = [[dictionary['causing']], [dictionary['all']], [dictionary['the']]]
symbols_in_keys = np.reshape(np.array(symbols_in_keys), [1, n_input])

onehot_pred = session.run(pred, feed_dict={x: symbols_in_keys})
print("Estimate is: %s" %
      reverse_dictionary[np.argmax(onehot_pred, axis=1)[0]])
Estimate is: pain
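
The same idea extends to generating a longer passage: keep feeding the last n_input predicted words back into the network. Here is a minimal sketch to run inside the same session; seed_words and n_generate are names introduced here just for illustration:

seed_words = ['causing', 'all', 'the']   # any three words from the vocabulary
n_generate = 8                           # how many words to generate

words = list(seed_words)
for _ in range(n_generate):
    keys = [[dictionary[w]] for w in words[-n_input:]]
    keys = np.reshape(np.array(keys), [1, n_input])
    logits = session.run(pred, feed_dict={x: keys})
    words.append(reverse_dictionary[np.argmax(logits, axis=1)[0]])

print(' '.join(words))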

An interesting question is what would happen with more training time. And what would happen if we changed the intermediate layer to use several stacked LSTM layers, as sketched below?
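
For the second question, one possible variation (a sketch only, not the code used above) is to replace the single BasicLSTMCell inside RNN with a stack built from rnn.MultiRNNCell; the number of layers below is an arbitrary choice for illustration. To try it, swap this function in for RNN and rebuild the graph, since static_rnn reuses the same variable scope.

def RNN_stacked(x, n_layers=2):
    # Same input preparation as before: n_input tensors of shape [batch, 1]
    x = tf.split(x, n_input, 1)

    # Each layer needs its own cell object
    cells = [rnn.BasicLSTMCell(n_hidden) for _ in range(n_layers)]
    stacked_cell = rnn.MultiRNNCell(cells)

    # generate the prediction from the last output, as in the single-layer version
    outputs, states = rnn.static_rnn(stacked_cell, x, dtype=tf.float32)
    return tf.layers.dense(inputs=outputs[-1], units=vocab_size)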

Recurrent neural networks are not just for text or finance. They can also be used for image recognition.
