In this section, we'll explore a popular deep learning model, the recurrent neural network (RNN), and how it can be used to generate sequence data. The universal way to generate sequence data in deep learning is to train a model (usually an RNN or a convnet) to predict the next token, or the next few tokens, in a sequence, using the previous tokens as input. For instance, given the input sentence "i love to work in deep learning", we would train the network to predict the next token as the target.
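To make this concrete, here is a minimal sketch of how the example sentence can be turned into (input, target) training pairs for next-word prediction. This is an illustration of the idea only, not code from the project:

```python
# Turn the example sentence into (context, target) pairs:
# the model sees all previous words and must predict the next one.
sentence = "i love to work in deep learning".split()

pairs = []
for i in range(1, len(sentence)):
    context = sentence[:i]   # all words seen so far
    target = sentence[i]     # the word the model must predict
    pairs.append((context, target))

for context, target in pairs:
    print(" ".join(context), "->", target)
```

The first pair is `(["i"], "love")` and the last target is `"learning"`; a real model is trained to maximize the probability of each target given its context.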
Once the language model is trained, we can feed it some initial text and ask it to generate the next token, then append the generated token to the input and feed it back into the model to predict further tokens. For our hypothetical use case, our creative client will later provide examples of text in a particular style, and we will be asked to use this model to create novel content in that style.
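The generate-and-feed-back loop can be sketched as follows. A simple bigram frequency table stands in for the trained model here (an assumption for illustration; the real project uses an RNN), but the loop structure is the same: predict a token, append it to the sequence, and predict again:

```python
from collections import Counter, defaultdict

# Toy stand-in for a trained language model: a bigram table built
# from a tiny corpus (assumption -- the real model is an RNN).
corpus = "i love to work in deep learning and i love to learn".split()
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent next word under the toy model."""
    return bigrams[word].most_common(1)[0][0]

# The feedback loop: seed text in, generated token appended,
# extended sequence fed back in for the next prediction.
generated = ["i"]
for _ in range(4):
    generated.append(predict_next(generated[-1]))
print(" ".join(generated))
```

With a real RNN, `predict_next` would run a forward pass over the whole sequence generated so far and sample from the output distribution rather than always taking the most frequent continuation.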
The first step in building a generative model for text is to import all the required modules. We will use the Keras API to build the model and keras.utils to download the dataset. Building a text generation model requires a significant amount of plain text data.
You can find the code file at https://github.com/PacktPublishing/Python-Deep-Learning-Projects/blob/master/Chapter%206/Basics/generative_text.py:
import keras
import numpy as np
from keras import layers
# Gather data
path = keras.utils.get_file(
'sample.txt',
origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
text = open(path).read().lower()
# Note: len(text) counts characters, not words
print('Corpus length in characters:', len(text))
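With the corpus downloaded, the natural next step is to cut it into fixed-length character sequences, each paired with the character that follows it. The sketch below illustrates that windowing on a short in-memory string standing in for the corpus; the `maxlen` and `step` values are illustrative assumptions, not taken from the project:

```python
# Slice text into overlapping input sequences and next-character targets.
text = "i love to work in deep learning"  # stand-in for the downloaded corpus
maxlen = 10  # length of each input sequence, in characters (assumed value)
step = 3     # slide the window 3 characters at a time (assumed value)

sentences, next_chars = [], []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i:i + maxlen])   # input: maxlen characters
    next_chars.append(text[i + maxlen])    # target: the character after them

print('Number of sequences:', len(sentences))
print(repr(sentences[0]), '->', repr(next_chars[0]))
```

Each `(sentences[i], next_chars[i])` pair becomes one training example: the model reads the sequence and learns to predict the single character that follows it.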