Implementing the project

Now that we know how an RNN builds a character-level model, let's implement the project and generate our own words and sentences with an RNN. RNN training is generally computationally intensive, so running the code on a graphics processing unit (GPU) is recommended. However, due to infrastructure limitations, we are not going to use a GPU for this project. The mxnet library can also run a character-level RNN language model on the CPU, so let's start coding our project:

# including the required libraries
library("readr")
library("stringr")
library("stringi")
library("magrittr")  # provides the %>% pipe used in make_data()
library("mxnet")
library("languageR")

The languageR library ships the text of Alice's Adventures in Wonderland as the alice dataset. To load it into memory, use the following code:

data(alice)

Next, we transform the text into feature vectors that are fed into the RNN model. The make_data function takes the text, converts it to ASCII, removes non-printable characters, splits it into individual characters, and groups them into sequences of length seq.len. The function's default seq.len is 32, but in this project we call it with seq.len set to 100:

make_data <- function(txt, seq.len = 32, dic = NULL) {
  # Convert to ASCII, drop non-printable characters and split into characters
  text_vec <- as.character(txt)
  text_vec <- stri_enc_toascii(str = text_vec)
  text_vec <- str_replace_all(string = text_vec, pattern = "[^[:print:]]", replacement = "")
  text_vec <- strsplit(text_vec, '') %>% unlist
  # Keep every character seen, or only the characters of a dictionary passed in
  if (is.null(dic)) {
    char_keep <- sort(unique(text_vec))
  } else char_keep <- names(dic)[!dic == 0]
  # Remove the characters that are not part of the dictionary
  text_vec <- text_vec[text_vec %in% char_keep]
  # Build the dictionary (character -> integer code)
  dic <- 1:length(char_keep)
  names(dic) <- char_keep
  # Build the reverse dictionary (integer code -> character)
  rev_dic <- names(dic)
  names(rev_dic) <- dic
  # Adjust by -1 to have a 1-lag for labels: every feature needs a next character
  num.seq <- (length(text_vec) - 1) %/% seq.len
  features <- dic[text_vec[1:(seq.len * num.seq)]]
  labels <- dic[text_vec[1:(seq.len * num.seq) + 1]]
  # One column per sequence of seq.len characters
  features_array <- array(features, dim = c(seq.len, num.seq))
  labels_array <- array(labels, dim = c(seq.len, num.seq))
  return(list(features_array = features_array, labels_array = labels_array,
              dic = dic, rev_dic = rev_dic))
}
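To see what the 1-lag alignment inside make_data() produces, the following base-R sketch repeats the same dictionary and lagging steps on a tiny string. The helper name make_lagged_pairs() is hypothetical, and it skips the stringi/stringr cleaning steps so that it runs without any packages:

```r
# Hypothetical, base-R-only sketch of the 1-lag alignment used by make_data():
# the label at each position is simply the next character in the text.
make_lagged_pairs <- function(txt, seq.len) {
  chars <- strsplit(txt, "")[[1]]
  char_keep <- sort(unique(chars))
  dic <- seq_along(char_keep)          # character -> integer code
  names(dic) <- char_keep
  rev_dic <- names(dic)                # integer code -> character
  names(rev_dic) <- dic
  # Leave one character spare so that every feature has a "next" character
  num.seq <- (length(chars) - 1) %/% seq.len
  features <- dic[chars[1:(seq.len * num.seq)]]
  labels <- dic[chars[1:(seq.len * num.seq) + 1]]
  list(features_array = array(features, dim = c(seq.len, num.seq)),
       labels_array = array(labels, dim = c(seq.len, num.seq)),
       dic = dic, rev_dic = rev_dic)
}

toy <- make_lagged_pairs("hello world", seq.len = 3)
# Decode each column (one sequence) back to text
apply(toy$features_array, 2, function(col) paste(toy$rev_dic[col], collapse = ""))
# "hel" "lo " "wor"
apply(toy$labels_array, 2, function(col) paste(toy$rev_dic[col], collapse = ""))
# "ell" "o w" "orl"
```

Each label column is its feature column shifted by one character, and characters that do not fill a whole sequence (here the final "d") are dropped.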

Set the sequence length to 100, then build one long string of text by joining the individual words of the alice character vector with spaces. Then call the make_data() function on this alice_in_wonderland string. Observe that seq.len and an empty dictionary (dic = NULL) are passed as input. seq.len sets the context, that is, the number of characters the RNN looks back over in order to predict the next character. During training, the network is unrolled over seq.len time steps while its weights are learned:

seq.len <- 100
alice_in_wonderland <- paste(alice, collapse = " ")
data_prep <- make_data(alice_in_wonderland, seq.len = seq.len, dic=NULL)

To view the prepared data, use the following code:

print(str(data_prep))

This will give the following output:

> print(str(data_prep))
List of 4
$ features_array: int [1:100, 1:1351] 9 31 25 13 17 1 45 1 9 15 ...
$ labels_array : int [1:100, 1:1351] 31 25 13 17 1 45 1 9 15 51 ...
$ dic : Named int [1:59] 1 2 3 4 5 6 7 8 9 10 ...
..- attr(*, "names")= chr [1:59] " " "-" "[" "]" ...
$ rev_dic : Named chr [1:59] " " "-" "[" "]" ...
..- attr(*, "names")= chr [1:59] "1" "2" "3" "4" ...

To view the features array, use the following code:

# Viewing the feature array
View(data_prep$features_array)

This will give the following output:

To view the labels array, use the following code:

# Viewing the labels array
View(data_prep$labels_array)

You will get the following output:

Now, let's print the dictionary, which includes the unique characters, using the following code:

# printing the dictionary - the unique characters
print(data_prep$dic)

You will get the following output:

> print(data_prep$dic)
- [ ] * 0 3 a A b B c C d D e E f F g G h H i I j J k K l L m M n N o O p
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
P q Q r R s S t T u U v V w W x X y Y z Z
39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59

Use the following code to print the reverse dictionary, which maps each index back to its character:

# printing the indexes of the characters
print(data_prep$rev_dic)

This will give the following output:

  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28
" " "-" "[" "]" "*" "0" "3" "a" "A" "b" "B" "c" "C" "d" "D" "e" "E" "f" "F" "g" "G" "h" "H" "i" "I" "j" "J" "k"
29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56
"K" "l" "L" "m" "M" "n" "N" "o" "O" "p" "P" "q" "Q" "r" "R" "s" "S" "t" "T" "u" "U" "v" "V" "w" "W" "x" "X" "y"
57 58 59
"Y" "z" "Z"

Use the following code to fetch the features and labels for training the model, and to split the sequences (columns) into training and evaluation sets in a 90:10 ratio:

X <- data_prep$features_array
Y <- data_prep$labels_array
dic <- data_prep$dic
rev_dic <- data_prep$rev_dic
vocab <- length(dic)
samples <- tail(dim(X), 1)
train.val.fraction <- 0.9
X.train.data <- X[, 1:as.integer(samples * train.val.fraction)]
X.val.data <- X[, -(1:as.integer(samples * train.val.fraction))]
X.train.label <- Y[, 1:as.integer(samples * train.val.fraction)]
X.val.label <- Y[, -(1:as.integer(samples * train.val.fraction))]
train_buckets <- list("100" = list(data = X.train.data, label = X.train.label))
eval_buckets <- list("100" = list(data = X.val.data, label = X.val.label))
train_buckets <- list(buckets = train_buckets, dic = dic, rev_dic = rev_dic)
eval_buckets <- list(buckets = eval_buckets, dic = dic, rev_dic = rev_dic)
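The split is done on whole sequences (columns), and because nothing is shuffled before it, the evaluation set is simply the last 10% of the book. The following base-R sketch, with a hypothetical split_columns() helper and a dummy matrix of the same 100 x 1351 shape as data_prep$features_array, shows how many sequences end up on each side:

```r
# Hypothetical helper mirroring the split above: the first 90% of the
# columns (sequences) go to training, the remaining columns to evaluation.
split_columns <- function(m, fraction = 0.9) {
  n_train <- as.integer(ncol(m) * fraction)   # as.integer() truncates 1215.9 to 1215
  list(train = m[, 1:n_train, drop = FALSE],
       val = m[, -(1:n_train), drop = FALSE])
}

# A dummy matrix with the same shape as data_prep$features_array
m <- matrix(0L, nrow = 100, ncol = 1351)
parts <- split_columns(m)
dim(parts$train)  # 100 1215
dim(parts$val)    # 100  136
```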

Use the following code to create iterators for training and evaluation datasets:

vocab <- length(eval_buckets$dic)
batch.size <- 32
train.data <- mx.io.bucket.iter(buckets = train_buckets$buckets, batch.size = batch.size,
                                data.mask.element = 0, shuffle = TRUE)
eval.data <- mx.io.bucket.iter(buckets = eval_buckets$buckets, batch.size = batch.size,
                               data.mask.element = 0, shuffle = FALSE)

Next, create a multi-layer RNN model from which we can sample a character-level language model: three LSTM layers of 96 hidden units on top of a 64-dimensional character embedding. It uses a one-to-one configuration because, for each character, we want to predict the next one. For a sequence of length 100, there are also 100 labels, corresponding to the same sequence of characters offset by one position. The output_last_state parameter is set to TRUE so that we can access the state of the RNN cells when performing inference. Note also that LSTM cells are used (cell_type = "lstm"):

rnn_graph_one_one <- rnn.graph(num_rnn_layer = 3,
                               num_hidden = 96,
                               input_size = vocab,
                               num_embed = 64,
                               num_decode = vocab,
                               dropout = 0.2,
                               ignore_label = 0,
                               cell_type = "lstm",
                               masking = FALSE,
                               output_last_state = TRUE,
                               loss_output = "softmax",
                               config = "one-to-one")

Use the following code to visualize the RNN model:

graph.viz(rnn_graph_one_one, type = "graph",
graph.height.px = 650, shape=c(500, 500))

The following diagram shows the resultant output:

Now, use the following line of code to set the CPU as the device to execute the code:

devices <- mx.cpu()

Then, initialize the weights of the network with the Xavier initializer:

initializer <- mx.init.Xavier(rnd_type = "gaussian", factor_type = "avg", magnitude = 3)
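As a rough illustration of what these settings mean, the following base-R sketch draws a weight matrix from the standard Xavier formula with an "avg" factor: a Gaussian with standard deviation sqrt(magnitude / ((fan_in + fan_out) / 2)). This is our reading of the rnd_type, factor_type, and magnitude arguments rather than mxnet's own code, and the helper xavier_gaussian() and the 96 x 64 shape are hypothetical:

```r
# Sketch of Xavier initialisation with rnd_type = "gaussian",
# factor_type = "avg", magnitude = 3 (assumed formula, not mxnet's code):
# weights ~ N(0, sd = sqrt(magnitude / ((fan_in + fan_out) / 2)))
xavier_gaussian <- function(fan_out, fan_in, magnitude = 3) {
  factor <- (fan_in + fan_out) / 2      # "avg" factor type
  scale <- sqrt(magnitude / factor)
  matrix(rnorm(fan_out * fan_in, mean = 0, sd = scale),
         nrow = fan_out, ncol = fan_in)
}

set.seed(1)
# A hypothetical weight matrix mapping 64 inputs to 96 outputs
w <- xavier_gaussian(fan_out = 96, fan_in = 64)
sqrt(3 / 80)      # target standard deviation, about 0.194
sd(as.vector(w))  # close to the target
```

Scaling the spread by the layer's fan-in and fan-out keeps the variance of activations and gradients roughly constant from layer to layer, which is why Xavier initialization is a common default for deep and recurrent networks.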

Use the AdaDelta optimizer to update the weights of the network during training:

optimizer <- mx.opt.create("adadelta", rho = 0.9, eps = 1e-5, wd = 1e-8,
clip_gradient = 5, rescale.grad = 1/batch.size)
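AdaDelta needs no learning rate: each step scales the gradient by the ratio of two running root-mean-square values, one of past updates and one of past gradients, with rho controlling how quickly they decay and eps keeping the ratio finite. The following base-R sketch applies the standard AdaDelta rule, with the same rho, eps, and gradient clipping values as above, to the toy objective f(x) = x^2. The helper adadelta_minimize() is hypothetical and omits weight decay; it is not mxnet's implementation:

```r
# Minimal sketch of the standard AdaDelta update rule on f(x) = x^2.
# Not mxnet's internal code; weight decay (wd) is omitted for brevity.
adadelta_minimize <- function(x, grad_fn, steps, rho = 0.9, eps = 1e-5,
                              clip_gradient = 5, rescale_grad = 1) {
  acc_g <- 0       # running average of squared gradients, E[g^2]
  acc_delta <- 0   # running average of squared updates, E[dx^2]
  path <- numeric(steps)
  for (t in seq_len(steps)) {
    g <- grad_fn(x) * rescale_grad
    g <- max(min(g, clip_gradient), -clip_gradient)  # gradient clipping
    acc_g <- rho * acc_g + (1 - rho) * g^2
    delta <- sqrt(acc_delta + eps) / sqrt(acc_g + eps) * g
    acc_delta <- rho * acc_delta + (1 - rho) * delta^2
    x <- x - delta
    path[t] <- x
  }
  path
}

path <- adadelta_minimize(x = 1, grad_fn = function(x) 2 * x, steps = 50)
round(path[1], 4)  # 0.99
```

Because the running average of updates starts at zero, the numerator of the first step is only sqrt(eps), so the first update here is just 0.01. AdaDelta starts cautiously and lets its step size grow as the history of updates builds up.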

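Use the following lines of code to set up logging of the training metrics and to define a helper, mx.metric.custom_nd(), that builds a custom metric by averaging a per-batch measurement over all batches: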

# Log the training metric at the end of every epoch
logger <- mx.metric.logger()
epoch.end.callback <- mx.callback.log.train.metric(period = 1, logger = logger)
batch.end.callback <- mx.callback.log.train.metric(period = 50)
# Build a custom metric whose value is the mean of feval() over all batches
mx.metric.custom_nd <- function(name, feval) {
  # state = c(number of batches seen, running sum of feval values)
  init <- function() {
    c(0, 0)
  }
  update <- function(label, pred, state) {
    m <- feval(label, pred)
    state <- c(state[[1]] + 1, state[[2]] + m)
    return(state)
  }
  get <- function(state) {
    list(name = name, value = (state[[2]] / state[[1]]))
  }
  ret <- (list(init = init, update = update, get = get))
  class(ret) <- "mx.metric"
  return(ret)
}

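Perplexity measures how well a probabilistic model predicts a sample: it is the exponential of the average negative log-likelihood that the model assigns to the true next characters. Lower is better; a perplexity of k means the model is, on average, as uncertain as if it were choosing uniformly among k characters. Define a function to compute it using the following lines of code: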

mx.metric.Perplexity <- mx.metric.custom_nd("Perplexity", function(label, pred) {
  # Flatten the labels and pick the predicted probability of each true label
  label <- mx.nd.reshape(label, shape = -1)
  label_probs <- as.array(mx.nd.choose.element.0index(pred, label))
  batch <- length(label_probs)
  # Mean negative log-likelihood, clamping probabilities to avoid log(0)
  NLL <- -sum(log(pmax(1e-15, label_probs))) / batch
  Perplexity <- exp(NLL)
  return(Perplexity)
})
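To make the formula concrete, the following base-R sketch computes the same quantity as the NLL and exp() lines above, without mxnet. The perplexity() helper and the probability values are hypothetical. It also shows that averaging per-batch perplexities, as mx.metric.custom_nd() does, gives a larger number than pooling all predictions first:

```r
# Base-R sketch of the perplexity formula: exp of the mean negative
# log-likelihood of the true next characters.
perplexity <- function(label_probs) {
  nll <- -mean(log(pmax(1e-15, label_probs)))  # clamp to avoid log(0)
  exp(nll)
}

# The model gives the true characters probabilities 0.5 and 0.25:
perplexity(c(0.5, 0.25))      # sqrt(2 * 4), about 2.83
# A uniform guess over the 59-character dictionary has perplexity 59:
perplexity(rep(1 / 59, 10))

# Mean of per-batch perplexities versus perplexity of the pooled batches;
# with equal batch sizes the first is never smaller (exp() is convex).
batch1 <- c(0.9, 0.8)
batch2 <- c(0.1, 0.2)
mean(c(perplexity(batch1), perplexity(batch2)))
perplexity(c(batch1, batch2))
```

The gap is small when batches are similar, so the logged values remain a useful guide to training progress.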

Use the following code to train the model. In this project, we train for 20 epochs (num.round = 20) and report the validation perplexity after each one:

model <- mx.model.buckets(symbol = rnn_graph_one_one,
                          train.data = train.data, eval.data = eval.data,
                          num.round = 20, ctx = devices, verbose = TRUE,
                          metric = mx.metric.Perplexity,
                          initializer = initializer,
                          optimizer = optimizer,
                          batch.end.callback = NULL,
                          epoch.end.callback = epoch.end.callback)

This will give the following output:

Start training with 1 devices
[1] Train-Perplexity=23.490355102639
[1] Validation-Perplexity=17.6250266989171
[2] Train-Perplexity=14.4508382001841
[2] Validation-Perplexity=12.8179427398927
[3] Train-Perplexity=10.8156810097278
[3] Validation-Perplexity=9.95208184606089
[4] Train-Perplexity=8.6432934902383
[4] Validation-Perplexity=8.21806492033906
[5] Train-Perplexity=7.33073759154393
[5] Validation-Perplexity=7.03574648385079
[6] Train-Perplexity=6.32024660528852
[6] Validation-Perplexity=6.1394327776089
[7] Train-Perplexity=5.61888374338248
[7] Validation-Perplexity=5.59925324885983
[8] Train-Perplexity=5.14009899947491
[8] Validation-Perplexity=5.29671693342219
[9] Train-Perplexity=4.77963053659987
[9] Validation-Perplexity=4.98471501141549
[10] Train-Perplexity=4.5523402301526
[10] Validation-Perplexity=4.84636357676712
[11] Train-Perplexity=4.36693337145912
[11] Validation-Perplexity=4.68806078057635
[12] Train-Perplexity=4.21294955131918
[12] Validation-Perplexity=4.53026345109037
[13] Train-Perplexity=4.08935886339982
[13] Validation-Perplexity=4.50495393289961
[14] Train-Perplexity=3.99260373800419
[14] Validation-Perplexity=4.42576079641165
[15] Train-Perplexity=3.91330125104996
[15] Validation-Perplexity=4.3941619024578
[16] Train-Perplexity=3.84730588206837
[16] Validation-Perplexity=4.33288830915229
[17] Train-Perplexity=3.78711049085869
[17] Validation-Perplexity=4.28723362252784
[18] Train-Perplexity=3.73198720637659
[18] Validation-Perplexity=4.22839393379393
[19] Train-Perplexity=3.68292148768833
[19] Validation-Perplexity=4.22187018296206
[20] Train-Perplexity=3.63728269095417
[20] Validation-Perplexity=4.17983276293299

Next, save the model for later use and load it back from disk. To sample text character by character, we then build an inference symbol that groups the softmax output with the LSTM's final hidden state (RNN_state) and cell state (RNN_state_cell), using the following code:

# Save the trained model to disk, then load it back for inference
mx.model.save(model, prefix = "one_to_one_seq_model", iteration = 20)
# Fix the random seed so that the sampled text is reproducible
# (the generated text is expected to be similar to the training data)
set.seed(0)
model <- mx.model.load(prefix = "one_to_one_seq_model", iteration = 20)
# Expose the softmax output and the final LSTM hidden and cell states
internals <- model$symbol$get.internals()
sym_state <- internals$get.output(which(internals$outputs %in% "RNN_state"))
sym_state_cell <- internals$get.output(which(internals$outputs %in% "RNN_state_cell"))
sym_output <- internals$get.output(which(internals$outputs %in% "loss_output"))
symbol <- mx.symbol.Group(sym_output, sym_state, sym_state_cell)

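Use the following code to provide the seed character ("e") that starts the text, sample the next character from the model's output, and then generate 200 more characters, each time feeding the last prediction and the LSTM states back into the network: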

# Seed character from which the text is generated
infer_raw <- c("e")
infer_split <- dic[strsplit(infer_raw, '') %>% unlist]
infer_length <- length(infer_split)
# Container for the predicted indexes (without it, c(predict, pred)
# would pick up base R's predict() function)
predict <- numeric()
infer.data <- mx.io.arrayiter(data = matrix(infer_split), label = matrix(infer_split),
                              batch.size = 1, shuffle = FALSE)
# First pass: run the seed through the network, starting from empty states
infer <- mx.infer.rnn.one(infer.data = infer.data,
                          symbol = symbol,
                          arg.params = model$arg.params,
                          aux.params = model$aux.params,
                          input.params = NULL,
                          ctx = devices)
# Take the softmax output at the last seed position and sample the next index
pred_prob <- as.numeric(as.array(mx.nd.slice.axis(infer$loss_output, axis = 0,
                                                  begin = infer_length - 1,
                                                  end = infer_length)))
pred <- sample(length(pred_prob), prob = pred_prob, size = 1) - 1
predict <- c(predict, pred)
# Generate 200 more characters, one at a time
for (i in 1:200) {
  infer.data <- mx.io.arrayiter(data = as.matrix(pred), label = as.matrix(pred),
                                batch.size = 1, shuffle = FALSE)
  # Pass in the states from the previous step so the network keeps its context
  infer <- mx.infer.rnn.one(infer.data = infer.data,
                            symbol = symbol,
                            arg.params = model$arg.params,
                            aux.params = model$aux.params,
                            input.params = list(rnn.state = infer[[2]],
                                                rnn.state.cell = infer[[3]]),
                            ctx = devices)
  pred_prob <- as.numeric(as.array(infer$loss_output))
  pred <- sample(length(pred_prob), prob = pred_prob, size = 1, replace = TRUE) - 1
  predict <- c(predict, pred)
}
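The generation loop follows the usual pattern for sampling from a language model: predict a probability distribution over the next character, draw one character from it, and feed that character back in. The following base-R sketch shows the same pattern with a character bigram model standing in for the LSTM, so that it runs without mxnet. The functions train_bigram() and generate() and the training string are hypothetical, and unlike the RNN the bigram model has no hidden state, so its only context is the previous character:

```r
# Toy stand-in for the LSTM: a character bigram model estimated from counts.
train_bigram <- function(txt) {
  chars <- strsplit(txt, "")[[1]]
  vocab <- sort(unique(chars))
  counts <- matrix(0, length(vocab), length(vocab),
                   dimnames = list(vocab, vocab))
  for (i in seq_len(length(chars) - 1)) {
    counts[chars[i], chars[i + 1]] <- counts[chars[i], chars[i + 1]] + 1
  }
  counts + 1  # add-one smoothing so every row is a valid distribution
}

# Start from a seed character and repeatedly sample the next one
generate <- function(counts, seed_char, n) {
  out <- seed_char
  current <- seed_char
  for (i in seq_len(n)) {
    probs <- counts[current, ] / sum(counts[current, ])
    current <- sample(colnames(counts), size = 1, prob = probs)
    out <- c(out, current)
  }
  paste0(out, collapse = "")
}

set.seed(0)
counts <- train_bigram("alice was beginning to get very tired of sitting by her sister")
generate(counts, seed_char = "e", n = 40)
```

Sampling from the distribution, rather than always taking the most likely character, is what keeps the generated text varied; always picking the maximum tends to fall into short repeating loops.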

Use the following lines of code to map the predicted indexes back to characters, merge them with the seed character into a single string, and print the result:

predict_txt <- paste0(rev_dic[as.character(predict)], collapse = "")
predict_txt_tot <- paste0(infer_raw, predict_txt, collapse = "")
# printing the predicted text
print(predict_txt_tot)

This will give the following output:

[1] "eNAHare I eat and in Heather where and fingo I ve next feeling or fancy to livery dust a large pived as a pockethion What isual child for of cigstening to get in a strutching voice into saying she got reaAlice glared in a Grottle got to sea-paticular and when she heard it would heard of having they began whrink bark of Hearnd again said feeting and there was going to herself up it Then does so small be THESE said Alice going my dear her before she walked at all can t make with the players and said the Dormouse sir your mak if she said to guesss I hadn t some of the crowd and one arches how come one mer really of a gomoice and the loots at encand something of one eyes purried asked to leave at she had Turtle might I d interesting tone hurry of the game the Mouse of puppled it They much put eagerly"

We see from the output that our RNN is able to generate text automatically. Many of the generated words are real English words, and names from the book such as Alice, the Dormouse, and the Mouse appear, but the text as a whole is not very coherent and needs improvement. There are several techniques we could use to improve coherence and generate more meaningful text from an RNN. The following are some of these techniques:

  • Implement a word-level language model instead of a character-level language model.
  • Use a larger RNN network.
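  • In our project, we used LSTM cells to build our RNN. We could try GRU cells instead; they have fewer gates and parameters and so train faster, although whether they produce better text on this corpus would have to be tested.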
  • We trained our RNN for 20 epochs; this may be too few to get the right weights in place, as the validation perplexity was still decreasing at epoch 20. We could try increasing the number of epochs and verify whether the RNN yields better predictions.
  • The current model used a dropout of 20%. This can be altered to check the effect on the overall predictions.
  • Our corpus contains almost no punctuation (apart from letters, digits, and the space, the dictionary holds only -, [, ], and *), so the model cannot learn to predict sentence punctuation such as periods and commas while generating text. Training the RNN on a corpus that includes punctuation may yield better sentences and word endings.
  • The seq.len parameter decides how many characters of history the network is unrolled over during training when predicting the next character. In our model, we set it to 100. This value may be altered to check whether the model produces better words and sentences.

Due to space and time constraints, we are not going to try these options in this chapter. Interested readers can experiment with one or more of them to produce better words and sentences with a character-level RNN.
