In this example, we will use the Jack and Jill nursery rhyme as the source text for building a language model. We'll create a text file containing the rhyme and save it as rhyme.txt in a data directory inside the working directory. Our language model will take two words as input and predict the next word.
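As a minimal sketch of the file-creation step, the following base R snippet writes the rhyme to data/rhyme.txt. The exact contents of the file are an assumption: only the familiar first verse is used here, and the original file may contain more lines.

```r
# Assumed file contents: the first verse of the rhyme only.
rhyme <- c(
  "Jack and Jill went up the hill",
  "To fetch a pail of water",
  "Jack fell down and broke his crown",
  "And Jill came tumbling after"
)

# Create the data directory if it does not exist yet.
dir.create("data", showWarnings = FALSE)

# Write one line of the rhyme per line of the file.
writeLines(rhyme, "data/rhyme.txt")
```

The path matches the one passed to read_file() in the next step.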
We'll start by importing the required libraries and reading our text file:
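library(keras)
library(readr)    # provides read_file()
library(stringr)  # provides str_to_lower()
# Read the whole file into a single string and convert it to lowercase
data <- read_file("data/rhyme.txt") %>% str_to_lower()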
In NLP, the text data we work with is called a corpus. A corpus is a collection of text, and it is usually large, although ours is only a few lines long. Let's have a look at our corpus:
data
The following screenshot shows the text in our corpus:
We will use the text in the preceding screenshot for sequence generation.
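To make the two-words-in, one-word-out setup concrete, here is a minimal base R sketch that slides a three-word window over a short sample string. The sample string and the names sample_text, words, and pairs are hypothetical and chosen only for illustration. This is not necessarily how the model's training sequences are prepared later.

```r
# Hypothetical sample: the first line of the rhyme, already lowercased.
sample_text <- "jack and jill went up the hill"

# Split the string on whitespace into individual words.
words <- strsplit(sample_text, "\\s+")[[1]]

# Slide a window of three words over the text:
# the first two words are the input, the third is the word to predict.
n <- length(words)
pairs <- data.frame(
  word1     = words[1:(n - 2)],
  word2     = words[2:(n - 1)],
  next_word = words[3:n],
  stringsAsFactors = FALSE
)

print(pairs)
```

Each row of pairs is one training example: for instance, the input ("jack", "and") is paired with the target "jill".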