Suppose you have a corpus of documents and your objective is to find the most frequent words in it. The first step is to pre-process the text, and the second is to create a term-document matrix. In this recipe, you will apply regular expressions to text data retrieved from a web page using the readLines() function. Specifically, you will read the following web page:
https://en.wikipedia.org/wiki/Programming_with_Big_Data_in_R
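As a rough illustration of the idea, the following is a minimal sketch in base R. The helper name count_words and the sample lines are hypothetical and are not part of the recipe. It assumes that HTML tags can be removed with a simple regular expression, which is a simplification because the contents of script and style blocks would remain in the text. It also uses a plain frequency table instead of a full term-document matrix.

```r
# Hypothetical helper (not from the recipe): clean raw HTML lines with
# regular expressions and return word frequencies, most frequent first.
count_words <- function(lines) {
  text <- paste(lines, collapse = " ")   # join all lines into one string
  text <- gsub("<[^>]+>", " ", text)     # drop HTML tags (simplified assumption)
  text <- tolower(text)                  # normalise case
  text <- gsub("[^a-z]+", " ", text)     # keep letters only
  words <- strsplit(trimws(text), " +")[[1]]
  words <- words[nchar(words) > 0]       # discard empty tokens
  sort(table(words), decreasing = TRUE)
}

# Small in-memory sample standing in for the downloaded page
sample_lines <- c("<p>Big data in R</p>", "<p>R handles big data</p>")
freq <- count_words(sample_lines)
print(freq)

# To apply the same helper to the actual page (requires network access):
# page <- readLines("https://en.wikipedia.org/wiki/Programming_with_Big_Data_in_R",
#                   warn = FALSE)
# head(count_words(page), 10)
```

The sample input keeps the sketch runnable offline. Uncommenting the last lines applies the same cleaning steps to the lines returned by readLines().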