Getting ready

Suppose you have a corpus of documents and your objective is to find the most frequent words in it. The first step is to pre-process the text, and the second is to create a term-document matrix. In this recipe, you will apply regular expressions to text data retrieved from a web page with the readLines() function. Specifically, you will read the following web page:

https://en.wikipedia.org/wiki/Programming_with_Big_Data_in_R
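
As a rough illustration of the pre-processing described above, the following is a minimal sketch rather than the recipe's own code. The helper names html_to_words() and top_words() are hypothetical, the tag-stripping regular expressions are simplistic assumptions (for example, they do not remove the contents of script or style elements), and a plain table() of word counts stands in for a full term-document matrix. The sample lines are made up so that the sketch runs without network access.

```r
# Hypothetical helper: turn raw HTML lines into lower-case word tokens.
html_to_words <- function(lines) {
  text <- paste(lines, collapse = " ")          # join lines so tags split across lines still match
  text <- gsub("<[^>]+>", " ", text)            # drop HTML tags (simplistic assumption)
  text <- gsub("&[a-zA-Z0-9#]+;", " ", text)    # drop HTML entities such as &amp;
  text <- tolower(text)
  text <- gsub("[^a-z]+", " ", text)            # keep letters only
  words <- strsplit(trimws(text), " +")[[1]]
  words[nchar(words) > 0]                       # discard empty tokens
}

# Hypothetical helper: return the n most frequent words as a table of counts.
top_words <- function(words, n = 10) {
  head(sort(table(words), decreasing = TRUE), n)
}

# Made-up sample input, standing in for the output of readLines().
sample_lines <- c("<p>Big data in R: <b>R</b> handles big data.</p>",
                  "<p>Data &amp; more data</p>")
words <- html_to_words(sample_lines)
freq <- top_words(words, 3)
print(freq)

# To try the same steps on the live page (requires network access):
# page_lines <- readLines("https://en.wikipedia.org/wiki/Programming_with_Big_Data_in_R", warn = FALSE)
# print(top_words(html_to_words(page_lines)))
```

Collapsing the lines into a single string before applying gsub() is a deliberate choice here: readLines() returns one element per line, and an HTML tag that wraps across two lines would otherwise escape the pattern.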
