How it works…

This is the simplest approach to reading text data from a web page, and the steps are intuitive. The url() function creates a connection between the web page and the R session, and the readLines() function then reads the text from that connection line by line. Because the code reads the HTML source line by line, the resulting object is a character vector in which each element holds one line of the original page's HTML source.
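The two steps described above can be sketched as follows. To keep the example runnable offline, it writes a small made-up HTML page to a temporary file and reads it through a file:// URL. The file contents and the names page_path, page_url, con, and html_lines are assumptions for illustration; with a real page you would pass an http:// or https:// address to url() instead.

```r
# Write a tiny, made-up HTML page to a temporary file (illustrative only).
page_path <- tempfile(fileext = ".html")
writeLines(c("<html>", "<body><p>Hello</p></body>", "</html>"), page_path)

# Build a file:// URL for the page; a real web address would go here instead.
page_url <- paste0("file://", normalizePath(page_path, winslash = "/"))

# url() creates the connection; readLines() reads it one line per element.
con <- url(page_url)
html_lines <- readLines(con)
close(con)  # release the connection object

class(html_lines)   # "character"
length(html_lines)  # 3
```

Each element of html_lines is a plain string, so nothing in the object records which tags open or close where.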

Although the output object contains HTML code, you cannot process it as though it had an HTML structure, because the object is a completely unstructured character vector. Here is the output of the first few lines:

From the preceding output, it is clear that any further analysis requires pre-processing first, which is a time-consuming task.
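To give a rough idea of what such pre-processing can involve, here is a minimal sketch that strips tags with a regular expression. The html_lines vector is a hypothetical stand-in for what readLines() might return, and the tag-stripping pattern is a simplification that assumes no tag spans more than one line; it is not a general HTML parser.

```r
# Hypothetical lines, standing in for the result of readLines() on a page.
html_lines <- c(
  "<html>",
  "<head><title>Sample page</title></head>",
  "<body>",
  "<p>First paragraph.</p>",
  "<p>Second paragraph.</p>",
  "</body>",
  "</html>"
)

# Remove anything that looks like a tag, then trim surrounding whitespace.
text_only <- trimws(gsub("<[^>]+>", "", html_lines))

# Drop the elements that are empty once the tags are gone.
text_only <- text_only[nzchar(text_only)]

text_only
# "Sample page" "First paragraph." "Second paragraph."
```

Even in this small case the title and the paragraphs end up indistinguishable, which illustrates why working from raw lines takes so much manual effort.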
