How to do it…

Let's take a look at the following steps to import plain text data from a PDF file:

  1. Because you will read multiple PDF files, first create an object containing all the filenames. You can either type the filenames manually or automatically list every file that has the PDF extension. Here is the code to list the filenames automatically (the escaped dot makes the pattern match only names ending in .pdf):
        pdfFileNames <- list.files(pattern = "\\.pdf$")
     Before running the preceding line, make sure that you have set your working directory using the setwd() function.
  2. Once you have the list of filenames, you need to load the pdftools library into the R environment as follows:
        library(pdftools)
  3. Now you are ready to read the text data from the PDF files. Run the following code to get the text from all three PDF files:
        txt <- sapply(pdfFileNames, pdf_text)

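The newly created object txt contains the text imported from the PDF files, named by filename. It is a named character vector when each PDF has a single page, because pdf_text() returns one string per page. Here, the sapply() function has been used to read all the PDF files in a single line of code.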
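
Because pdf_text() returns one string per page, what sapply() returns depends on the page counts: files with several pages each produce a matrix, and files with differing page counts produce a list. The following is a minimal sketch, not part of the original recipe, that always returns one string per file. The function name readPdfTexts and its reader argument are hypothetical, introduced only for illustration.

```r
# Read each PDF and return one string per file, named by filename.
# `readPdfTexts` and `reader` are hypothetical names used only in this sketch.
readPdfTexts <- function(files, reader = pdftools::pdf_text) {
  # lapply() always returns a list: one character vector per file,
  # with one element per page.
  pages <- lapply(files, reader)
  names(pages) <- files
  # Join the pages of each file with newlines so that the result is
  # always a named character vector, whatever the page counts are.
  vapply(pages, paste, character(1), collapse = "\n")
}

# Usage, assuming the working directory contains the PDF files:
# txt <- readPdfTexts(list.files(pattern = "\\.pdf$"))
```

Passing the reader as an argument is only a convenience for trying the function without real PDF files. In normal use, the default pdf_text() from pdftools does the reading.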
