How to do it…

Let's take a look at the following steps to import plain text data from a PDF file:

Since you will read multiple PDF files, it is a good idea to create an object containing all the filenames. You can either create this object manually or build it automatically from the files that have the PDF extension. Here is the code to read the filenames automatically:

        pdfFileNames <- list.files(pattern = "\\.pdf$")

Before running the preceding line, make sure that you have set your working directory using the setwd() function.
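
The pattern argument of list.files() is a regular expression, so a pattern such as "pdf$" matches any filename that merely ends in those three letters. The following is a minimal sketch of the difference, assuming a scratch directory (demoDir, a hypothetical name) filled with empty placeholder files. It also passes the directory through the path argument, which is an alternative to calling setwd() first:

```r
# Hypothetical scratch directory so the example does not touch real files
demoDir <- file.path(tempdir(), "pdf_demo")
dir.create(demoDir, showWarnings = FALSE)

# Create empty placeholder files; only their names matter here
file.create(file.path(demoDir, c("a.pdf", "b.pdf", "notes.txt", "notapdf")))

# "pdf$" matches any name that ends in "pdf", including "notapdf"
loose <- list.files(path = demoDir, pattern = "pdf$")

# "\\.pdf$" requires a literal dot before the extension
strict <- list.files(path = demoDir, pattern = "\\.pdf$")

print(loose)
print(strict)
```

Note that when path is used, list.files() still returns bare filenames unless full.names = TRUE is also supplied, so the full path is needed before the files can be opened from another working directory.
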
Once you have the list of filenames, you need to load the pdftools library into the R environment as follows:

        library(pdftools)

Now you are ready to read the text data from the PDF files. Run the following code to get the text from all three PDF files:

        txt <- sapply(pdfFileNames, pdf_text)

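The newly created object txt holds the text imported from the PDF files, with one named entry per file. Because pdf_text() returns one string per page, txt is a plain named character vector only when every file has a single page. Here, the sapply() function has been used to parse all the PDF files in a single line of code.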
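
If a single string per file is more convenient, the pages can be joined before the results are combined. The following is a minimal sketch rather than part of the recipe: readPdfAsOneString is a hypothetical helper name, and joining the pages with a newline is an assumption about what the downstream analysis needs.

```r
library(pdftools)

# Hypothetical helper: read one PDF and join its pages into a single string
readPdfAsOneString <- function(path) {
  pages <- pdf_text(path)        # character vector, one element per page
  paste(pages, collapse = "\n")  # assumption: pages are separated by a newline
}

# PDF files in the current working directory
pdfFileNames <- list.files(pattern = "\\.pdf$")

# vapply() checks that every file yields exactly one string,
# so the result is always a character vector with one element per file
txt <- vapply(pdfFileNames, readPdfAsOneString, character(1))
```

Using vapply() instead of sapply() is a design choice here: it fails with an error if the helper ever returns something other than one string, rather than silently changing the shape of the result.
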
