How it works…

Let's take a look at the preceding process sequentially:

  1. The initial phase performs the PubMed search through the pubmed.mineR library.
  2. The search keyword is converted into a PubMed search_query term using the EUtilsSummary() function.
  3. Once the search_query term has been created, it is passed to EUtilsGet() to fetch the actual search results from PubMed. The search results are extracted into an object.
  4. The abstract text is then retrieved from this object and collected into a vector.
  5. Once the vector of text data has been created, the pre-processing step begins. Using the tm library, you create the corpus of abstracts from this vector with the following code line:
        AbstractCorpus <- Corpus(VectorSource(abstracts))
  6. After creating the corpus, you are ready to apply other functions from the tm library for further processing, such as converting the text to lowercase or uppercase, removing numbers, removing punctuation, removing stop words, and stemming the document. The tm_map() function has intuitive options to perform all of these tasks. For example, to remove numbers, use the following code:
        AbstractCorpus <- tm_map(AbstractCorpus, removeNumbers)
  7. After completing all the necessary pre-processing, the final task is to create a term-document matrix. This term-document matrix is then used in topic modeling and sentiment analysis:
        > trmDocMat
        <<TermDocumentMatrix (terms: 1922, documents: 50)>>
        Non-/sparse entries: 4500/91600
        Sparsity           : 95%
        Maximal term length: 28
        Weighting          : term frequency (tf)
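The steps above can be sketched end to end as follows. This is a minimal sketch, not the recipe's exact code: it assumes the RISmed and tm packages are installed and a network connection to PubMed is available, and the search keyword "text mining" is a placeholder. Note that EUtilsSummary() and EUtilsGet() are provided by the RISmed package, and stemDocument() additionally requires the SnowballC package:

```r
library(RISmed)  # provides EUtilsSummary(), EUtilsGet(), AbstractText()
library(tm)

# Steps 1-3: convert the keyword into a search query and fetch
# the matching records from PubMed
search_query <- EUtilsSummary("text mining", type = "esearch",
                              db = "pubmed", retmax = 50)
records <- EUtilsGet(search_query)

# Step 4: retrieve the abstract text as a character vector
abstracts <- AbstractText(records)

# Step 5: create the corpus from the vector
AbstractCorpus <- Corpus(VectorSource(abstracts))

# Step 6: pre-processing with tm_map()
AbstractCorpus <- tm_map(AbstractCorpus, content_transformer(tolower))
AbstractCorpus <- tm_map(AbstractCorpus, removeNumbers)
AbstractCorpus <- tm_map(AbstractCorpus, removePunctuation)
AbstractCorpus <- tm_map(AbstractCorpus, removeWords, stopwords("english"))
AbstractCorpus <- tm_map(AbstractCorpus, stemDocument)

# Step 7: build the term-document matrix used later for
# topic modeling and sentiment analysis
trmDocMat <- TermDocumentMatrix(AbstractCorpus)
trmDocMat
```

The dimensions and sparsity reported for trmDocMat will differ from the output shown above, since they depend on the keyword and on whatever PubMed returns at query time.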