The whole process involves the following specific tasks:
- Specify the search keyword.
- Create a PubMed search query from the keyword.
- Perform the search, limiting it to the first 50 articles.
- Extract the abstract texts and store them in an object.
Here is the code to carry out the preceding tasks:
library(pubmed.mineR)
library(RISmed)

keyword <- "Deep Learning"
# Build the PubMed query and cap the result set at 50 records
search_query <- EUtilsSummary(keyword, retmax = 50)
summary(search_query)

# Fetch the matching records and pull out the fields of interest
extractedResult <- EUtilsGet(search_query)
pmid <- PMID(extractedResult)
years <- YearPubmed(extractedResult)
Jtitle <- Title(extractedResult)               # journal titles
articleTitle <- ArticleTitle(extractedResult)  # article titles
abstracts <- AbstractText(extractedResult)
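As a quick sanity check, you can inspect what came back before moving on. The following is a sketch that assumes the code above has run successfully and a network connection to PubMed is available; `QueryCount()` is RISmed's accessor for the total number of hits the query matched:

```r
# How many articles matched the query in total
QueryCount(search_query)

# Number of abstracts actually retrieved; should be at most 50 (retmax)
length(abstracts)

# Publication years of the retrieved set
table(years)

# First 200 characters of the first abstract
substr(abstracts[1], 1, 200)
```

If `length(abstracts)` is zero, revisit the keyword before attempting any pre-processing.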
Once you have the abstracts in your R session, the next step is to pre-process the text. Here are the steps for pre-processing:
- Convert all texts to either lowercase or uppercase.
- Remove punctuation from the text.
- Remove digits from the text.
- Remove stop words.
- Stem the words, i.e., reduce each word to its root form.
Before implementing these tasks, you should create a corpus of the text data. The whole process is implemented using functions from the tm library:
library(tm)

# Build a corpus, one document per abstract
AbstractCorpus <- Corpus(VectorSource(abstracts))
# Lowercase the text, then strip punctuation and digits
AbstractCorpus <- tm_map(AbstractCorpus, content_transformer(tolower))
AbstractCorpus <- tm_map(AbstractCorpus, removePunctuation)
AbstractCorpus <- tm_map(AbstractCorpus, removeNumbers)
# Remove English stop words
Stopwords <- stopwords('english')
AbstractCorpus <- tm_map(AbstractCorpus, removeWords, Stopwords)
# Reduce each word to its stem
AbstractCorpus <- tm_map(AbstractCorpus, stemDocument)
Once you have done all the initial processing, the final step is to create a term-document matrix. This is a large sparse matrix in which each row corresponds to a term and each column to a document; each entry records how many times that term occurs in that document (zero if it is absent). To get the term-document matrix, run the following code:
# wordLengths = c(1, Inf) keeps even one-letter terms
# (wordLengths is the current tm control option; minWordLength is legacy)
trmDocMat <- TermDocumentMatrix(AbstractCorpus,
                                control = list(wordLengths = c(1, Inf)))
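To see what the resulting matrix looks like, here is a minimal, self-contained sketch that uses a two-document toy corpus in place of the real abstracts; the documents are chosen so the counts are easy to verify by hand:

```r
library(tm)

# Toy corpus standing in for the preprocessed abstracts
docs <- c("deep learning models", "learning deep networks deep")
corpus <- Corpus(VectorSource(docs))
tdm <- TermDocumentMatrix(corpus, control = list(wordLengths = c(1, Inf)))

m <- as.matrix(tdm)      # rows = terms, columns = documents
m["deep", ]              # occurrences of "deep" per document: 1 and 2

findFreqTerms(tdm, lowfreq = 2)      # terms occurring at least twice overall
sort(rowSums(m), decreasing = TRUE)  # overall term frequencies
```

On the real data, `inspect(trmDocMat)` prints the matrix dimensions and its sparsity, and `findFreqTerms()` is a convenient way to pull out the most common terms across all abstracts.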