Word clouds are a nice and useful way to show text composition at a glance.
In a word cloud, the words that make up a text are arranged in a cloud-like shape, and their size and color usually reflect the frequency of each term in the source text.
In this way, it is possible to understand quickly which words are more relevant to the given text. In this recipe, we will explore the Wikipedia page related to the R programming language.
We first need to install the required packages and load them into the R environment:
# tm is used internally by wordcloud() to count word frequencies from raw text
install.packages(c("wordcloud", "RColorBrewer", "rvest", "tm"))
library(wordcloud)
library(rvest)
library(RColorBrewer)
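# Step 1: download the page and extract its text
url <- "https://en.wikipedia.org/wiki/R_(programming_language)"
page <- read_html(url)
page <- html_text(page, trim = TRUE)
# Replace newline and tab escape sequences with a space so adjacent words are not fused
page <- gsub("\n", " ", page, fixed = TRUE)
page <- gsub("\t", " ", page, fixed = TRUE)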
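# Step 2: draw the word cloud with the default settings
wordcloud(page)
# Step 3: show only terms that appear at least four times
wordcloud(page, min.freq = 4)
# Step 4: color the words with a ColorBrewer palette
palette <- brewer.pal(n = 9, "Paired")
wordcloud(page, colors = palette)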
Our first step involves storing the Wikipedia page URL in a dedicated variable and downloading the HTML code of that page.
The HTML is read with the read_html() function from the rvest package.
After this step, we have downloaded all of the HTML code, including HTML tags such as <h1> and <a href>.
To remove these tags and keep only the text itself, we just have to run html_text() on the page object we created.
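To see what html_text() does without touching the network, the following minimal sketch parses a small, made-up HTML fragment and extracts its text. The fragment and the variable names snippet, doc, and plain are illustrative assumptions, not content from the Wikipedia page.

```r
library(rvest)

# Hypothetical HTML fragment standing in for a downloaded page
snippet <- "<html><body><h1>R</h1><p>A <a href='#'>language</a> for statistics</p></body></html>"

doc <- read_html(snippet)              # parse the literal HTML string
plain <- html_text(doc, trim = TRUE)   # drop the tags, keep only the text nodes

print(plain)
```

Note that text from adjacent elements is concatenated as it is, so words coming from neighboring tags can end up without a space between them.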
We then strip out the \n (newline) and \t (tab) escape sequences, since they are formatting characters rather than words.
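As a small illustration of this cleanup, the sketch below applies gsub() to a made-up string, assuming the characters to strip are the newline and tab escape sequences. It replaces them with a space rather than an empty string, so that the words on either side are not fused together; raw_text and clean are hypothetical names.

```r
# Illustrative input containing a newline and a tab
raw_text <- "R is a language\nfor statistical\tcomputing"

# fixed = TRUE treats the pattern as a literal string, not a regular expression
clean <- gsub("\n", " ", raw_text, fixed = TRUE)
clean <- gsub("\t", " ", clean, fixed = TRUE)

print(clean)
```

Both substitutions could also be done in one call with a regular expression character class, for example gsub("[\n\t]+", " ", raw_text).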
In step 2, we print out our word cloud. Creating it is as easy as running the wordcloud() function on the page object.
Be aware that some default argument values apply here:
Word frequencies are derived from the text object. If the freq argument is not specified, they are computed automatically within the function.
The min.freq argument is set to 3 by default, meaning that words that appear fewer than three times will not show up in your word cloud.
In step 3, we filter for the most frequent terms. As said earlier, the frequency filter is set through the min.freq parameter. Raising or lowering it changes the number of words displayed.
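The effect of a minimum-frequency threshold can be sketched in base R on a toy vector of words. This is only an illustration of the idea, not the internal code of wordcloud(); the data and the names words, freq, min_freq, and kept are assumptions made for the example.

```r
# Toy vector of already-tokenized words (illustrative data)
words <- c("r", "data", "r", "plot", "data", "r", "cloud", "r", "data", "plot")

freq <- table(words)   # count the occurrences of each distinct word
min_freq <- 3          # plays the role of the min.freq argument

# Keep only the words that reach the threshold
kept <- freq[freq >= min_freq]

print(kept)
```

Here only "r" (four occurrences) and "data" (three occurrences) survive the filter. wordcloud() also accepts precomputed words and freq arguments, which is useful when you want to control the counting yourself.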
In step 4, we change the words' color combinations. By leveraging RColorBrewer, we can easily define color palettes for our word cloud.
To look at the available palettes, just run the following command in your R console:
display.brewer.all()
This function will produce the following plot:
Any of the labels on the left-hand side of the plot can be used in place of "Paired" in palette <- brewer.pal(n = 9, "Paired").
Also be aware that the n argument sets the number of colors retrieved from the chosen brewer palette for your custom palette.
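A quick way to check how n affects the result is to request the same palette with two different sizes and compare the lengths of the returned vectors. The sketch below assumes RColorBrewer is installed; small_palette and large_palette are hypothetical names.

```r
library(RColorBrewer)

# Same palette, different number of colors
small_palette <- brewer.pal(n = 3, name = "Paired")
large_palette <- brewer.pal(n = 9, name = "Paired")

print(length(small_palette))
print(length(large_palette))

# brewer.pal.info lists the largest n each palette supports
print(brewer.pal.info["Paired", "maxcolors"])
```

Each element of the returned vector is a hexadecimal color string, so the result can be passed directly to the colors argument of wordcloud().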