Bar charts

Bar charts are usually used to explore how one (or more) categorical variables are distributed. In qplot(), this is done by setting the geom argument to "bar". This geometry counts the number of occurrences of each level of a categorical variable in the data. To show an example of the bar chart, we will use the movies dataset, which is included in the ggplot2 package. We have already seen how to load the datasets included with the basic installation of R, but if you are interested in the list of datasets within a specific package (ggplot2 in this case), you can use the following code:

require(ggplot2)       ## Load ggplot2 if needed
data(package="ggplot2")  ## List the datasets within ggplot2

The movies dataset contains information about movies, including the rating, from the http://imdb.com/ website. You can get a more detailed description in the help page of the dataset.
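
If you only want the dataset names rather than the full listing, the object returned by data() can be inspected directly. The following is a minimal sketch: the helper name list_datasets is our own invention, and we query the base datasets package so that the example runs even without ggplot2 installed. Note also that, to our knowledge, from ggplot2 version 2.0.0 onward the movies data is distributed in the separate ggplot2movies package, so on a recent installation you may need to look there instead.

```r
## Hypothetical helper (not part of ggplot2): return the dataset names in a package
list_datasets <- function(pkg) {
  info <- data(package = pkg)$results   # character matrix, one row per dataset
  unname(info[, "Item"])                # the "Item" column holds the names
}

head(list_datasets("datasets"))          # datasets shipped with base R
```

Calling list_datasets("ggplot2") in the same way should return the names listed by the code above.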

This dataset contains many variables but, for our example, we will not need all of them, so let's rearrange its content a bit. For our exercise, we are first interested in knowing how many movies were produced in each category: Action, Animation, Comedy, Drama, Documentary, and Romance. Let's also keep in the dataset the movie budget, whether it was a short or regular movie, and its year. So, the steps covered in our code are:

  1. Load the data.
  2. Extract from the dataset the budget, the short/regular indicator, and the year for each movie type.
  3. Create a factor variable containing the movie type.

The columns of our final dataset, called myMovieData, will then be Budget, Short, Year, and Type. So, here's our code:

## Steps 1-2: for each movie type, keep the budget, short flag, and year
d1 <- data.frame(movies[movies$Action==1, c("budget", "Short", "year")])
d1$Type <- "Action"
d2 <- data.frame(movies[movies$Animation==1, c("budget", "Short", "year")])
d2$Type <- "Animation"
d3 <- data.frame(movies[movies$Comedy==1, c("budget", "Short", "year")])
d3$Type <- "Comedy"
d4 <- data.frame(movies[movies$Drama==1, c("budget", "Short", "year")])
d4$Type <- "Drama"
d5 <- data.frame(movies[movies$Documentary==1, c("budget", "Short", "year")])
d5$Type <- "Documentary"
d6 <- data.frame(movies[movies$Romance==1, c("budget", "Short", "year")])
d6$Type <- "Romance"
myMovieData <- rbind(d1, d2, d3, d4, d5, d6)
names(myMovieData) <- c("Budget", "Short", "Year", "Type")
## Step 3: store the movie type as a factor
myMovieData$Type <- factor(myMovieData$Type)
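
The same three steps can also be written as a loop over the type names, which avoids repeating nearly identical lines. The following is only a sketch under stated assumptions: toy is a small made-up data frame that mimics the relevant columns of movies (its values are invented), and the names toy, types, and toyMovieData are ours. A movie flagged in several categories appears once per category, just as in the code above.

```r
## Made-up stand-in for the movies data (values are invented)
toy <- data.frame(
  budget    = c(100, NA, 50, 20),
  Short     = c(0L, 1L, 0L, 1L),
  year      = c(1999L, 2001L, 1985L, 2003L),
  Action    = c(1L, 0L, 1L, 0L),
  Animation = c(0L, 1L, 0L, 1L),
  Comedy    = c(1L, 1L, 1L, 0L)
)

types <- c("Action", "Animation", "Comedy")

## For each type, keep the rows flagged with 1 and record the type name
pieces <- lapply(types, function(type) {
  d <- toy[toy[[type]] == 1, c("budget", "Short", "year")]
  d$Type <- type
  d
})

## Stack the pieces, rename the columns, and store Type as a factor
toyMovieData <- do.call(rbind, pieces)
names(toyMovieData) <- c("Budget", "Short", "Year", "Type")
toyMovieData$Type <- factor(toyMovieData$Type, levels = types)

table(toyMovieData$Type)
```

Replacing toy with movies and extending types to all six categories should give a dataset equivalent to myMovieData, with the factor levels in the order listed in types.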

Now that our data is ready, let's create our first bar chart. In general, we will follow the same structure as the other plots, just replacing the geom specification:

qplot(Type, data=myMovieData, geom="bar", fill=Type)

This standard bar chart draws one bar for each movie type, with the height of the bar equal to the number of movies of that type. Since we have also assigned the fill aesthetic attribute to the same Type variable, each bar is drawn in a different color. The plot generated is represented in Figure 2.5:


Figure 2.5: This shows a bar chart of the different movie types
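
The bar heights are simply the number of rows for each level, so you can check them against a plain frequency table. The sketch below uses a small made-up data frame (not the movies data) and draws the equivalent plot with the full ggplot() syntax only if ggplot2 is available.

```r
## Made-up movie types, for illustration only
toy <- data.frame(Type = factor(c("Action", "Comedy", "Comedy", "Drama", "Comedy")))

## The counts that the bar geometry turns into bar heights
counts <- table(toy$Type)
print(counts)

## Equivalent plot with the ggplot() interface (skipped if ggplot2 is missing)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = Type, fill = Type)) + geom_bar()
  print(p)
}
```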

In the plot we just created, the bars are colored differently depending on the movie type. However, we can use the fill argument in a more useful way: we can base the color on the value of a second variable, adding more information to the plot. In our simple example, we can split each bar according to the number of short and regular movies. This is done simply by assigning the Short column to the fill argument as shown in the following code:

qplot(Type, data=myMovieData, geom="bar", fill=factor(Short))

The result is shown in Figure 2.6. As illustrated, each bar is now split into the counts of short and regular movies, and the two segments together still add up to the total number of movies for each type.


Figure 2.6: This shows a bar chart of the different movie types with filling split by movie length
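
The height of each colored segment corresponds to one cell of a two-way frequency table of movie type against the short indicator. As a rough illustration with made-up data (the name segments is ours):

```r
## Made-up data: three action movies and two comedies
toy <- data.frame(
  Type  = factor(c("Action", "Action", "Action", "Comedy", "Comedy")),
  Short = c(0L, 0L, 1L, 0L, 1L)
)

## One cell per segment of the stacked bar chart
segments <- table(Type = toy$Type, Short = toy$Short)
print(segments)

## Row totals give back the full bar heights
rowSums(segments)
```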

As you probably noticed in this last example, we converted the Short variable to a factor when assigning it to the fill argument, while in the previous example, when we used the Type variable, we did not do so. The reason is that the fill aesthetic attribute, in this case, needs a discrete variable whose levels can each be assigned a different color. The Type variable of the previous example was already a factor, where each level represented a movie type. The Short variable, on the other hand, is actually numeric: 0 for regular movies and 1 for short movies. For this reason, we had to convert it to a factor first, so that qplot would treat it as a discrete variable with two levels. We will discuss the assignment of discrete and continuous variables in detail in Chapter 4, Advanced Plotting Techniques. You can check the class of the two columns with the following code:

> class(myMovieData$Short)
[1] "integer"
> class(myMovieData$Type)
[1] "factor"
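
The conversion itself can be tried on a small made-up vector. The sketch below also attaches readable labels to the two levels, which is an optional extra of ours rather than something the example above requires; the labels "Regular" and "Short" are our own choice.

```r
## Made-up 0/1 indicator, stored as integer like the Short column
short <- c(0L, 1L, 0L, 0L, 1L)
class(short)

## Convert to a factor; the labels are optional and chosen by us
short_f <- factor(short, levels = c(0, 1), labels = c("Regular", "Short"))
class(short_f)
levels(short_f)
table(short_f)
```

Passing such a labeled factor to the fill argument should make the legend show the labels instead of 0 and 1.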

One last thing to mention about bar charts is the position argument of the qplot function. This argument defines how the bars are arranged within the chart. The three main options are stack, dodge, and fill. The stack option puts the bars with the same x value on top of each other; the dodge option places the bars with the same x value next to each other; and the fill option stacks the bars but normalizes the total height to 1, so that each segment shows a proportion rather than a count. The following code shows the position adjustment applied to our last example:

qplot(Type, data=myMovieData, geom="bar", fill=factor(Short), position="stack")
qplot(Type, data=myMovieData, geom="bar", fill=factor(Short), position="dodge")
qplot(Type, data=myMovieData, geom="bar", fill=factor(Short), position="fill")

Figure 2.7 shows you the resulting plot for each option:


Figure 2.7: This shows the bar chart of different movie types with filling split by movie length for different displays of bars: stack (A), dodge (B), and fill (C)
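
To see what the fill option normalizes, you can compute the within-type proportions by hand. The sketch below uses made-up data and, as an assumption about your installation, draws the three variants with ggplot() and geom_bar() rather than qplot(): to our knowledge, the position argument of qplot() is deprecated in ggplot2 2.0.0 and later, so the qplot() calls above may not behave as described on a recent version.

```r
## Made-up data: three action movies and two comedies
toy <- data.frame(
  Type  = factor(c("Action", "Action", "Action", "Comedy", "Comedy")),
  Short = factor(c(0, 0, 1, 0, 1))
)

## What position="fill" displays: proportions that sum to 1 within each type
shares <- prop.table(table(toy$Type, toy$Short), margin = 1)
print(shares)

## The three position adjustments (skipped if ggplot2 is missing)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = Type, fill = Short))
  print(p + geom_bar(position = "stack"))
  print(p + geom_bar(position = "dodge"))
  print(p + geom_bar(position = "fill"))
}
```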
