Chapter 6. Using R's Visualization Alternatives in Shiny

Graphics are usually the central element of data visualization because they are an intuitive and clear way to display results, especially on a computer. In this chapter, you will learn about the different graphical procedures in R and how to integrate them into your Shiny applications.

By now, you should have a clear notion of how to structure a Shiny web application and of the different types of outputs that can be used. The purpose of this chapter is to complement this with a series of techniques for displaying data graphically. Unlike much other software, including commercial packages, R lets you display graphics (and information in general) with practically no restrictions.

The importance of this goes far beyond a technical question. In line with the GNU philosophy, the absence of technical restrictions implies the absence of restrictions in communication, which ultimately means "free" as in "free speech".

This chapter is particularly focused on the following three packages:

The graphics package

As mentioned before, graphics is the most basic graphical package in R. Like any other package, it contains a wide variety of functions (all dedicated to graphics, of course), but plot() is the most important one. plot() is a generic function, that is, a function that accepts inputs of different classes and produces a different output depending on the class of its input.
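The dispatch mechanism behind a generic function can be sketched with a toy S3 generic. The describe() function and its methods below are hypothetical names invented for this illustration; they are not part of the graphics package.

```r
# Hypothetical generic: UseMethod() picks a method based on class(x)
describe <- function(x, ...) UseMethod("describe")

# Method used for factors such as iris$Species
describe.factor <- function(x, ...) {
  paste("factor with", nlevels(x), "levels")
}

# Method used for numeric vectors such as iris$Sepal.Length
describe.numeric <- function(x, ...) {
  paste("numeric vector of length", length(x))
}

# Fallback for every other class
describe.default <- function(x, ...) {
  paste("object of class", class(x)[1])
}

describe(iris$Species)       # dispatches to describe.factor
describe(iris$Sepal.Length)  # dispatches to describe.numeric
describe(iris)               # dispatches to describe.default
```

plot() follows the same pattern: the method that runs depends on the class of the object passed.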

This is easy to see by plotting different variables of the iris dataset.

If a factor is passed (such as Species from the iris dataset), a bar graph of the counts per level is returned. A plain character vector has to be converted with factor() first, as plot() cannot draw it directly:

plot(iris$Species)

If a single numeric vector is passed, each value is plotted against its index (its position in the vector):

plot(iris$Sepal.Length)

If two numeric vectors are passed, a scatterplot is returned:

plot(iris$Sepal.Length,iris$Sepal.Width)

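If a data frame with more than two columns is passed, a scatterplot matrix (one scatterplot for each pair of variables) is created. A plain matrix, in contrast, is treated as x and y coordinates taken from its first two columns: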

plot(iris)

As you might have already noticed, Species has been converted to a discrete numeric variable. This occurs whenever the data frame passed mixes numeric and categorical variables: in these cases, plot() replaces each factor with its integer level codes.
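The conversion can be inspected with data.matrix(), which replaces factors with their integer codes. This is only a sketch of the kind of conversion described above, not a claim about every step plot() performs internally.

```r
# Convert the whole data frame to a numeric matrix
iris.numeric <- data.matrix(iris)

# Species is now stored as the integer codes of its levels
unique(iris.numeric[, "Species"])

# The code i corresponds to the i-th level
levels(iris$Species)
```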

Objects of many other classes can be passed as well, such as dates, tables, linear model objects, and clustering model objects, each naturally producing a different output.
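As a brief illustration with two of these classes, the following sketch passes a two-way table and a linear model to plot(). The object names and the 5.8 cut-off are arbitrary choices made for this example.

```r
# A two-way contingency table: plot() draws a mosaic plot
species.by.size <- table(iris$Species, iris$Sepal.Length > 5.8)
class(species.by.size)
plot(species.by.size)

# A linear model: plot() draws diagnostic plots (here only the first one)
fit <- lm(Sepal.Width ~ Sepal.Length, data = iris)
class(fit)
plot(fit, which = 1)
```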

To sum up, whenever you create a new object, it is worth checking what plot() returns when the object is passed to it.

Of course, the default plot() function is not the only one that is available in graphics. This package provides almost every traditional graphic as well. In the later sections, the following types of graphics will be covered:

  • Barplot
  • Histogram
  • Boxplot
  • Pie chart
  • Points
  • Lines

Barplot

barplot() has only one mandatory input, which is either a numeric vector or a matrix. In the latter case, the bars are either stacked or placed next to each other. This is controlled by the beside argument, where FALSE (the default) stacks the bars and TRUE places them side by side.
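The effect of beside can be sketched with a small matrix. The sales matrix below contains made-up numbers and exists only for this illustration.

```r
# Small illustrative matrix (made-up numbers): rows are stacked or grouped,
# columns define the groups on the x axis
sales <- matrix(c(3, 5, 2, 4, 6, 1), nrow = 2,
                dimnames = list(c("product.a", "product.b"),
                                c("q1", "q2", "q3")))

# beside = FALSE (default): one stacked bar per column
stacked.mid <- barplot(sales, beside = FALSE, legend.text = TRUE)

# beside = TRUE: the rows of each column are drawn next to each other
grouped.mid <- barplot(sales, beside = TRUE, legend.text = TRUE)

# barplot() invisibly returns the bar midpoints on the x axis
length(stacked.mid)  # one midpoint per column
dim(grouped.mid)     # one midpoint per cell of the matrix
```

The returned midpoints are handy when text or other elements have to be added above the bars.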

When a matrix is passed, keep in mind that the function groups by column. This is particularly important when plotting variables of a dataset, as datasets are organized with one case per row. So, if two numeric variables of a dataset were selected, stored as a matrix, and passed to barplot(), a meaningless graphic would be returned, as the bars would be stacked by case:

#This plot does not make sense

numeric.subset <- as.matrix(iris[1:5,1:2])

barplot(numeric.subset)

Instead, it is always advisable to use barplots with summarized data that is eventually grouped by another variable. In the case of the iris dataset, we can use barplots to plot the means of Sepal.Length and Sepal.Width (the first two variables in the iris dataset) by species.

The code would look like the following:

#This plot does make sense

#Generate aggregated data

aggregate.info <- aggregate(cbind(Sepal.Length, Sepal.Width) ~ Species, data=iris, mean)

#Select the numeric variables of aggregate.info, convert it to matrix and transpose it

aggregate.info.num <- t(as.matrix(aggregate.info[,c("Sepal.Length","Sepal.Width")]))

#Add column names to new object, which will be each of the Species

colnames(aggregate.info.num) <- aggregate.info$Species

#Finally, plot

barplot(aggregate.info.num, beside=TRUE)

This example shows that some graphics need prior data processing, which is why data processing was covered earlier in this book. Although R has almost no restrictions in graphics, suitable objects must be passed.

In the first statement, the mean per species of both Sepal.Length and Sepal.Width is calculated by the aggregate() function and stored in the aggregate.info object. As barplot() makes reasonable plots only with numeric vectors or matrices, it is necessary to select the corresponding variables and convert the subset to a matrix. Additionally, as barplot() works columnwise, the matrix is transposed. Finally, the species names are assigned as column names of the transposed matrix. This generates the labels and shows which species each group of bars belongs to.

In the barplot() function, beside is set to TRUE in order to place the bars next to each other (rather than one above another). This is because stacked bars suggest parts of a whole, and the sum of a mean length and a mean width is not a meaningful total.
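The same matrix can also be built in a more compact way. The following is an alternative sketch, not the approach used above; it relies on split(), sapply(), and colMeans() from base R, and species.means is a name chosen for this example.

```r
# Split the two numeric columns by species and compute the column means;
# sapply() binds the results into a matrix with one column per species
species.means <- sapply(
  split(iris[, c("Sepal.Length", "Sepal.Width")], iris$Species),
  colMeans
)

species.means  # rows: variables, columns: species

# Grouped bars with a legend taken from the row names
barplot(species.means, beside = TRUE,
        legend.text = rownames(species.means))
```

Because the result already has one column per species, no transposition or renaming is needed before plotting.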

Histograms

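Histograms are a graphical method to display the distribution of a continuous variable by splitting it into intervals (bins). By default, R chooses the number of bins with Sturges' formula, ceiling(log2(n) + 1), where n is the length of the vector passed, and draws bins of equal width. hist() treats this number as a suggestion and rounds the breakpoints to convenient values.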
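As a quick check of the default rule, the following sketch computes the Sturges bin count by hand and compares it with the nclass.Sturges() helper from grDevices. The variable names are chosen for this example.

```r
x <- iris$Sepal.Length

# Sturges' formula: suggested number of bins for a vector of length n
n.bins <- ceiling(log2(length(x)) + 1)
n.bins

# nclass.Sturges() applies the same formula
nclass.Sturges(x)

# plot = FALSE computes the histogram without drawing it
h <- hist(x, plot = FALSE)
h$breaks          # breakpoints actually used
length(h$counts)  # number of bins actually used
```

The number of bins actually used may differ from the suggested one, because the breakpoints are rounded to convenient values.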

However, this can be changed by the user through the breaks argument. One way to do this is by passing a numeric vector of breakpoints, as shown in the following example:

hist(iris$Sepal.Length,breaks=c(4,6,8))

Apart from the plot, the hist() function returns a series of other outputs that might be useful. In order to see them, it would be best to assign the hist() output to an object and see what this contains:

> histogram.example <- hist(iris$Sepal.Length,breaks=c(4,6,50))
> names(histogram.example)
[1] "breaks" "counts" "density" "mids" "xname" "equidist"
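A short sketch of how these elements can be used follows. It uses the breakpoints 4, 6, and 8 and plot = FALSE so that only the computed values are returned; the names h and bin.mid are chosen for this example.

```r
h <- hist(iris$Sepal.Length, breaks = c(4, 6, 8), plot = FALSE)

h$breaks    # the breakpoints that were passed
h$mids      # midpoint of each bin
h$counts    # number of observations in each bin
h$equidist  # TRUE when all bins have the same width

# The counts can be reused, for example as a small summary table
data.frame(bin.mid = h$mids, count = h$counts)
```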

Boxplots

Boxplots are very popular graphics among data scientists and statisticians. They are very useful because they summarize the distribution of a vector (median, quartiles, and outliers) at a glance. boxplot() accepts two types of input: a numeric vector, or a formula whose left-hand side is a continuous variable and whose right-hand side is a character or factor variable.

In the first case, only one boxplot will be returned while in the second one, a boxplot for each of the different values of the factor or character variable will be displayed. The following is a small example of this:

boxplot(Sepal.Length ~ Species, data=iris)

[Figure: one boxplot of Sepal.Length per species]
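Both input forms can be compared in a minimal sketch. Here plot = FALSE is used so that only the computed statistics are returned; the object names are chosen for this example.

```r
# First form: a single numeric vector gives a single boxplot
one.box <- boxplot(iris$Sepal.Length, plot = FALSE)

# Second form: a formula gives one boxplot per level of the grouping variable
by.species <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)

by.species$names  # group labels, one per boxplot
by.species$stats  # per group: lower whisker, lower hinge, median,
                  # upper hinge, upper whisker
by.species$out    # values drawn as outliers
```

Removing plot = FALSE draws the boxplots and still returns the same list invisibly.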

Pie charts

Although they are not considered good data visualization, pie charts are very popular among the non-expert public (that is, anyone who does not work specifically with data). Data visualization specialists advise against them, mainly because differences between the angles or areas of circular sectors are much harder to perceive than differences between lengths. However, as they are widely requested, they can be a necessary tool to have in the toolbox.

The function to draw a pie chart is simply pie(). Its main argument is a numeric vector indicating quantities. If the elements of this vector are named, the chart displays these names. Otherwise, the labels argument can be specified, which is a character vector of the same length as the numeric vector passed:

> pie(table(iris$Species))

In this example, a named numeric vector is generated by the table() function, so pie() uses its values and names to draw the plot.
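When the vector has no names, the labels argument does the same job. The following sketch strips the names on purpose and builds the labels by hand; species.counts and species.labels are names chosen for this example.

```r
# Unnamed counts: without labels, pie() would number the slices
species.counts <- as.vector(table(iris$Species))

# One label per slice, in the same order as the counts
species.labels <- paste0(levels(iris$Species), " (", species.counts, ")")

pie(species.counts, labels = species.labels)
```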

Points

points() is a function that draws points on an already existing plot. This is useful, for example, to draw two plots in one window or to add different sequences. The following is an example:

data(iris)
plot(iris$Sepal.Length)
points(iris$Petal.Length, col="red")

[Figure: Sepal.Length in black with Petal.Length overlaid in red, both plotted against the observation index]
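One detail worth noting is that the axes are fixed by the first plot() call, so any added points that fall outside its range are not shown. A possible way around this, sketched below, is to compute a common range first and pass it as ylim. The axis label and the legend position are arbitrary choices for this example.

```r
# Range that covers both variables
y.range <- range(iris$Sepal.Length, iris$Petal.Length)

# Draw the first variable with room for the second one
plot(iris$Sepal.Length, ylim = y.range, ylab = "Length")

# Add the second variable to the existing plot
points(iris$Petal.Length, col = "red")

# Label the two series
legend("bottomright", legend = c("Sepal.Length", "Petal.Length"),
       col = c("black", "red"), pch = 1)
```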

Lines

lines() is analogous to points(), but instead of drawing a point for each value, it draws a line that connects the values in sequence:

data(iris)
plot(iris$Sepal.Length)
lines(iris$Petal.Length, col="red")