The plot() function is one of the most powerful functions in base R. The main point of using plot() is that it always tries to draw some representation of your data, working out which kind of representation suits the data best based on its type. This lets you quickly get a first view of the data you are working with.
Behind the scenes, the power of the plot() function comes from the number of methods it is packed with, each developed for a specific type of object.
So, when an object is passed as an argument to plot(), it looks for the most appropriate method among the ones available and uses it to represent the data stored within the object.
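To see this dispatch at work, you can inspect the class of an object and look up the plot() method that will handle it. The following sketch uses standard base R helpers (class(), methods(), and getS3method()). The exact list printed by methods() depends on which packages are loaded.

```r
# The class of the object decides which plot() method is used
class(iris)             # "data.frame": handled by plot.data.frame
class(iris$Sepal.Width) # "numeric": no specific method, so plot.default is used
class(iris$Species)     # "factor": handled by plot.factor

# List the methods currently registered for the plot() generic
methods("plot")

# Retrieve the function that will handle a data frame
df_method <- getS3method("plot", "data.frame")
is.function(df_method)  # TRUE
```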
It is even possible to further extend the plot() function, as is regularly done in various packages, by adding new methods for specific types of object with setMethod(). This is out of the scope of this recipe, but you can find a good explanation in the R language documentation at https://stat.ethz.ch/R-manual/R-devel/library/methods/html/setMethod.html.
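The following minimal sketch shows what such an extension looks like. It defines a hypothetical S4 class, Measurements, and registers a plot() method for it with setMethod(). The class, its slots, and the drawing choices are illustrative assumptions and are not part of the recipe.

```r
library(methods)

# Hypothetical class used only for this illustration
setClass("Measurements",
         representation(values = "numeric", label = "character"))

# Register a plot() method for Measurements objects when y is not supplied.
# setMethod() turns the existing plot() function into an S4 generic on the fly.
setMethod("plot", signature(x = "Measurements", y = "missing"),
          function(x, y, ...) {
            plot(x@values, type = "o", main = x@label, ylab = "Value", ...)
          })

m <- new("Measurements", values = c(3.5, 3.0, 3.2, 3.1), label = "Example values")
plot(m)  # dispatches to the method defined above
```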
Just like all other recipes in this chapter, we will use the iris dataset as a sample dataset. You will find this dataset in every installation of the R environment.
The iris dataset is one of the most used datasets in R tutorials and learning sessions. It is derived from Ronald Fisher's 1936 paper The use of multiple measurements in taxonomic problems.
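50 samples of each of 3 species of the iris flower were observed: Iris setosa, Iris versicolor, and Iris virginica. This gives 150 observations in total.
For each sample, four features were recorded: sepal length, sepal width, petal length, and petal width, all measured in centimeters. In the data frame, these are the Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width columns, while the species is stored in the Species column.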
To get an idea of this dataset, you can take a look at its structure by running the following code:
str(iris)
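If you want to confirm these figures directly, a few base R calls are enough. This is a quick sketch. The comments state what each call returns for the standard iris data.

```r
str(iris)

dim(iris)            # 150 rows, 5 columns
names(iris)          # the four measurements plus Species
levels(iris$Species) # "setosa" "versicolor" "virginica"
table(iris$Species)  # 50 observations per species
```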
Well, to be honest, if you arrived here after walking through Chapter 2, Preparing for Analysis – Data Cleansing and Manipulation, understanding the structure of your data shouldn't be a problem for you. Nevertheless, you can always skip back to that chapter, specifically to the Getting a sense of your data structure with R recipe.
1. Plot the whole dataset by passing it to the plot() function:
plot(iris)
This will result in the following plot:
The preceding plot shows all the variables plotted against each other. For instance, the second panel in the first row from the top shows Sepal.Length against Sepal.Width, while the third shows Sepal.Length against Petal.Length.
As you have probably noticed, the plot makes it easy to spot the presence, or absence, of a relationship between any two variables.
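Because plot() hands a data frame with more than two columns to pairs(), you can call pairs() yourself when you want more control. For example, you might drop the Species column and color the points by species instead. The correlation matrix gives a numeric companion to the visual impression. The coloring scheme is an illustrative choice and is not part of the recipe.

```r
numeric_cols <- iris[, 1:4]

# Scatterplot matrix of the four measurements, one color per species
pairs(numeric_cols, col = as.integer(iris$Species), pch = 19)

# Pearson correlations between the measurements, rounded for readability
round(cor(numeric_cols), 2)
```

In the correlation matrix, Petal.Length and Petal.Width are very strongly correlated, while Sepal.Length and Sepal.Width show only a weak relationship. This matches what the scatterplot matrix suggests.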
2. To plot a single attribute recorded for each observation, select it with the $ operator:
plot(iris$Sepal.Width)
The resulting plot shows the row index of each observation on the x axis. Given the data frame dimensions, this index ranges from 1 to 150. On the y axis, you will find the value of the selected attribute, which in our example is the Sepal.Width column.
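You can make the implicit x axis explicit: plotting a single vector is equivalent to plotting it against seq_along() of itself. This sketch draws both versions so they can be compared. The axis labels are illustrative.

```r
sepal_width <- iris$Sepal.Width

plot(sepal_width)  # x axis: implicit row index

# Same picture with the index supplied explicitly
idx <- seq_along(sepal_width)
plot(idx, sepal_width, xlab = "Index", ylab = "Sepal.Width")

range(idx)  # 1 150
```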
3. You can change the type of plot produced by the plot() function by changing the value of the type argument. Let's try the value "o", which stands for points and lines overplotted:
plot(iris$Sepal.Width, type = "o")
4. You can now focus on different variables, or even plot two variables against each other, using the following script:
plot(x = iris$Petal.Length, y = iris$Petal.Width)
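You can also write the same scatter plot with plot()'s formula interface, y ~ x, which looks up the column names in the data argument. Coloring by species and adding a legend are illustrative extras and are not part of the recipe.

```r
# Formula interface: Petal.Width on the y axis, Petal.Length on the x axis
plot(Petal.Width ~ Petal.Length, data = iris,
     col = as.integer(iris$Species), pch = 19,
     xlab = "Petal length (cm)", ylab = "Petal width (cm)")

# Map the colors 1, 2, 3 back to species names
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 19)
```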
As discussed in the introduction to this recipe, the plot() function looks at the type (more precisely, the class) of the object being plotted and then chooses an appropriate way to display it.
For instance, to see what happened in step 1, start from the simplest case. When plotting a simple vector, the plot() function plots it against the vector indexes:
x <- c(1, 2, 3)
plot(x)
If we instead store the same vector together with a second one in a data frame and plot that, the result is a scatter plot with one vector on each axis:
x <- c(1, 2, 3)
y <- c(4, 6, 8)
data <- data.frame(x, y)
plot(data)
Consequently, passing a data frame with n attributes, and therefore n columns (with n greater than 2), to the plot() function results in an n x n matrix of plots in which each attribute is plotted against every other, as seen in the recipe.
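You can check the two cases side by side with a small made-up data frame. The columns a, b, and c below are hypothetical. Internally, plot.data.frame() draws a single scatter plot for two columns and calls pairs() for more than two.

```r
df <- data.frame(a = 1:5, b = (1:5)^2, c = 5:1)

plot(df[, c("a", "b")])  # two columns: one scatter plot of b against a
plot(df)                 # three columns: a 3 x 3 scatterplot matrix
pairs(df)                # essentially the same matrix, drawn directly
```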
By now, you know what step 2 is all about. Plotting iris$Sepal.Width alone results in a plot where the x axis represents the row indexes of the iris dataset, while the Sepal.Width values are represented on the y axis.
Besides its great flexibility on the input side, the plot() function also offers a good number of possible choices for the output, as seen in step 3.
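In particular, changing the value of the type argument makes it possible to change the kind of visualization that the plot will produce.
You can choose from among the following possibilities:
"p" for points
"l" for lines
"b" for both points and lines
"c" for the lines part alone of "b"
"o" for both points and lines, overplotted
"h" for histogram-like (or high-density) vertical lines
"s" for stair steps
"S" for other steps
"n" for no plotting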
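A convenient way to compare these options is to draw several of them on one device. This sketch uses par(mfrow = ...) to split the plotting area. It also uses only the first 20 observations so the differences stay visible. Both choices are arbitrary.

```r
types <- c("p", "l", "b", "o", "h", "s")
y <- iris$Sepal.Width[1:20]

old_par <- par(mfrow = c(2, 3))  # 2 x 3 grid of panels; keep the old settings
for (t in types) {
  plot(y, type = t, main = paste0('type = "', t, '"'), ylab = "Sepal.Width")
}
par(old_par)  # restore the previous layout
```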
All these types are available for numerical variables, while more care is needed with categorical attributes.
Passing a categorical attribute (a factor) to the plot() function results in a bar plot representing the number of occurrences of each possible value assumed by the attribute. Specifying a type argument does not turn it into a different kind of plot.
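The following short sketch covers the categorical case. Calling plot() on a factor gives the same bar plot you would get from barplot() applied to a frequency table. The last call pairs a factor with a numeric variable and is an extra illustration: in that case, plot() draws one box plot per level.

```r
plot(iris$Species)              # bar plot of counts per species

# The explicit equivalent: count the levels, then draw bars
counts <- table(iris$Species)
barplot(counts)
counts                          # 50 observations for each species

# A factor on the x axis and a numeric y: one box plot per species
plot(iris$Species, iris$Sepal.Length)
```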