Looking at your data using the plot() function

The plot() function is one of the most powerful functions in base R. Its main advantage is that it will always try to produce a representation of your data, working out which kind of representation is best based on the data type. This lets you quickly get a first view of the data you are working with.

Behind the scenes, the power of the plot() function comes from being packed with a number of methods developed for specific types of object.

So, when an object is passed as an argument to plot(), it looks for the most appropriate method within the ones available and uses it to represent data stored within the object.
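As a minimal sketch of this lookup, the following code inspects the classes of a few objects from the built-in iris dataset and lists the plot() methods available in the current session. The choice of objects is only illustrative, and the exact list of methods depends on which packages are loaded.

```r
# Classes of some objects that can be passed to plot();
# the method chosen by plot() depends on these classes
class(iris)              # "data.frame"
class(iris$Sepal.Width)  # "numeric"
class(iris$Species)      # "factor"

# List the plot() methods known to the current session
methods(plot)

# Check that a dedicated method exists for data frames and factors
is.function(getS3method("plot", "data.frame"))
is.function(getS3method("plot", "factor"))
```

Objects whose class has no dedicated method, such as a plain numeric vector, fall back to the default method.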

It is even possible to further expand the plot() function, as is regularly done in various packages, adding new methods for specific types of object by running setMethod() on it. This is out of the scope of this recipe, but you can find a good explanation in the R language documentation at https://stat.ethz.ch/R-manual/R-devel/library/methods/html/setMethod.html.

Getting ready

Just like all other recipes in this chapter, we will use the iris dataset as a sample dataset. You will find this dataset in every installation of the R environment.

The iris dataset is one of the most used datasets in R tutorials and learning sessions, and is derived from the 1936 paper "The use of multiple measurements in taxonomic problems", by Ronald Fisher.

Fifty samples of each of three species of the iris flower were observed:

  • Iris setosa
  • Iris virginica
  • Iris versicolor

For each sample, four features were recorded:

  • Length of the sepals
  • Width of the sepals
  • Length of the petals
  • Width of the petals

To get an idea of this dataset, you can take a look at its structure by running the following code:

str(iris)
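
A few more quick checks can confirm the figures quoted above. This is only a sketch using base R functions; none of these calls is part of the original recipe.

```r
# Number of rows (observations) and columns (attributes)
dim(iris)

# Number of observations recorded for each species
table(iris$Species)

# Names of the attributes; note the capitalization, which matters later
names(iris)
```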

Well, to be honest, if you arrived here after walking through Chapter 2, Preparing for Analysis – Data Cleansing and Manipulation, understanding the structure of your data shouldn't be a problem for you. Nevertheless, you can always skip back to that chapter, specifically to the Getting a sense of your data structure with R recipe.

How to do it...

  1. Visualize your data by applying the plot() function:
    plot(iris)

    This will result in the following plot:

The preceding plot shows all variables against one another. For instance, the second rectangle in the first row from the top shows Sepal.Length against Sepal.Width, while the third shows Sepal.Length against Petal.Length.

As you have probably noted, the plot makes it easier to spot the presence, or absence, of any relationship between variables.

  2. Select a particular attribute to visualize.

    Among the attributes recorded for each observation, you can easily select a specific attribute by running the following code:

    plot(iris$Sepal.Width)

    The resulting plot shows on the x axis the row index of a given observation, which, as per the data frame dimensions, ranges from 1 to 150. On the y axis, you will find the value of the attribute you selected, which is, in our example, the Sepal.Width column.

  3. Change the plot type.

    You can change the type of plot produced by the plot() function by changing the value of the type argument.

    Let's try the value h, which draws histogram-like vertical lines (the value o, by contrast, stands for points and lines overplotted):

    plot(iris$Sepal.Width, type = "h")

  4. Focus on another variable or a couple of variables.

    You can now focus on different variables or even plot two variables against each other using the following script:

    plot(x = iris$Petal.Length, y = iris$Petal.Width)
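
The last few steps can be sketched as a single script. The axis labels and the title are illustrative additions, not part of the recipe. Note that column names are case-sensitive, so the names must be written exactly as reported by names(iris).

```r
# One attribute plotted against its row index (1 to 150)
plot(iris$Sepal.Width)

# The same attribute drawn as histogram-like vertical lines
plot(iris$Sepal.Width, type = "h")

# Two attributes plotted against each other, with explicit labels
plot(x = iris$Petal.Length, y = iris$Petal.Width,
     xlab = "Petal length (cm)", ylab = "Petal width (cm)",
     main = "Petal dimensions")

# A lower-case name does not match any column and returns NULL,
# which plot() cannot draw
is.null(iris$petal.length)  # TRUE
```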

How it works...

As discussed in the introduction to this recipe, the plot() function looks at the data type of the object being plotted and then chooses an appropriate way to display it.

To understand step 1, start from the simplest case: when plotting a simple vector, the plot() function will plot it against the vector indexes:

x <- c(1,2,3)
plot(x)

If we instead store the same vector together with another vector in a data frame, plotting the data frame will result in a scatter plot with the two vectors on the axes:

x <- c(1,2,3)
y <- c(4,6,8)
data <- data.frame(x,y)
plot(data)

Consequently, passing a data frame composed of n attributes, and therefore n columns, to the plot() function will result in an n x n matrix of panels where each attribute is plotted against every other, as seen in the recipe.
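
As a sketch of the n x n layout, the code below restricts the plot to the four numerical columns of iris and colors the points by species. The color mapping and the call to cor() are illustrative additions; pairs() is the base graphics function that draws this kind of scatter plot matrix directly.

```r
num_cols <- iris[, 1:4]                  # the four numerical attributes
species_col <- as.integer(iris$Species)  # 1, 2, 3: one color index per species

# 4 x 4 matrix of scatter plots, one panel per pair of attributes
plot(num_cols, col = species_col)

# The same kind of matrix, requested explicitly
pairs(num_cols, col = species_col)

# Numerical counterpart of the visual impression
round(cor(num_cols), 2)
```

Pairs of attributes that form a narrow diagonal cloud in the matrix also show a correlation close to 1 or -1 in the table.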

By now, you know what step 2 is all about. Plotting iris$Sepal.Width alone results in a plot where the x axis shows the row indexes of the iris dataset, while the Sepal.Width values are shown on the y axis.

Besides great flexibility on the input side, the plot() function offers a good number of choices for the output, as seen in step 3.

In particular, changing the value of the type argument makes it possible to change the type of data visualization that the plot will produce.

You can choose from among the following possibilities:

  • p for points (when plotting two variables this will result in a scatterplot)
  • l for lines
  • b for both
  • c for the lines part alone of b
  • o for both overplotted
  • h for histogram-like (or high-density) vertical lines
  • s for stair steps
  • S for the other kind of steps (s moves horizontally first and then vertically, while S moves vertically first)
  • n for no plotting
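
To compare these values side by side, the following sketch draws one panel per type on a 3 x 3 grid. The use of par(mfrow = ...) and of only the first 20 observations is an assumption made for readability, not part of the recipe.

```r
types <- c("p", "l", "b", "c", "o", "h", "s", "S", "n")
y <- iris$Sepal.Width[1:20]  # a short slice keeps each panel readable

old_par <- par(mfrow = c(3, 3))  # arrange the panels on a 3 x 3 grid
for (tp in types) {
  plot(y, type = tp, main = paste0('type = "', tp, '"'))
}
par(old_par)  # restore the previous layout
```

The panel for n shows only the axes, which is useful when you want to set up a plot and add elements to it afterwards.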

All these types are available for numerical variables, while more attention has to be paid to categorical attributes.

  • When the variable you focus on in step 4 is categorical, the plot() function will always produce a bar chart (rather than a true histogram) showing the number of occurrences of each possible value assumed by the attribute, even if you specify a type argument:
  • Be aware that while plotting two numerical variables will result in a scatter plot, plotting a categorical variable against a numerical one will result in a box plot, depicting the distribution of the numerical variable within each subgroup defined by the categorical attribute:
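
Both cases can be sketched with the Species column of iris, which is stored as a factor. The choice of Sepal.Width as the numerical variable is only illustrative.

```r
# A factor on its own: bar chart with one bar per species
plot(iris$Species)
table(iris$Species)  # the counts behind the bars

# A factor on the x axis and a numerical variable on the y axis:
# one box per species
plot(iris$Species, iris$Sepal.Width)

# The same box plot requested through the formula interface
plot(Sepal.Width ~ Species, data = iris)
```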