Setting bin size and number of breaks

As we saw in the previous recipe, the hist() function automatically computes the number of breaks and the size of the bins in which to group the values of the variable. In this recipe, we will learn how to control that and specify exactly how many bins we want or where to place the breaks between bars.

Getting ready

Once again, we will use the airpollution.csv example dataset, so make sure you have loaded it:

air<-read.csv("airpollution.csv")

How to do it...

First, let's see how to specify the number of breaks. Let's make 20 breaks in the Nitrogen Oxides histogram instead of the default 11:

hist(air$Nitrogen.Oxides,
breaks=20,xlab="Nitrogen Oxide Concentrations",
main="Distribution of Nitrogen Oxide Concentrations")

How it works...

We used the breaks argument to specify the number of bars for the histogram. We set breaks to 20; however, the graph shows more than 20 bars. This is because R treats a single number only as a suggestion: it passes the number to pretty(), which chooses round breakpoints that cover the data, with the number of breaks as close to the value specified as possible.
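
One way to see this for yourself is to ask hist() for its result without drawing it. The following is a minimal sketch that uses made-up data in place of air$Nitrogen.Oxides (the vector x is an assumption, not the recipe's dataset); with the real data you would pass air$Nitrogen.Oxides instead:

```r
# Synthetic stand-in data (an assumption, not the airpollution.csv values)
set.seed(42)
x <- runif(200, min = 0, max = 500)

# plot = FALSE returns the histogram object without drawing anything
h <- hist(x, breaks = 20, plot = FALSE)

h$breaks          # the breakpoints R actually chose
length(h$counts)  # the number of bars that would be drawn

# For a single number, the breakpoints come from pretty() on the data range
all.equal(h$breaks, pretty(range(x), n = 20, min.n = 1))
```

Comparing length(h$counts) with the value passed to breaks shows how far R moved away from the suggestion in order to land on round numbers.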

There's more...

We can also specify the exact values at which we want the breaks to occur. In this case, R uses exactly the values we specify. Once again we use the breaks argument, but this time we set it to a numeric vector containing the values at which we want the breaks. The breaks vector must cover the full range of values of the X variable; otherwise, hist() stops with an error.

Let's say we want breaks at every 100 units of concentration:

hist(air$Nitrogen.Oxides,
breaks=c(0,100,200,300,400,500,600),
xlab="Nitrogen Oxide Concentrations",
main="Distribution of Nitrogen Oxide Concentrations")
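
Typing out every break by hand gets tedious for longer vectors, so seq() is a convenient alternative. The sketch below again uses made-up data (the vector x is an assumption standing in for air$Nitrogen.Oxides) and also shows what happens when the breaks do not span the data:

```r
# Synthetic stand-in data (an assumption, not the airpollution.csv values)
set.seed(1)
x <- runif(200, min = 0, max = 550)

# seq() builds the same vector as c(0, 100, 200, 300, 400, 500, 600)
brks <- seq(0, 600, by = 100)

h <- hist(x, breaks = brks, plot = FALSE)
all.equal(h$breaks, brks)  # R keeps the breakpoints we supplied

# A vector that stops short of the largest value is rejected with an error;
# tryCatch() captures the message instead of halting the script
too_short <- seq(0, 300, by = 100)
res <- tryCatch(hist(x, breaks = too_short, plot = FALSE),
                error = function(e) conditionMessage(e))
res
```

Checking range(x) before choosing the endpoints of the breaks vector is a simple way to avoid that error.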

As you may have noticed, the breaks argument can take different types of values: a single number suggesting the number of breaks, or a vector specifying the exact bin breaks. In addition, breaks can take a function that computes the number of bins.
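
As an illustration of the function form, here is a small sketch. The helper sqrt_bins is hypothetical (it is not part of R or of this recipe) and implements the common square-root rule; the data are made up as well. Because the function returns a single number, R treats the result as a suggestion, just as it does for breaks=20:

```r
# Synthetic stand-in data (an assumption, not the airpollution.csv values)
set.seed(7)
x <- rexp(300, rate = 1 / 100)

# Hypothetical helper: suggest about sqrt(n) bins for n observations
sqrt_bins <- function(v) ceiling(sqrt(length(v)))

# hist() calls the function on the data and uses the result as the suggestion
h <- hist(x, breaks = sqrt_bins, plot = FALSE)

sqrt_bins(x)      # the suggested number of bins
length(h$counts)  # the number of bins R settled on
```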

Finally, breaks can also take a character string naming an algorithm that calculates the number of bins. By default, it is set to "Sturges". The other supplied algorithms are "Scott" and "FD" (which can also be written as "Freedman-Diaconis").
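
To compare the named algorithms, you can call the corresponding functions nclass.Sturges(), nclass.scott(), and nclass.FD() directly, or pass the names to hist(). The sketch below does both on made-up data (the vector x is an assumption, not the recipe's dataset):

```r
# Synthetic stand-in data (an assumption, not the airpollution.csv values)
set.seed(123)
x <- rnorm(500, mean = 250, sd = 80)

# Bin counts suggested by each of the three built-in rules
suggested <- c(Sturges = nclass.Sturges(x),
               Scott   = nclass.scott(x),
               FD      = nclass.FD(x))
suggested

# The same rules selected by name through the breaks argument
h_sturges <- hist(x, breaks = "Sturges", plot = FALSE)
h_scott   <- hist(x, breaks = "Scott",   plot = FALSE)
h_fd      <- hist(x, breaks = "FD",      plot = FALSE)

# Number of bars each choice produces after rounding to pretty breakpoints
sapply(list(Sturges = h_sturges, Scott = h_scott, FD = h_fd),
       function(h) length(h$counts))
```

Since "Sturges" is the default, hist(x) and hist(x, breaks="Sturges") give the same bins.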
