In this section, we will practice customizing the bars of a histogram and create an alternative style of histogram:
1. Use the breaks argument to separate the histogram's columns along the x-axis:

```r
> #modify the chapter 8 histogram that depicted the frequency distribution of past fire attack durations
> #use the breaks argument to divide the histogram's columns along the x axis
> #breaks accepts a vector containing the points at which columns should occur
> histFireDurationBreaks <- c(0:14)
> #use hist(...) to create and display the histogram
> hist(x = histFireDurationData, main = histFireDurationDataMain, xlab = histFireDurationLabX, col = histFireDurationRainbowColor, breaks = histFireDurationBreaks)
```
2. Use the freq argument to plot densities instead of counts:

```r
> #use the freq argument to plot densities or counts
> #if freq is TRUE (default), counts are graphed on the y axis
> #a count tells us the number of times that a data point occurred
> #if freq is FALSE, densities are graphed on the y axis
> #a density is the share of all occurrences that falls in a column, divided by the column's width
> #with our 1-day columns, a density is simply the proportion of all occurrences that a data point represents
> #the column areas (density times width) always add up to 1, so our 1-day densities add up to 1
> histFireDurationFreq <- FALSE
> #remember to modify the ylim argument, as our previous one applied to counts and not to densities
> histFireDurationDensityLimY <- c(0, 0.2)
> #use hist(...) to create and display the histogram
> hist(x = histFireDurationData, main = histFireDurationDataMain, xlab = histFireDurationLabX, ylim = histFireDurationDensityLimY, col = histFireDurationRainbowColor, breaks = histFireDurationBreaks, freq = histFireDurationFreq)
```
We set the breaks argument to add detail to our histogram, then set the freq argument to switch its y-axis from counts to densities. Let us discuss each of these actions.
The breaks argument defines where a histogram's columns are separated along the x-axis. It receives a vector containing the points at which the column divisions should occur. Within the hist(...) function, the breaks argument may seem similar to the xlim argument that we have used with other graphics. However, xlim merely rescales the x-axis of a histogram; it does not modify the columns themselves. Therefore, the breaks argument is necessary when we want to define the exact points at which our columns should occur.
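The difference between xlim and breaks can be checked directly, because hist(...) invisibly returns an object whose breaks field holds the column boundaries it used. The sketch below is illustrative only: the durations vector is made up (it is not the chapter 8 data), and the plots are sent to a null PDF device so that the example runs without a plotting window.

```r
# Hypothetical fire attack durations in days (made-up values for illustration)
durations <- c(1, 2, 2, 3, 3, 3, 4, 5, 5, 6, 7, 8, 9, 11, 13)

# Draw to a null PDF device so the example runs without a screen
pdf(NULL)

# hist() invisibly returns an object whose $breaks field holds the column boundaries
histDefault <- hist(durations)                    # R chooses the breaks
histWideAxis <- hist(durations, xlim = c(0, 20))  # same columns, wider x-axis
histOneDay <- hist(durations, breaks = 0:14)      # our own column boundaries

invisible(dev.off())

# xlim leaves the columns untouched...
identical(histDefault$breaks, histWideAxis$breaks)  # TRUE
# ...whereas breaks redefines them
length(histOneDay$counts)  # 14 columns
diff(histOneDay$breaks)    # each column is 1 day wide
```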
By default, R divided our data into seven columns, each spanning two days. Using number-colon-number notation (0:14), we supplied the breaks argument with the 15 points 0, 1, 2, ..., 14, which divide the x-axis into 14 columns that each span one day:
```r
histFireDurationBreaks <- c(0:14)
```
This made our histogram more detailed and easier to interpret:

```r
hist(x = histFireDurationData, main = histFireDurationDataMain, xlab = histFireDurationLabX, col = histFireDurationRainbowColor, breaks = histFireDurationBreaks)
```
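A few details about this breaks vector are worth verifying. The sketch below again uses a made-up durations vector rather than the chapter 8 data. It confirms that c(0:14) is the same as 0:14, shows which column each duration lands in (by default hist(...) closes each column on the right, so a 3-day attack is counted in the (2, 3] column), and shows that hist(...) stops with an error when a value falls outside the supplied breaks.

```r
# Hypothetical fire attack durations in days (made-up values for illustration)
durations <- c(1, 2, 2, 3, 3, 3, 4, 5, 5, 6, 7, 8, 9, 11, 13)

# Wrapping 0:14 in c() is harmless but unnecessary: both give the same 15 points
identical(c(0:14), 0:14)  # TRUE

# plot = FALSE computes the histogram without drawing it
histOneDay <- hist(durations, breaks = 0:14, plot = FALSE)

# Columns are [0,1], (1,2], (2,3], ..., (13,14] because right = TRUE by default
histOneDay$counts  # 1 2 3 1 2 1 1 1 1 0 1 0 1 0

# Every value must fall inside the breaks; a 20-day attack does not fit within 0:14
outcome <- tryCatch(
  hist(c(durations, 20), breaks = 0:14, plot = FALSE),
  error = function(e) conditionMessage(e)
)
outcome  # the error message returned by hist()
```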
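The freq argument allows us to toggle our histogram between displaying counts (or frequencies) and densities. A count indicates how many times a value occurs within a dataset. A density indicates the proportion (or percentage) of the entire dataset that a value's count makes up. Strictly speaking, hist(...) also divides this proportion by the width of the column, so a density equals a proportion only when every column is one unit wide, as in our 1-day histogram.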
For instance, in the vector c(1, 1, 1, 3, 5), the number 1 has a count of 3 because it occurs 3 times. The number 1 has a density of 0.6 (or 60%) because its count of 3 makes up 3/5 of the overall dataset.
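The counts and densities from this example can be reproduced in R. This is a minimal sketch: table(...) gives each value's count, dividing by the length of the vector gives its proportion, and hist(...) with one-unit columns (breaks = 0:5, a choice made here for illustration) reports the same proportions in its density field.

```r
values <- c(1, 1, 1, 3, 5)

# Count of each distinct value: 1 occurs 3 times, 3 and 5 once each
valueCounts <- table(values)
valueCounts

# Proportion of the dataset for each value: 3/5 = 0.6 for the number 1
valueCounts / length(values)

# hist() with 1-unit columns [0,1], (1,2], ..., (4,5]
valueHist <- hist(values, breaks = 0:5, plot = FALSE)
valueHist$counts   # 3 0 1 0 1
valueHist$density  # 0.6 0.0 0.2 0.0 0.2
```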
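By default, freq is set to TRUE and displays counts. If it is set to FALSE, densities are graphed instead. In a density histogram, the areas of the columns (each density multiplied by its column width) always sum to 1, which represents 100% of the dataset. Because our columns are one day wide, our densities themselves also sum to 1.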
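The sketch below, again using made-up durations rather than the chapter 8 data, compares 1-day and 2-day columns. It shows how hist(...) computes each density, that the column areas total 1 in both cases, and that the densities alone total 1 only for the 1-day columns.

```r
# Hypothetical fire attack durations in days (made-up values for illustration)
durations <- c(1, 2, 2, 3, 3, 3, 4, 5, 5, 6, 7, 8, 9, 11, 13)

oneDay <- hist(durations, breaks = 0:14, plot = FALSE)
twoDay <- hist(durations, breaks = seq(0, 14, by = 2), plot = FALSE)

# Each density is count / (number of values * column width)
all.equal(twoDay$density, twoDay$counts / (length(durations) * diff(twoDay$breaks)))

# Column areas always total 1 (up to floating-point rounding)
sum(oneDay$density * diff(oneDay$breaks))
sum(twoDay$density * diff(twoDay$breaks))

# The densities alone total 1 only when every column is 1 unit wide
sum(oneDay$density)  # about 1
sum(twoDay$density)  # about 0.5
```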
We modified our original histogram to display densities by setting the freq argument to FALSE:

```r
histFireDurationFreq <- FALSE
```
Note that we also adjusted our ylim argument, since our previous y-axis limits applied to counts rather than to densities:

```r
histFireDurationDensityLimY <- c(0, 0.2)
```
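Rather than choosing a density limit by eye, one option is to compute the histogram first with plot = FALSE and base ylim on its largest density, so that no column is cut off at the top. This is a sketch with made-up durations; the 0.05 rounding step is an arbitrary choice for tidier axis labels, not something hist(...) requires.

```r
# Hypothetical fire attack durations in days (made-up values for illustration)
durations <- c(1, 2, 2, 3, 3, 3, 4, 5, 5, 6, 7, 8, 9, 11, 13)

# Compute the densities without drawing anything
durationHist <- hist(durations, breaks = 0:14, plot = FALSE)
maxDensity <- max(durationHist$density)

# Round the upper limit up to the next multiple of 0.05
densityLimY <- c(0, ceiling(maxDensity / 0.05) * 0.05)
densityLimY
```

The resulting vector can then be passed as the ylim argument of the hist(...) call that draws the density histogram.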
These alterations allowed us to visualize our fire attack durations as proportions of the whole rather than as counts:

```r
hist(x = histFireDurationData, main = histFireDurationDataMain, xlab = histFireDurationLabX, ylim = histFireDurationDensityLimY, col = histFireDurationRainbowColor, breaks = histFireDurationBreaks, freq = histFireDurationFreq)
```
1. Within hist(...), what is the relationship between the xlim and breaks arguments?
a. breaks sets the overall scale of the x-axis, whereas xlim divides the histogram's columns along the x-axis.
b. xlim sets the overall scale of the x-axis, whereas breaks divides the histogram's columns along the x-axis.
c. breaks replaces the xlim argument when creating a histogram.
d. xlim replaces the breaks argument when creating a histogram.
2. What is the difference between a count and a density?
a. A count is the number of times that a value occurs in a dataset, whereas a density is the total count of all values in a dataset.
b. A density is the number of times that a value occurs in a dataset, whereas a count is the total count of all values in a dataset.
c. A count is the number of times that a value occurs in a dataset, whereas a density is the percentage of the dataset that a value accounts for.
d. A density is the number of times that a value occurs in a dataset, whereas a count is the percentage of the dataset that a value accounts for.
Create a histogram that conveys the number of Shu soldiers engaged in past fire attacks. Improve its readability by incorporating the breaks argument into your hist(...) function. Then, create a density version of the histogram using the freq argument. Compare your frequency and density histograms. Which do you feel is better for displaying this particular data?