A histogram displays the frequency with which certain values occur in a dataset. Visually, a histogram looks similar to a bar chart, but it conveys different information. Histograms help us to get an idea of how varied and distributed our data are. Let us begin the histogram making process in R:
Use the hist(...) function to create a histogram:
> #create a histogram that depicts the frequency distribution of past fire attack durations
> #get the histogram data
> histFireDurationData <- subsetFire$DurationInDays
> #customize the histogram
> histFireDurationDataMain <- "Duration of Past Fire Attacks"
> histFireDurationLabX <- "Duration in Days"
> histFireDurationLimY <- c(0, 10)
> histFireDurationRainbowColor <- rainbow(max(histFireDurationData))
> #use hist(...) to create and display the histogram
> hist(x = histFireDurationData, main = histFireDurationDataMain, xlab = histFireDurationLabX, ylim = histFireDurationLimY, col = histFireDurationRainbowColor)
We used the hist(...) function to generate a histogram that depicted the frequency distribution of our fire attack duration data.
In its simplest form, the hist(...) function is very similar to boxplot(...). At a minimum, it requires only that the data to be plotted be supplied through the x argument. A minimal call looks like the following:
hist(x = dataset)
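As a minimal sketch of this simplest form, the following uses a small invented vector (sampleDurations is hypothetical and not part of our fire data). It also stores the value that hist(...) returns, which holds the bin boundaries and the count in each bin.

```r
# Hypothetical durations in days; these values are invented for illustration
sampleDurations <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 7)

# Minimal call: only the data are supplied, everything else uses defaults
hist(x = sampleDurations)

# hist(...) returns its binning as a list; plot = FALSE skips the drawing step
sampleHist <- hist(x = sampleDurations, plot = FALSE)
sampleHist$breaks  # bin boundaries chosen by the default rule
sampleHist$counts  # number of observations that fall in each bin
```

The counts always add up to the number of data points, and there is always one more break than there are bins.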
As is true with our other graphics, the hist(...) function also accepts graphic customization arguments. We rescaled our y-axis with ylim, colored our bars with col, and added text to our histogram with main and xlab. Also note that we used the max(data) function within the rainbow(...) component of our col argument to ensure that our histogram would have enough colors to represent each unique value in our dataset:
hist(x = histFireDurationData, main = histFireDurationDataMain, xlab = histFireDurationLabX, ylim = histFireDurationLimY, col = histFireDurationRainbowColor)
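To try these arguments without the fire dataset, the sketch below substitutes an invented vector (exampleDurations and the other example names are hypothetical stand-ins). It shows that rainbow(n) returns n colors, and it compares that number with the number of bars that hist(...) actually draws.

```r
# Hypothetical stand-in for subsetFire$DurationInDays (invented values)
exampleDurations <- c(2, 3, 3, 5, 5, 5, 6, 8, 8, 10)

# rainbow(n) returns a character vector of n colors
examplePalette <- rainbow(max(exampleDurations))
length(examplePalette)

# Same customization arguments as in the text, applied to the stand-in data
exampleHist <- hist(x = exampleDurations,
                    main = "Duration of Past Fire Attacks",
                    xlab = "Duration in Days",
                    ylim = c(0, 10),
                    col = examplePalette)

# Number of bars actually drawn, one per bin
length(exampleHist$counts)
```

Because hist(...) groups values into bins, the number of bars can be smaller than the number of colors supplied. In that case the surplus colors are simply not used.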
a. The most and least frequently occurring values in the dataset.
b. The total number of data points in the dataset.
c. The minimum and maximum values in the dataset.
d. The exact value of each data point in the dataset.