A box plot is a useful way to convey a collection of summary statistics for a dataset. This type of graph depicts a dataset's minimum and maximum, as well as its lower quartile, median, and upper quartile, in a single diagram. Let us look at how box plots are created in R:
Use the boxplot(...) function to create a box plot.
> #create a box plot that depicts the number of soldiers required to launch a fire attack
> #get the data to be used in the plot
> boxplotFireShuSoldiersData <- subsetFire$ShuSoldiers
> #customize the plot
> boxPlotFireShuSoldiersLabelMain <- "Number of Soldiers Required to Launch a Fire Attack"
> boxPlotFireShuSoldiersLabelX <- "Fire Attack Method"
> boxPlotFireShuSoldiersLabelY <- "Number of Soldiers"
> #use boxplot(...) to create and display the box plot
> boxplot(x = boxplotFireShuSoldiersData, main = boxPlotFireShuSoldiersLabelMain, xlab = boxPlotFireShuSoldiersLabelX, ylab = boxPlotFireShuSoldiersLabelY)
Use the boxplot(...) function to create a box plot that compares multiple datasets.
> #create a box plot that compares the number of soldiers required across the battle methods
> #get the data formula to be used in the plot
> boxplotAllMethodsShuSoldiersData <- battleHistory$ShuSoldiers ~ battleHistory$Method
> #customize the plot
> boxPlotAllMethodsShuSoldiersLabelMain <- "Number of Soldiers Required by Battle Method"
> boxPlotAllMethodsShuSoldiersLabelX <- "Battle Method"
> boxPlotAllMethodsShuSoldiersLabelY <- "Number of Soldiers"
> #use boxplot(...) to create and display the box plot
> boxplot(formula = boxplotAllMethodsShuSoldiersData, main = boxPlotAllMethodsShuSoldiersLabelMain, xlab = boxPlotAllMethodsShuSoldiersLabelX, ylab = boxPlotAllMethodsShuSoldiersLabelY)
We just created two box plots using R's boxplot(...) function, one with a single box and one with multiple boxes.
We started by generating a single box plot that was composed of a dataset, main title, and x and y labels. The basic format for a single box plot is as follows:
boxplot(x = dataset)
The x argument contains the data to be plotted. Technically, only x is required to create a box plot, although you will often include additional arguments. Our boxplot(...) function used the main, xlab, and ylab arguments to display text on the plot, as shown:
> boxplot(x = boxplotFireShuSoldiersData, main = boxPlotFireShuSoldiersLabelMain, xlab = boxPlotFireShuSoldiersLabelX, ylab = boxPlotFireShuSoldiersLabelY)
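To try the single box plot without the original data, the following is a minimal sketch. The exampleSoldiers vector and its values are invented purely for illustration; they are not taken from subsetFire. It also captures the value that boxplot(...) returns, which lists the five numbers drawn for the box and whiskers.

```r
# Hypothetical data: these soldier counts are invented for illustration
# and are not taken from the text's subsetFire dataset.
exampleSoldiers <- c(1200, 1500, 1800, 2000, 2300, 2600, 3000)

# Only x is required; main, xlab, and ylab add text to the plot.
exampleStats <- boxplot(x = exampleSoldiers,
                        main = "Number of Soldiers (Example Data)",
                        xlab = "Example Group",
                        ylab = "Number of Soldiers")

# boxplot(...) invisibly returns what it drew. The stats element holds,
# in order: lower whisker, lower hinge, median, upper hinge, upper whisker.
print(exampleStats$stats)
```

Note that, by default, the whiskers stop at the most extreme values lying within 1.5 times the box length of the box, so unusually large or small values may be drawn as separate points rather than at the whisker ends.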
Next, we created a multiple box plot that compared the number of Shu soldiers deployed by each battle method. The main, xlab, and ylab arguments remained from our single box plot; however, our multiple box plot used the formula argument instead of x. Here, a formula allows us to break a dataset down into separate groups, thus yielding multiple boxes.
The basic format for a multiple box plot is as follows:
boxplot(formula = dataset ~ group)
In our case, we took our entire Shu soldier dataset (battleHistory$ShuSoldiers) and separated it by battle method (battleHistory$Method):
> boxplotAllMethodsShuSoldiersData <- battleHistory$ShuSoldiers ~ battleHistory$Method
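To see what such a formula represents, here is a small sketch under stated assumptions. The exampleBattleHistory data frame and its values are hypothetical stand-ins for battleHistory. The split(...) call is used here only to make the grouping visible; it is not necessarily how boxplot(...) handles the formula internally.

```r
# Hypothetical data, invented for illustration only.
exampleBattleHistory <- data.frame(
  ShuSoldiers = c(1000, 1500, 2000, 3000, 3500, 4000),
  Method = c("ambush", "ambush", "ambush", "fire", "fire", "fire")
)

# Storing a formula in a variable does not compute or plot anything yet;
# it only records "values ~ grouping variable".
exampleFormula <- exampleBattleHistory$ShuSoldiers ~ exampleBattleHistory$Method
print(class(exampleFormula))

# The grouping the formula describes: one vector of soldier counts per method.
exampleGroups <- split(exampleBattleHistory$ShuSoldiers,
                       exampleBattleHistory$Method)
print(exampleGroups)
```

Each element of exampleGroups corresponds to one box in the resulting plot.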
Once incorporated into the boxplot(...) function, this formula resulted in a plot that contained four distinct boxes (ambush, fire, head to head, and surround):
> boxplot(formula = boxplotAllMethodsShuSoldiersData, main = boxPlotAllMethodsShuSoldiersLabelMain, xlab = boxPlotAllMethodsShuSoldiersLabelX, ylab = boxPlotAllMethodsShuSoldiersLabelY)
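A self-contained version of the multiple box plot might look like the sketch below. The exampleBattleHistory data frame is hypothetical and its values are invented for illustration. As a variation on the code above, this sketch passes the data frame through the data argument of boxplot(...), so the formula can refer to the column names directly.

```r
# Hypothetical data, invented for illustration only: three battles per method.
exampleBattleHistory <- data.frame(
  ShuSoldiers = c(1000, 1500, 2000,
                  3000, 3500, 4000,
                  2500, 2800, 3100,
                  5000, 5500, 6000),
  Method = rep(c("ambush", "fire", "head to head", "surround"), each = 3)
)

# The formula splits ShuSoldiers by Method, yielding one box per method.
exampleResult <- boxplot(formula = ShuSoldiers ~ Method,
                         data = exampleBattleHistory,
                         main = "Number of Soldiers by Battle Method (Example Data)",
                         xlab = "Battle Method",
                         ylab = "Number of Soldiers")

# The returned list records the group labels and the group sizes.
print(exampleResult$names)
print(exampleResult$n)
```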
> boxplot(x = a)
a. A single box plot of the a dataset.
b. A single box plot of the x dataset.
c. A multiple box plot of the a dataset that is grouped by x.
d. A multiple box plot of the x dataset that is grouped by a.
> boxplot(formula = a ~ b)
a. A single box plot of the a dataset.
b. A single box plot of the b dataset.
c. A multiple box plot of the a dataset that is grouped by b.
d. A multiple box plot of the b dataset that is grouped by a.