Our second look at scatterplots will revolve around customizing data point markers, adding new information to a plot, and creating best fit lines:
Use the pch and cex arguments to customize the data point markers:
> #modify the chapter 8 single scatterplot that depicted the relationship between the number of Shu and Wei soldiers engaged in past fire attacks
> #use the pch argument to change the style of the data point markers
> #pch accepts a whole number value between 0 and 25
> scatterplotFireSoldiersPch <- 2
> #use the cex argument to change the size of the data point markers
> #cex accepts a numeric value indicating by how much to scale the markers
> #cex defaults to a value of 1
> scatterplotFireSoldiersCex <- 3
> plot(x = scatterplotFireWeiSoldiersData, y = scatterplotFireShuSoldiersData, main = scatterplotFireSoldiersLabelMain, xlab = scatterplotFireSoldiersLabelX, ylab = scatterplotFireSoldiersLabelY, pch = scatterplotFireSoldiersPch, cex = scatterplotFireSoldiersCex)
> #prepare the scatterplot to incorporate data from the other battle methods
> #modify the chart title
> scatterplotAllMethodsSoldiersMain <- "Soldiers Engaged by Battle Method"
> #rescale the axes to handle the new data
> scatterplotAllMethodsSoldiersLimX <- c(0, 200000)
> scatterplotAllMethodsSoldiersLimY <- c(0, 150000)
> #incorporate the col argument to distinguish between the different battle methods
> scatterplotAllMethodsSoldiersFireCol <- "red"
> #use plot(...) to create and display the revised scatterplot
> plot(x = scatterplotFireWeiSoldiersData, y = scatterplotFireShuSoldiersData, main = scatterplotAllMethodsSoldiersMain, xlab = scatterplotFireSoldiersLabelX, ylab = scatterplotFireSoldiersLabelY, xlim = scatterplotAllMethodsSoldiersLimX, ylim = scatterplotAllMethodsSoldiersLimY, col = scatterplotAllMethodsSoldiersFireCol, pch = scatterplotFireSoldiersPch, cex = scatterplotFireSoldiersCex)
Use the points(...) function to add new relationships to the scatterplot:
> #use points(...) to add new relationships to a scatterplot
> #add points representing the three remaining battle methods
> #note that after entering each subsequent function into the R console, it will be immediately drawn atop your existing scatterplot
> #ambush
> pointsAmbushDataX <- subsetAmbush$WeiSoldiersEngaged
> pointsAmbushDataY <- subsetAmbush$ShuSoldiersEngaged
> pointsAmbushType <- "p"
> pointsAmbushPch <- 1
> pointsAmbushCex <- 1
> pointsAmbushCol <- "blue"
> points(x = pointsAmbushDataX, y = pointsAmbushDataY, type = pointsAmbushType, col = pointsAmbushCol, pch = pointsAmbushPch, cex = pointsAmbushCex)
> #head to head
> pointsHeadToHeadDataX <- subsetHeadToHead$WeiSoldiersEngaged
> pointsHeadToHeadDataY <- subsetHeadToHead$ShuSoldiersEngaged
> pointsHeadToHeadType <- "p"
> pointsHeadToHeadPch <- 3
> pointsHeadToHeadCex <- 1
> pointsHeadToHeadCol <- "darkorange2"
> points(x = pointsHeadToHeadDataX, y = pointsHeadToHeadDataY, type = pointsHeadToHeadType, col = pointsHeadToHeadCol, pch = pointsHeadToHeadPch, cex = pointsHeadToHeadCex)
> #surround
> pointsSurroundDataX <- subsetSurround$WeiSoldiersEngaged
> pointsSurroundDataY <- subsetSurround$ShuSoldiersEngaged
> pointsSurroundType <- "p"
> pointsSurroundPch <- 4
> pointsSurroundCex <- 1
> pointsSurroundCol <- "forestgreen"
> points(x = pointsSurroundDataX, y = pointsSurroundDataY, type = pointsSurroundType, col = pointsSurroundCol, pch = pointsSurroundPch, cex = pointsSurroundCex)
> #add a legend
> #use the x and y arguments to specify the exact location of the legend
> #add labels for the battle methods
> #add fill colors to match the scatterplot's points
> legend(x = 145000, y = 65000, legend = c("Fire", "Ambush", "Head to Head", "Surround"), fill = c("red", "blue", "darkorange2", "forestgreen"))
Use the abline(...) function to add a best fit line to each relationship in the scatterplot:
> #add a best fit line using abline(...)
> #the reg argument represents a regression equation
> #reg is defined using the lm(...) function
> #the lty argument defines the style of line to be used
> #as with other graphic functions, the col argument defines a color for the line
> #note that after entering each subsequent function into the R console, it will be immediately drawn atop your existing scatterplot
> #fire
> scatterplotAllMethodsSoldiersFireLineReg <- lm(scatterplotFireShuSoldiersData ~ scatterplotFireWeiSoldiersData)
> scatterplotAllMethodsSoldiersFireLty <- "solid"
> #abline(...) will draw a best fit line atop a preexisting plot
> abline(reg = scatterplotAllMethodsSoldiersFireLineReg, lty = scatterplotAllMethodsSoldiersFireLty, col = scatterplotAllMethodsSoldiersFireCol)
> #ambush
> scatterplotAllMethodsSoldiersAmbushLineReg <- lm(pointsAmbushDataY ~ pointsAmbushDataX)
> scatterplotAllMethodsSoldiersAmbushLty <- "dotted"
> #abline(...) will draw a best fit line atop a preexisting plot
> abline(reg = scatterplotAllMethodsSoldiersAmbushLineReg, lty = scatterplotAllMethodsSoldiersAmbushLty, col = pointsAmbushCol)
> #head to head
> scatterplotAllMethodsSoldiersHeadToHeadLineReg <- lm(pointsHeadToHeadDataY ~ pointsHeadToHeadDataX)
> scatterplotAllMethodsSoldiersHeadToHeadLty <- "dotdash"
> #abline(...) will draw a best fit line atop a preexisting plot
> abline(reg = scatterplotAllMethodsSoldiersHeadToHeadLineReg, lty = scatterplotAllMethodsSoldiersHeadToHeadLty, col = pointsHeadToHeadCol)
> #surround
> scatterplotAllMethodsSoldiersSurroundLineReg <- lm(pointsSurroundDataY ~ pointsSurroundDataX)
> scatterplotAllMethodsSoldiersSurroundLty <- "dashed"
> #abline(...) will draw a best fit line atop a preexisting plot
> abline(reg = scatterplotAllMethodsSoldiersSurroundLineReg, lty = scatterplotAllMethodsSoldiersSurroundLty, col = pointsSurroundCol)
We customized our scatterplot's point markers, then expanded it to include additional data, before adding best fit lines to our graphic. Let us examine these items in greater detail.
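We customized the data point markers in our fire attack scatterplot using the plot(...) function's pch and cex arguments. These are defined as follows:
pch: a whole number between 0 and 25 that sets the style of the data point markers
cex: a numeric value that sets by how much to scale the markers; defaults to 1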
In our case, we used pch with the value 2 to apply triangle markers to our data points and then scaled them to three times their default size with cex equal to 3:
> scatterplotFireSoldiersPch <- 2
> scatterplotFireSoldiersCex <- 3
Thus, we arrived at a plot with large, triangular point markers:
> plot(x = scatterplotFireWeiSoldiersData, y = scatterplotFireShuSoldiersData, main = scatterplotFireSoldiersLabelMain, xlab = scatterplotFireSoldiersLabelX, ylab = scatterplotFireSoldiersLabelY, pch = scatterplotFireSoldiersPch, cex = scatterplotFireSoldiersCex)
The primary purpose of the pch and cex arguments is to improve the visual aspects of scatterplots. In tandem, these arguments can generate a wide array of potential data point markers.
You can see a complete list of the markers available for use in the pch argument by plotting them with plot(0:25, pch = 0:25).
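As a rough sketch of how pch and cex work together, the following draws every marker code from 0 to 25 at double size and prints each code above its marker. The variable names (markerCodes, markerHeights) and the layout choices are illustrative assumptions and are not part of the recipe.

```r
# illustrative sketch: names and layout are assumptions, not from the recipe
markerCodes <- 0:25
# place every marker at the same height so they form a single row
markerHeights <- rep(1, length(markerCodes))
plot(x = markerCodes, y = markerHeights, pch = markerCodes, cex = 2,
     ylim = c(0.5, 1.5), yaxt = "n", xlab = "pch value", ylab = "",
     main = "Markers available through pch")
# print each pch number just above its marker
text(x = markerCodes, y = 1.2, labels = markerCodes, cex = 0.8)
```

Changing the cex value in the plot(...) call rescales every marker at once, which makes it easy to judge which size suits a given graphic.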
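To add new relationships to our scatterplot, we executed the points(...) function. This function incorporates additional data points into a plot that is already displayed in the graphic window. The primary arguments of the points(...) function are:
x: the x-axis positions of the points
y: the y-axis positions of the points
type: the type of graphic to draw; "p" draws points
col: the color of the points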
Thus, the general format for the points(...) function is as follows:
points(x = xPosition, y = yPosition, type = "type", col = "colorName")
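To see this format in action outside of our battle data, consider the following minimal sketch. All of the values and variable names here (groupOneX, groupTwoY, and so on) are invented for illustration only.

```r
# hypothetical data, invented purely for illustration
groupOneX <- c(10, 20, 30, 40)
groupOneY <- c(12, 18, 33, 41)
groupTwoX <- c(15, 25, 35, 45)
groupTwoY <- c(40, 31, 22, 9)
# plot(...) opens the graphic with the first group
plot(x = groupOneX, y = groupOneY, xlim = c(0, 50), ylim = c(0, 50),
     col = "red", pch = 2)
# points(...) draws the second group atop the existing plot
points(x = groupTwoX, y = groupTwoY, type = "p", col = "blue", pch = 1)
```

Note that points(...) only adds to a graphic that already exists. If no plot has been opened, R will report that plot.new has not been called yet.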
In tandem with these, we also used the pch and cex arguments in our points(...) functions to customize the style and size of our data markers. The x and y arguments featured the Wei and Shu soldier data for each method:
> #ambush
> pointsAmbushDataX <- subsetAmbush$WeiSoldiersEngaged
> pointsAmbushDataY <- subsetAmbush$ShuSoldiersEngaged
> pointsAmbushType <- "p"
> pointsAmbushPch <- 1
> pointsAmbushCex <- 1
> pointsAmbushCol <- "blue"
> #head to head
> pointsHeadToHeadDataX <- subsetHeadToHead$WeiSoldiersEngaged
> pointsHeadToHeadDataY <- subsetHeadToHead$ShuSoldiersEngaged
> pointsHeadToHeadType <- "p"
> pointsHeadToHeadPch <- 3
> pointsHeadToHeadCex <- 1
> pointsHeadToHeadCol <- "darkorange2"
> #surround
> pointsSurroundDataX <- subsetSurround$WeiSoldiersEngaged
> pointsSurroundDataY <- subsetSurround$ShuSoldiersEngaged
> pointsSurroundType <- "p"
> pointsSurroundPch <- 4
> pointsSurroundCex <- 1
> pointsSurroundCol <- "forestgreen"
After beginning our scatterplot with fire attack data, we used points(...) to plot the soldier data for our ambush, head to head, and surround methods:
> #ambush
> points(x = pointsAmbushDataX, y = pointsAmbushDataY, type = pointsAmbushType, col = pointsAmbushCol, pch = pointsAmbushPch, cex = pointsAmbushCex)
> #head to head
> points(x = pointsHeadToHeadDataX, y = pointsHeadToHeadDataY, type = pointsHeadToHeadType, col = pointsHeadToHeadCol, pch = pointsHeadToHeadPch, cex = pointsHeadToHeadCex)
> #surround
> points(x = pointsSurroundDataX, y = pointsSurroundDataY, type = pointsSurroundType, col = pointsSurroundCol, pch = pointsSurroundPch, cex = pointsSurroundCex)
Note that we also redefined the x-axis and y-axis scales with xlim and ylim prior to adding our new points. This allowed all of our values to display within the bounds of our chart. If we had not rescaled the axes, most of our new points would have fallen outside the upper limits of our graph, because the fire attack soldier values are much smaller than those of our other battle methods.
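In the recipe, we typed the axis limits by hand. One possible alternative, assuming that all of the data is available before the first plot is drawn, is to derive the limits with R's range(...) function. The soldier counts and variable names below are hypothetical.

```r
# hypothetical soldier counts, invented for illustration only
smallGroupX <- c(5000, 12000, 20000)
smallGroupY <- c(3000, 8000, 15000)
largeGroupX <- c(60000, 110000, 180000)
largeGroupY <- c(40000, 90000, 140000)
# range(...) returns the minimum and maximum across every value supplied
allGroupsLimX <- range(c(smallGroupX, largeGroupX))
allGroupsLimY <- range(c(smallGroupY, largeGroupY))
# the first plot reserves enough room for both groups
plot(x = smallGroupX, y = smallGroupY, xlim = allGroupsLimX,
     ylim = allGroupsLimY, col = "red")
points(x = largeGroupX, y = largeGroupY, col = "blue")
```

Hand-typed limits, such as the ones used in our recipe, have the advantage of producing round axis values and leaving room for a legend, so neither approach is strictly better.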
We used our familiar legend(...) function to add a key that identified the points from each of our battle method datasets. Its labels and colors were matched to those of the points in our scatterplot:
> legend(x = 145000, y = 65000, legend = c("Fire", "Ambush", "Head to Head", "Surround"), fill = c("red", "blue", "darkorange2", "forestgreen"))
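Our legend used fill to display colored boxes. As a possible variation, the legend(...) function also accepts pch and col arguments, which make the key display the same marker shapes as the plot. The data, labels, and "topright" placement in this sketch are assumptions chosen for illustration.

```r
# hypothetical data and placement, for illustration only
plot(x = c(1, 2, 3), y = c(2, 4, 3), col = "red", pch = 2,
     xlim = c(0, 5), ylim = c(0, 5))
points(x = c(1, 2, 3), y = c(4, 1, 2), col = "blue", pch = 1)
# pch and col (rather than fill) make the key show the plot's own markers
legendInfo <- legend(x = "topright", legend = c("Fire", "Ambush"),
                     col = c("red", "blue"), pch = c(2, 1))
```

The order of the values in col and pch must follow the order of the labels in legend, otherwise the key will pair a label with the wrong marker.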
After completing our scatterplot setup, we added best fit lines. Also known as a regression line, a best fit line expresses the relationship in a scatterplot as a single, straight line. To accomplish this, the line is positioned as close as possible to all of the data points. The result is a line that approximates a linear relationship between the variables. In R, we can use the abline(...) function to add a best fit line to an existing graphic. In addition to the col argument, which we already know about, the primary arguments for abline(...) are:
reg: a regression equation, defined using the lm(...) function
lty: the style of line to be drawn
The basic structure of the abline(...) function is as follows:
abline(reg = lm(y ~ x), lty = "lineType")
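The following minimal sketch applies this structure to a small, invented dataset. The names exampleX, exampleY, exampleReg, and exampleCoefficients are hypothetical, and the values were chosen only so that the points fall roughly along a line.

```r
# hypothetical data, invented for illustration
exampleX <- c(1, 2, 3, 4, 5)
exampleY <- c(2.1, 3.9, 6.2, 7.8, 10.1)
plot(x = exampleX, y = exampleY)
# lm(y ~ x) fits the regression of y on x
exampleReg <- lm(exampleY ~ exampleX)
# abline(...) reads the intercept and slope from the fitted model
abline(reg = exampleReg, lty = "dashed", col = "red")
# coef(...) returns the intercept and slope that define the line
exampleCoefficients <- coef(exampleReg)
```

Besides "dashed", the lty argument accepts "solid", "dotted", "dotdash", "longdash", "twodash", and "blank".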
In our abline(...) functions, we used lty to define a unique line type for each of our battle methods. We also matched our lines' colors to those of our scatterplot's points. Our reg arguments used the lm(...) function to specify the number of Shu soldiers as our y variable and the number of Wei soldiers as our x variable:
> #fire
> scatterplotAllMethodsSoldiersFireLineReg <- lm(scatterplotFireShuSoldiersData ~ scatterplotFireWeiSoldiersData)
> scatterplotAllMethodsSoldiersFireLty <- "solid"
> #ambush
> scatterplotAllMethodsSoldiersAmbushLineReg <- lm(pointsAmbushDataY ~ pointsAmbushDataX)
> scatterplotAllMethodsSoldiersAmbushLty <- "dotted"
> #head to head
> scatterplotAllMethodsSoldiersHeadToHeadLineReg <- lm(pointsHeadToHeadDataY ~ pointsHeadToHeadDataX)
> scatterplotAllMethodsSoldiersHeadToHeadLty <- "dotdash"
> #surround
> scatterplotAllMethodsSoldiersSurroundLineReg <- lm(pointsSurroundDataY ~ pointsSurroundDataX)
> scatterplotAllMethodsSoldiersSurroundLty <- "dashed"
The complete abline(...) functions incorporated our reg, lty, and col arguments to draw best fit lines for our battle method data:
> #fire
> abline(reg = scatterplotAllMethodsSoldiersFireLineReg, lty = scatterplotAllMethodsSoldiersFireLty, col = scatterplotAllMethodsSoldiersFireCol)
> #ambush
> abline(reg = scatterplotAllMethodsSoldiersAmbushLineReg, lty = scatterplotAllMethodsSoldiersAmbushLty, col = pointsAmbushCol)
> #head to head
> abline(reg = scatterplotAllMethodsSoldiersHeadToHeadLineReg, lty = scatterplotAllMethodsSoldiersHeadToHeadLty, col = pointsHeadToHeadCol)
> #surround
> abline(reg = scatterplotAllMethodsSoldiersSurroundLineReg, lty = scatterplotAllMethodsSoldiersSurroundLty, col = pointsSurroundCol)
A best fit line is useful in gauging whether or not the relationship between two variables is indeed linear. Therefore, it is beneficial to apply when exploring a new dataset. We can also use best fit lines to compare the relationships between related datasets.
In our plot, it is quite clear that the relationship between the numbers of Shu and Wei soldiers engaged is different for different battle methods. For instance, the best fit lines help us to see that in the surround method, the number of Shu soldiers tends to be relatively high compared to the number of Wei soldiers. In contrast, with the fire attack method, the number of Wei soldiers tends to be relatively high compared to the number of Shu soldiers. Using a scatterplot such as this one, along with one or more best fit lines, is still another way to inform our interpretations and understanding of the relationships between our variables. Moreover, using a graphic often helps us to discover things that we cannot see in the raw data alone.
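If we wanted to put numbers on such a visual comparison, one option would be to extract the slope of each best fit line with coef(...). The sketch below uses two invented datasets with hypothetical names (methodAWei, methodBShu, and so on). It does not use our battle data, so its values say nothing about the actual battle methods.

```r
# hypothetical data for two methods, invented for illustration
methodAWei <- c(10, 20, 30, 40)
methodAShu <- c(30, 62, 89, 121)
methodBWei <- c(10, 20, 30, 40)
methodBShu <- c(4, 11, 14, 21)
# the second coefficient of each model is the slope of its best fit line
slopeA <- unname(coef(lm(methodAShu ~ methodAWei))[2])
slopeB <- unname(coef(lm(methodBShu ~ methodBWei))[2])
# a larger slope means more Shu soldiers per additional Wei soldier
slopeA > slopeB
```

A steeper slope corresponds to a steeper best fit line in the graphic, so the numeric comparison and the visual one should agree.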
In the plot(...) function, what is the relationship between the pch and cex arguments?
a. pch sets the type of data point marker, while cex sets the size of the marker.
b. cex sets the type of data point marker, while pch sets the size of the marker.
c. pch sets the number of data point markers, while cex sets the style of the markers.
d. cex sets the number of data point markers, while pch sets the style of the markers.
a. They help us to understand the relationship between the variables.
b. They inform our interpretation of the relationship between the variables.
c. They tell us whether the variables will have an interaction effect.
d. They indicate the linearity of the relationship between the variables.
Create a scatterplot that depicts the relationship between the execution and rating of past fire attacks. Be sure to use the numeric version of the successful execution variable. Note that since execution is dichotomous (containing only two possible values), the resulting plot will look different from the ones we created with our soldier data. Try to interpret the meaning of this graphic. Does it make sense to add a best fit line in this situation?
Now use the sunflowerplot(...) function with the same arguments that you just used in the plot(...) function. Try to interpret the meaning of this graphic. Refer back to the raw fire data for help recalling the data contained in the Rating and SuccessfullyExecuted variables.
Consider the graphics generated by your plot(...) and sunflowerplot(...) functions. How do these functions differ in the way they portray data?