Time for action - customizing a scatterplot

Our second look at scatterplots will revolve around customizing data point markers, adding new information to a plot, and creating best fit lines:

  1. Customize a scatterplot's point markers using the pch and cex arguments:
    > #modify the chapter 8 single scatterplot that depicted the relationship between the number of Shu and Wei soldiers engaged in past fire attacks
    > #use the pch argument to change the style of the data point markers
    > #pch accepts a whole number value between 0 and 25
    > scatterplotFireSoldiersPch <- 2
    > #use the cex argument to change the size of the data point markers
    > #cex accepts a numeric value indicating by how much to scale the markers
    > #cex defaults to a value of 1
    > scatterplotFireSoldiersCex <- 3
    > plot(x = scatterplotFireWeiSoldiersData,
    y = scatterplotFireShuSoldiersData,
    main = scatterplotFireSoldiersLabelMain,
    xlab = scatterplotFireSoldiersLabelX,
    ylab = scatterplotFireSoldiersLabelY,
    pch = scatterplotFireSoldiersPch,
    cex = scatterplotFireSoldiersCex)
    
  2. Your plot will be displayed in the graphic window.
  3. Prepare the scatterplot to incorporate additional data:
    > #prepare the scatterplot to incorporate data from the other battle methods
    > #modify the chart title
    > scatterplotAllMethodsSoldiersMain <-
    "Soldiers Engaged by Battle Method"
    > #rescale the axes to handle the new data
    > scatterplotAllMethodsSoldiersLimX <- c(0, 200000)
    > scatterplotAllMethodsSoldiersLimY <- c(0, 150000)
    > #incorporate the col argument to distinguish between the different battle methods
    > scatterplotAllMethodsSoldiersFireCol <- "red"
    > #use plot(...) to create and display the revised scatterplot
    > plot(x = scatterplotFireWeiSoldiersData,
    y = scatterplotFireShuSoldiersData,
    main = scatterplotAllMethodsSoldiersMain,
    xlab = scatterplotFireSoldiersLabelX,
    ylab = scatterplotFireSoldiersLabelY,
    xlim = scatterplotAllMethodsSoldiersLimX,
    ylim = scatterplotAllMethodsSoldiersLimY,
    col = scatterplotAllMethodsSoldiersFireCol,
    pch = scatterplotFireSoldiersPch,
    cex = scatterplotFireSoldiersCex)
    
  4. Your scatterplot will be displayed in the graphic window.
  5. Use the points(...) function to add new relationships to the scatterplot:
    > #use points(...) to add new relationships to a scatterplot
    > #add points representing the three remaining battle methods
    > #note that after entering each subsequent function into the R console, it will be immediately drawn atop your existing scatterplot
    > #ambush
    > pointsAmbushDataX <- subsetAmbush$WeiSoldiersEngaged
    > pointsAmbushDataY <- subsetAmbush$ShuSoldiersEngaged
    > pointsAmbushType <- "p"
    > pointsAmbushPch <- 1
    > pointsAmbushCex <- 1
    > pointsAmbushCol <- "blue"
    > points(x = pointsAmbushDataX, y = pointsAmbushDataY,
    type = pointsAmbushType, col = pointsAmbushCol,
    pch = pointsAmbushPch, cex = pointsAmbushCex)
    > #head to head
    > pointsHeadToHeadDataX <- subsetHeadToHead$WeiSoldiersEngaged
    > pointsHeadToHeadDataY <- subsetHeadToHead$ShuSoldiersEngaged
    > pointsHeadToHeadType <- "p"
    > pointsHeadToHeadPch <- 3
    > pointsHeadToHeadCex <- 1
    > pointsHeadToHeadCol <- "darkorange2"
    > points(x = pointsHeadToHeadDataX, y = pointsHeadToHeadDataY,
    type = pointsHeadToHeadType, col = pointsHeadToHeadCol,
    pch = pointsHeadToHeadPch, cex = pointsHeadToHeadCex)
    > #surround
    > pointsSurroundDataX <- subsetSurround$WeiSoldiersEngaged
    > pointsSurroundDataY <- subsetSurround$ShuSoldiersEngaged
    > pointsSurroundType <- "p"
    > pointsSurroundPch <- 4
    > pointsSurroundCex <- 1
    > pointsSurroundCol <- "forestgreen"
    > points(x = pointsSurroundDataX, y = pointsSurroundDataY,
    type = pointsSurroundType, col = pointsSurroundCol,
    pch = pointsSurroundPch, cex = pointsSurroundCex)
    
  6. Your points will be added to the existing scatterplot.
  7. Add a legend to the scatterplot.
    > #add a legend
    > #use the x and y arguments to specify the exact location of the legend
    > #add labels for the battle methods
    > #add fill colors to match the scatterplot's points
    > legend(x = 145000, y = 65000, legend = c("Fire", "Ambush",
    "Head to Head", "Surround"), fill = c("red", "blue",
    "darkorange2", "forestgreen"))
    
  8. Your legend will be added to the existing scatterplot.
  9. Use the abline(...) function to add a best fit line to each relationship in the scatterplot.
    > #add a best fit line using abline(...)
    > #the reg argument represents a regression equation
    > #reg is defined using the lm(...) function
    > #the lty argument defines the style of line to be used
    > #as with other graphic functions, the col argument defines a color for the line
    > #note that after entering each subsequent function into the R console, it will be immediately drawn atop your existing scatterplot
    > #fire
    > scatterplotAllMethodsSoldiersFireLineReg <-
    lm(scatterplotFireShuSoldiersData ~
    scatterplotFireWeiSoldiersData)
    > scatterplotAllMethodsSoldiersFireLty <- "solid"
    > #abline(...) will draw a best fit line atop a preexisting plot
    > abline(reg = scatterplotAllMethodsSoldiersFireLineReg,
    lty = scatterplotAllMethodsSoldiersFireLty,
    col = scatterplotAllMethodsSoldiersFireCol)
    > #ambush
    > scatterplotAllMethodsSoldiersAmbushLineReg <-
    lm(pointsAmbushDataY ~ pointsAmbushDataX)
    > scatterplotAllMethodsSoldiersAmbushLty <- "dotted"
    > #abline(...) will draw a best fit line atop a preexisting plot
    > abline(reg = scatterplotAllMethodsSoldiersAmbushLineReg,
    lty = scatterplotAllMethodsSoldiersAmbushLty,
    col = pointsAmbushCol)
    > #head to head
    > scatterplotAllMethodsSoldiersHeadToHeadLineReg <-
    lm(pointsHeadToHeadDataY ~ pointsHeadToHeadDataX)
    > scatterplotAllMethodsSoldiersHeadToHeadLty <- "dotdash"
    > #abline(...) will draw a best fit line atop a preexisting plot
    > abline(reg = scatterplotAllMethodsSoldiersHeadToHeadLineReg,
    lty = scatterplotAllMethodsSoldiersHeadToHeadLty,
    col = pointsHeadToHeadCol)
    > #surround
    > scatterplotAllMethodsSoldiersSurroundLineReg <-
    lm(pointsSurroundDataY ~ pointsSurroundDataX)
    > scatterplotAllMethodsSoldiersSurroundLty <- "dashed"
    > #abline(...) will draw a best fit line atop a preexisting plot
    > abline(reg = scatterplotAllMethodsSoldiersSurroundLineReg,
    lty = scatterplotAllMethodsSoldiersSurroundLty,
    col = pointsSurroundCol)
    
  10. Your best fit lines will be added to the existing scatterplot.

What just happened?

We customized our scatterplot's point markers, then expanded it to include additional data, before adding best fit lines to our graphic. Let us examine these items in greater detail.

pch and cex

We customized the data point markers in our fire attack scatterplot using the plot(...) function's pch and cex arguments. These are defined as follows:

  • pch: a whole number between 0 and 25, with each value representing a different style of marker, such as a circle, triangle, or square.
  • cex: a numeric value indicating how much to scale the size of data point markers; 1 by default.

In our case, we set pch to 2 to apply triangle markers to our data points and set cex to 3 to scale them to three times their default size:

> scatterplotFireSoldiersPch <- 2
> scatterplotFireSoldiersCex <- 3

Thus, we arrived at a plot with large, triangular point markers:

> plot(x = scatterplotFireWeiSoldiersData,
y = scatterplotFireShuSoldiersData,
main = scatterplotFireSoldiersLabelMain,
xlab = scatterplotFireSoldiersLabelX,
ylab = scatterplotFireSoldiersLabelY,
pch = scatterplotFireSoldiersPch,
cex = scatterplotFireSoldiersCex)

The primary purpose of the pch and cex arguments is to improve the visual aspects of scatterplots. In tandem, these arguments can generate a wide array of potential data point markers.

Note

You can see a complete list of the markers available for use in the pch argument by plotting them with plot(0:25, pch = 0:25).
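
As a slightly expanded sketch of this tip, the following code prints each pch number beneath its marker so that the styles are easier to tell apart. The variable name markerNumbers, along with the cex, ylim, and label positions, are illustrative choices and not part of the chapter's code.

```r
# illustrative sketch: display the 26 standard markers with their pch numbers
markerNumbers <- 0:25
# draw one marker per pch value along a single horizontal row
plot(x = markerNumbers, y = rep(1, length(markerNumbers)),
     pch = markerNumbers, cex = 2,
     ylim = c(0.5, 1.5), yaxt = "n",
     xlab = "pch value", ylab = "",
     main = "Markers Available Through pch")
# print each pch number just below its marker
text(x = markerNumbers, y = 0.8, labels = markerNumbers, cex = 0.7)
```

Here, yaxt = "n" hides the y-axis, which carries no meaning in this display.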

points(...)

To add new relationships to our scatterplot, we executed the points(...) function. This function incorporates additional data points into a plot that is displayed in the graphic window. The primary arguments of the points(...) function are:

  • x: the values to be plotted on the x-axis
  • y: the values to be plotted on the y-axis
  • type: the point type; identical to the type argument in the plot(...) function
  • col: the point color; identical to the col argument in other graphics functions

Thus, the general format for the points(...) function is as follows:

points(x = xPosition, y = yPosition, type = "type",
col = "colorName")
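
To see this format in action without the battle data, here is a minimal sketch that starts a plot with one group and then layers a second group on top with points(...). All of the values and variable names below are made up for illustration.

```r
# hypothetical data for two groups (not taken from the battle dataset)
groupOneX <- c(10, 20, 30, 40)
groupOneY <- c(12, 18, 33, 41)
groupTwoX <- c(15, 25, 35, 45)
groupTwoY <- c(40, 31, 22, 9)
# plot(...) starts the graphic with the first group
plot(x = groupOneX, y = groupOneY,
     xlim = c(0, 50), ylim = c(0, 50),
     col = "red", pch = 2,
     xlab = "x", ylab = "y",
     main = "Two Groups on One Scatterplot")
# points(...) draws the second group atop the existing plot
points(x = groupTwoX, y = groupTwoY,
       type = "p", col = "blue", pch = 1)
```

Note that points(...) can only add to a graphic that already exists, so plot(...) must be called first.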

In tandem with these, we also used the pch and cex arguments in our points(...) functions to customize the style and size of our data markers. The x and y arguments featured the Wei and Shu soldier data for each method:

> #ambush
> pointsAmbushDataX <- subsetAmbush$WeiSoldiersEngaged
> pointsAmbushDataY <- subsetAmbush$ShuSoldiersEngaged
> pointsAmbushType <- "p"
> pointsAmbushPch <- 1
> pointsAmbushCex <- 1
> pointsAmbushCol <- "blue"
> #head to head
> pointsHeadToHeadDataX <- subsetHeadToHead$WeiSoldiersEngaged
> pointsHeadToHeadDataY <- subsetHeadToHead$ShuSoldiersEngaged
> pointsHeadToHeadType <- "p"
> pointsHeadToHeadPch <- 3
> pointsHeadToHeadCex <- 1
> pointsHeadToHeadCol <- "darkorange2"
> #surround
> pointsSurroundDataX <- subsetSurround$WeiSoldiersEngaged
> pointsSurroundDataY <- subsetSurround$ShuSoldiersEngaged
> pointsSurroundType <- "p"
> pointsSurroundPch <- 4
> pointsSurroundCex <- 1
> pointsSurroundCol <- "forestgreen"

After beginning our scatterplot with fire attack data, we used points(...) to plot the soldier data for our ambush, head to head, and surround methods:

> #ambush
> points(x = pointsAmbushDataX, y = pointsAmbushDataY,
type = pointsAmbushType, col = pointsAmbushCol,
pch = pointsAmbushPch, cex = pointsAmbushCex)
> #head to head
> points(x = pointsHeadToHeadDataX, y = pointsHeadToHeadDataY,
type = pointsHeadToHeadType, col = pointsHeadToHeadCol,
pch = pointsHeadToHeadPch, cex = pointsHeadToHeadCex)
> #surround
> points(x = pointsSurroundDataX, y = pointsSurroundDataY,
type = pointsSurroundType, col = pointsSurroundCol,
pch = pointsSurroundPch, cex = pointsSurroundCex)

Note

We also redefined the x-axis and y-axis scales with xlim and ylim prior to adding our new points. This allowed all of our values to display within the bounds of our chart. Had we not rescaled the axes, most of the new points would have fallen outside the upper limits of our graph, because the fire attack soldier values are much smaller than those of our other battle methods.
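
In this chapter the axis limits were typed in by hand. As one possible alternative, the limits can be derived from the data with range(...), as in the sketch below. The soldier counts and variable names here are hypothetical and only serve to demonstrate the idea.

```r
# hypothetical soldier counts for two battle methods (not the chapter's data)
fireWei <- c(10000, 25000, 40000)
fireShu <- c(8000, 15000, 20000)
ambushWei <- c(60000, 120000, 180000)
ambushShu <- c(50000, 90000, 140000)
# range(...) returns the smallest and largest value across all of its inputs
# including 0 keeps the origin visible on both axes
limX <- range(0, fireWei, ambushWei)
limY <- range(0, fireShu, ambushShu)
# start the plot with limits wide enough for both groups
plot(x = fireWei, y = fireShu,
     xlim = limX, ylim = limY,
     col = "red", pch = 2,
     xlab = "Wei soldiers", ylab = "Shu soldiers")
# the ambush points now fall inside the plotting region
points(x = ambushWei, y = ambushShu, col = "blue", pch = 1)
```

Deriving the limits this way means that they do not have to be revised by hand when the underlying data change.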

legend(...)

We used our familiar legend(...) function to add a key that identified the points from each of our battle method datasets. Its labels and fill colors were matched to those of the points in our scatterplot:

> legend(x = 145000, y = 65000, legend = c("Fire", "Ambush",
"Head to Head", "Surround"), fill = c("red", "blue", "darkorange2",
"forestgreen"))

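The legend(...) call above uses fill, which draws a colored box beside each label. Since our battle methods also differ by marker style, one alternative is to pass pch and col in place of fill, so that the key shows the markers themselves. The following is only a sketch: the empty plot and its axis labels are placeholders standing in for the chapter's scatterplot, and legendInfo is a name introduced here.

```r
# placeholder plot with the chapter's axis limits; type = "n" draws no points
plot(x = c(0, 200000), y = c(0, 150000), type = "n",
     xlab = "Wei soldiers", ylab = "Shu soldiers",
     main = "Soldiers Engaged by Battle Method")
# pch and col reuse the marker styles and colors assigned to each method
legendInfo <- legend(x = 145000, y = 65000,
                     legend = c("Fire", "Ambush", "Head to Head", "Surround"),
                     col = c("red", "blue", "darkorange2", "forestgreen"),
                     pch = c(2, 1, 3, 4))
```

The order of the values in col and pch must match the order of the labels in the legend argument.
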
abline(...)

After completing our scatterplot setup, we added best fit lines. Also known as a regression line, a best fit line expresses the relationship in a scatterplot as a single, straight line. To accomplish this, the line attempts to orient itself as close as possible to all of the data points. The result is a line that approximates a linear relationship between the variables. In R, we can use the abline(...) function to add a best fit line to an existing graphic. In addition to the col argument, which we already know about, the primary arguments for abline(...) are:

  • reg: a fitted linear model, as returned by the lm(...) function
  • lty: a text value representing the line style; one of blank, solid, dashed, dotted, dotdash, longdash, or twodash

The basic structure of the abline(...) function is as follows:

abline(reg = lm(y ~ x), lty = "lineType")
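
To make this structure concrete, the sketch below fits a model to a small set of made-up values that fall exactly on the line y = 1 + 2x and then draws the resulting best fit line. The data and variable names are hypothetical. The coef(...) function is used to show the intercept and slope that abline(...) reads from the model.

```r
# hypothetical data that lie exactly on the line y = 1 + 2x
exampleX <- c(1, 2, 3, 4, 5)
exampleY <- 1 + 2 * exampleX
# lm(...) fits a linear model with exampleY as y and exampleX as x
exampleReg <- lm(exampleY ~ exampleX)
# coef(...) returns the intercept followed by the slope
exampleCoef <- coef(exampleReg)
print(exampleCoef)
# draw the points, then add the best fit line atop them
plot(x = exampleX, y = exampleY, pch = 2,
     xlab = "x", ylab = "y", main = "Best Fit Line Example")
abline(reg = exampleReg, lty = "dashed", col = "red")
```

Because these points were built from a straight line, the best fit line passes through every one of them. With real data, the points will scatter around the line.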

In our abline(...) functions, we used lty to define unique line types for each of our battle methods. We also matched our lines' colors to those of our scatterplot's points. Our reg arguments used the lm(...) function to specify the number of Shu soldiers as our y variable and the number of Wei soldiers as our x variable:

> #fire
> scatterplotAllMethodsSoldiersFireLineReg <-
lm(scatterplotFireShuSoldiersData ~
scatterplotFireWeiSoldiersData)
> scatterplotAllMethodsSoldiersFireLty <- "solid"
> #ambush
> scatterplotAllMethodsSoldiersAmbushLineReg <-
lm(pointsAmbushDataY ~ pointsAmbushDataX)
> scatterplotAllMethodsSoldiersAmbushLty <- "dotted"
> #head to head
> scatterplotAllMethodsSoldiersHeadToHeadLineReg <-
lm(pointsHeadToHeadDataY ~ pointsHeadToHeadDataX)
> scatterplotAllMethodsSoldiersHeadToHeadLty <- "dotdash"
> #surround
> scatterplotAllMethodsSoldiersSurroundLineReg <-
lm(pointsSurroundDataY ~ pointsSurroundDataX)
> scatterplotAllMethodsSoldiersSurroundLty <- "dashed"

The complete abline(...) functions incorporated our reg, lty, and col arguments to draw best fit lines for our battle method data:

> #fire
> abline(reg = scatterplotAllMethodsSoldiersFireLineReg,
lty = scatterplotAllMethodsSoldiersFireLty,
col = scatterplotAllMethodsSoldiersFireCol)
> #ambush
> abline(reg = scatterplotAllMethodsSoldiersAmbushLineReg,
lty = scatterplotAllMethodsSoldiersAmbushLty,
col = pointsAmbushCol)
> #head to head
> abline(reg = scatterplotAllMethodsSoldiersHeadToHeadLineReg,
lty = scatterplotAllMethodsSoldiersHeadToHeadLty,
col = pointsHeadToHeadCol)
> #surround
> abline(reg = scatterplotAllMethodsSoldiersSurroundLineReg,
lty = scatterplotAllMethodsSoldiersSurroundLty,
col = pointsSurroundCol)

A best fit line is useful in gauging whether or not the relationship between two variables is indeed linear. Therefore, it is a beneficial tool to apply when exploring a new dataset. We can also use best fit lines to compare the relationships between related datasets.

In our plot, it is quite clear that the relationship between the numbers of Shu and Wei soldiers engaged is different for different battle methods. For instance, the best fit lines help us to see that in the surround method, the number of Shu soldiers tends to be relatively high compared to the number of Wei soldiers. In contrast, with the fire attack method, the number of Wei soldiers tends to be relatively high compared to the number of Shu soldiers. Using a scatterplot such as this one, along with one or more best fit lines, is still another way to inform our interpretations and understanding of the relationships between our variables. Moreover, using a graphic often helps us to discover things that we cannot see in the raw data alone.
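
One way to put numbers to this kind of visual comparison is to read the slope of each best fit line with coef(...). The sketch below uses hypothetical soldier counts, not our battle data, that were built so that one line has a slope of 2 and the other a slope of 0.5. A steeper slope means that the Shu count rises faster for each additional Wei soldier.

```r
# hypothetical soldier counts (not the chapter's data), built to lie on exact lines
surroundWei <- c(20000, 40000, 60000)
surroundShu <- 2 * surroundWei    # Shu counts are double the Wei counts
fireWei <- c(20000, 40000, 60000)
fireShu <- 0.5 * fireWei          # Shu counts are half the Wei counts
# fit one linear model per battle method
surroundReg <- lm(surroundShu ~ surroundWei)
fireReg <- lm(fireShu ~ fireWei)
# the second coefficient of each model is the slope of its best fit line
slopeSurround <- unname(coef(surroundReg)[2])
slopeFire <- unname(coef(fireReg)[2])
print(c(surround = slopeSurround, fire = slopeFire))
```

Slopes summarize only the straight-line trend, so they are best read alongside the scatterplot and not in place of it.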

Pop quiz

  1. In the plot(...) function, what is the relationship between the pch and cex arguments?

    a. pch sets the type of data point marker, while cex sets the size of the marker.

    b. cex sets the type of data point marker, while pch sets the size of the marker.

    c. pch sets the number of data point markers, while cex sets the style of the markers.

    d. cex sets the number of data point markers, while pch sets the style of the markers.

  2. Which of the following is not a benefit of using a scatterplot and best fit line to explore the relationship between two variables?

    a. They help us to understand the relationship between the variables.

    b. They inform our interpretation of the relationship between the variables.

    c. They tell us whether the variables will have an interaction effect.

    d. They indicate the linearity of the relationship between the variables.

Have a go hero

Create a scatterplot that depicts the relationship between the execution and rating of past fire attacks. Be sure to use the numeric version of the successful execution variable. Note that since execution is dichotomous (containing only two possible values), the resulting plot will look different from the ones we created with our soldier data. Try to interpret the meaning of this graphic. Does it make sense to add a best fit line in this situation?

Now use the sunflowerplot(...) function with the same arguments that you just used in the plot(...) function. Try to interpret the meaning of this graphic. Refer back to the raw fire data for help recalling the data contained in the Rating and SuccessfullyExecuted variables.

Consider the graphics generated by your plot(...) and sunflowerplot(...) functions. How do these functions differ in the way they portray data?
