Appendix F. Solutions to Chapter Exercises

A solution is provided for each exercise in the book. Do not look at the solution until you have made a serious effort to solve the exercise! For many problems, there will be several possible solutions in R. If you come up with a solution different from the one provided, try to see if the two solutions are equivalent—do you get the same answer? Why or why not?

Exercises 1-1 Through 1-4

Solutions provided in the chapter.

Exercise 3-1

attach(mtcars)
stripchart(mpg ~ cyl, method = "jitter")

This helps to separate the cars a bit. Now we can see how many cars are in each group.

Not surprisingly, cars with fewer cylinders get better gas mileage.

Exercise 3-2

install.packages("plotrix", dependencies=TRUE)
library(plotrix)
attach(trees)
dotplot.mtb(Volume)

A type of jittering is automatic. Even so, some values that are very close still run together. One way to deal with this is to make the plot character smaller:

dotplot.mtb(Volume, pch = 20) # or
dotplot.mtb(Volume, pch = ".") # too small!
dotplot.mtb(Volume, pch = "/")  # Hmm...
detach(trees)

Exercise 4-1

dotchart(USArrests$Murder, labels = row.names(USArrests))

The state names are so big, they overwrite and become illegible!

Exercise 4-2

load("Nimrod.rda") # .rda shows it was saved as an R data frame
dotchart(Nimrod$time)

Good!

dotchart(Nimrod$time, labels = Nimrod$performer, cex = .5)

Better!

Nimrod2 = Nimrod[order(Nimrod$time),]
dotchart(Nimrod2$time, labels = Nimrod2$performer, cex = .5)

Yeah!

Exercise 5-1

# print results to screen
library(nlme)
attach(MathAchieve)
boxplot(SES ~ Minority * Sex)

# graph to file
pdf("SES.pdf") # opens a device
library(nlme)
attach(MathAchieve)
boxplot(SES ~ Minority * Sex)
dev.off() # closes and saves file

Insert the file SES.pdf into a word processor document.

It looks like SES has the same relationship to Minority and Sex that MathAch has.

Exercise 5-2

par(mfrow = c(1,2)) # show 2 graphs side-by-side
attach(mtcars)
boxplot(mpg ~ cyl)
library(plotrix)
ehplot(mpg, cyl)
detach(mtcars)
par(mfrow=c(1,1)) # reset to one graph/page

The box plot shows reference points (i.e., quartiles); the EH-Plot shows actual points, including jittering.

You can add a box and whiskers to the EH-Plot: ehplot(mpg, cyl, box = T).

Exercise 6-1

library(multcomp)
stem(sbp$sbp, scale = .5)
stem(sbp$sbp, scale = 3)
stem(sbp$sbp, scale = 4)

A scale choice larger than 3 seems to give us worthless charts! This will not necessarily be true for all datasets.

Exercise 6-2

install.packages("aplpack", dependencies = T)
library(aplpack)
attach(trees)
stem.leaf.backback(Height, Volume)
detach(trees)

The units of measurement are different. What happens if you standardize the units?

library(car)
attach(Baumann)
stem.leaf.backback(pretest.1, post.test.1)

It appears that the posttest scores are higher.

Exercise 7-1

library(Sleuth2)
attach(case0302)
par(mfrow = c(2,2)) # show 4 graphs on one page
hist(Dioxin)
hist(Dioxin, breaks = 20)
hist(Dioxin, breaks = 30)
hist(Dioxin, breaks = 40)
par(mfrow = c(1,1)) # reset to one graph/page

The distributional shape changes a great deal. With no breaks specified, it looks like most points are clustered around zero. With many breaks, it looks like a nearly symmetrical distribution with a few extreme values added to it. The strip chart emphasizes this idea. It appears that just two points are atypical. Why are they so different? Are they mistakes? Should they be included?

 

Exercise 7-2

library(Hmisc) library(car) attach(Burt) histbackback(IQbio, IQfoster) detach(Burt)

The two histograms look fairly similar, except that the IQbio group has a few more higher-end IQs.

Studying the salaries of males and females in the Salary dataset is a little trickier than the previous problem, because there are not separate vectors for “male salary” and “female salary.” So, to use histbackback(), we must create such vectors. There are several ways to do this. Here is one of them:

attach(Salaries)
m = subset(Salaries, sex == "Male") # contains only male data
f = subset(Salaries, sex == "Female") # contains female data
histbackback(m$salary, f$salary)
detach(Salaries)

Exercise 8-1

library(multcomp)
eq2 = density(sbp$sbp, bw = 4) # Figure 8-1c
hist (sbp$sbp,
  main="c. Histogram + Kernel Density, bw = 4",
  col = "maroon", las = 1, cex.axis = .8,
  freq = F) #freq=F: prob. densities
lines(eq2,lwd = 2) # Plot density curve on existing histogram

This gives us sharper bends than when bw = 5.

eq2 = density(sbp$sbp, bw = 2) # Figure 8-1-c
hist (sbp$sbp,
  main="c. Histogram + Kernel Density, bw = 2",
  col = "maroon", las = 1, cex.axis = .8,
  freq = F) #freq=F: prob. densities
lines(eq2,lwd = 2) # Plot density curve on existing histogram

This produces about a dozen bends; further detail does not really fit the histogram better, but then, the histogram is a very rough approximation.

eq2 = density(sbp$sbp, bw = 20) # Figure 8-1c
hist (sbp$sbp,
  main="c. Histogram + Kernel Density, bw = 20",
  col = "maroon", las = 1, cex.axis = .8,
  freq = F) #freq=F: prob. densities
lines(eq2,lwd = 2) # Plot density curve on existing histogram

This line is flatter than bw = 10, but not a radical change.

Exercise 8-2

The sbp of 125 has a y-coordinate about halfway between 0 and 0.25, or about 0.125. That is to say, the probability of selecting a patient with systolic blood pressure of 125 or less is about 0.125, or 12.5 percent.

By a similar process, we can determine that the probability of a systolic blood pressure less than or equal to 175 is about 90 percent. The probability of selecting a person with systolic blood pressure of 175 or greater is 1 – prob[175 or less] = 1 – 0.9 = 0.1, or 10 percent.

The probability of falling between 125 and 175 is prob[175 or less] – prob[125 or less] = 90% – 12.5% = 77.5%.

Exercise 9-1

The Male bars are on top in each group, but the legend is reversed in order, making the chart confusing to read. Change sexlab to c("Male", "Female") in legend().

Exercise 9-2

library(car)
attach(Salaries)
library(epicalc)
salK=salary/10000
pyramid(salK,sex, binwidth = 1,
  col = "seagreen",
  main = "Salaries in 10,000s of Dollars",
  cex.bar.value =.4, cex.axis = .8)

Compare this to the result from “Exercise 7-2”.

Exercise 10-1

install.packages("HistData")
library(HistData)
attach(Nightingale)
deaths = c(sum(Disease), sum(Wounds), sum(Other))
pie(deaths, labels=c("Disease","Wounds","Other"))

The deaths command found the totals in each of the columns Disease, Wounds, and Other. Those totals were used to make the pie chart. The difference between Disease and the other causes was very obvious, but the difference between the other variables is not easy to see.

Exercise 10-2

load("Nimrod.rda")
x = table(Nimrod$medium)
x
pie(x, labels = c("Brass band","Concert band","Organ",
                  "Orchestra"),
  col = c("gold3","deepskyblue","peachpuff3","magenta"))

Exercise 11-1

load("Nimrod.rda")
den = density(Nimrod$time)
plot(den)
rug(Nimrod$time)

library(nlme)
boxplot(MathAchieve$SES,main="Socioeconomic Status", ylab="SES 
  score")
rug(MathAchieve$SES)

The first of these is probably more helpful. The second one is so dense that we cannot see a separation of points over most of the range of the data.

Exercise 12-1

Year  =  c(2004:2010)
Europe = c(7.9, 7.9, 7.9, 7.8, 7.7, 7.1, 7.2)
Eurasia = c(8.5, 8.5, 8.7, 8.6, 8.9, 8, 8.4)

plot(Year, Europe, type = "l",
  col = "maroon", ylim = c(7,9),
  ylab = "Emissions", lwd = 2)
# ylim makes graph big enough for Eurasia with values > 7.9

lines(Year, Eurasia,
  lty = "dashed",
  col = "steelblue", lwd = 2)
legend("topleft", c("Eurasia", "Europe"),
  text.col = c("steelblue","maroon"),
  lty = c("dashed","solid"))

Exercise 12-2

library(Sleuth2)
library(epade)
attach(case0701)
plot(Velocity, Distance)

As expected, distance is greater with greater velocity.

scatter.ade(Velocity, Distance, wall = 3,
  main = "The Big Bang", col="red")

Very cool!

Exercise 13-1

library(nlme)
attach(MathAchieve)
plot(SES, MathAch)

There may be a near-linear relationship, but it is hard to be sure.

library(nlme)
attach(MathAchieve)
sunflowerplot(SES, MathAch)

The sunflower plot does not help very much.

library(nlme)
attach(MathAchieve)
library(hexbin)
plot(hexbin(SES, MathAch), colramp = heat.ob)

The relationship is much clearer, nearly vertical; that is to say, there is a similar wide range of math scores for all SES scores from –1 to about 1.5.

detach(MathAchieve)

Exercise 14-1

library(ResearchMethods)
load("baplot") # if you saved baplot!
data(MFSV)
attach(MFSV)
baplot(MF,SV)

There is no obvious pattern, but we don’t know, based on the information given in the help file, what the clinically acceptable difference is.

Exercise 15-1

library(reshape2)
library(car)
attach(tips)
tips$ratio = 100*(tip/total_bill)
attach(tips)
qqnorm(ratio)
qqline(ratio)

ratio is not normally distributed; it is skewed at the upper end.

qq(time ~ ratio)

lunch and dinner are very different distributions.

qq(smoker ~ ratio)

smoker and nonsmoker are also very different.

Exercise 15-2

library(reshape2)
library(car)
attach(Vocab)
qqnorm(vocabulary)
qqline(vocabulary)

vocabulary does not seem to be normally distributed.

qqnorm(education)
qqline(education)

education is not, either.

qq(sex ~ vocabulary)

It is predominantly female through most of the range, and male at the upper extreme.

Exercise 16-1

library(car)
attach(Ginzberg)
head(Ginzberg)
pairs(~ simplicity + fatalism + depression)

scatterplotMatrix(~ simplicity + fatalism + depression)

library(corrplot)
corrplot(y)

library(GGally)
ggscatmat(Ginzberg, columns = 1:3)

Exercise 17-1

library(scatterplot3d)
library(epicalc)
data(SO2)
scatterplot3d(SO2$smoke, SO2$SO2, SO2$deaths,
  highlight.3d = T,
  type = "h")

This is a tricky dataset to work with, because the dataset and a variable have the same name. If you try attach(SO2) and scatterplot3d(smoke, SO2, deaths), it won’t work! Therefore, use the SO2$smoke name instead. Deaths seem to be positively related to both of the other variables.

Exercise 17-2

library(lattice)
library(epicalc)
data(SO2)
levelplot(SO2$deaths ~ SO2$smoke + SO2$SO2)

Exercise 18-1

library(Sleuth2)
attach(ex1123)
plot(SO2, Mort)

There appears to be a strong relationship between SO2 and mortality, with the exception of a single, high-mortality point.

coplot(Mort ~ SO2 | Educ)

The coplot shows that there is little SO2, and lower mortality, in cities where education levels are highest! Perhaps there are fewer smokestacks in cities with major financial businesses or research centers.

Exercise 19-1

attach(mtcars)
cars = as.matrix(mtcars)
h = dist(scale(cars))
h2 = hclust(h, method = "single")
plot(h2)

h = dist(scale(cars), method = "manhattan")
h2 = hclust(h, method = "single")
plot(h2)

Exercise 19-2

cars = as.matrix(mtcars)
image(scale(cars), col = cm.colors(256))
image(scale(cars), col = rainbow(100))
image(scale(cars), col = terrain.colors(16))

And so on. The number after colors tells how many colors are in the range.

Exercise 19-3

install.packages("Amelia", dependencies = T)
library(Amelia)
missmap(airquality)

library(epade)
missiogram.ade(airquality)

Exercise 20-1

The Wikipedia article on mosaic plots includes a demonstration of the Titanic dataset. You can also find a mosaic plot of this data at http://bit.ly/1VDPbmp in a different orientation. You can find another one at http://bit.ly/1jdkQKr.

Exercise 21-1

attach(Nimrod)
par(bg = "white", mfrow = c(2,2))

# graph 1
x = table(medium)
barplot(x, horiz = T,
  main = "Number of ensembles of each medium", 
  names = c("Brass band", "Concert band", "Organ","Orchestra"),
  las = 1, cex.names = .8, col = "turquoise", space = 1.5)

# graph 2
cols = "cadetblue4"
boxplot(time ~ medium,
  col = c("goldenrod","firebrick"),
  ylab = "Time",  xlab = "Medium",
  varwidth = T,
  main = "Performance times by medium",
  col.main = cols,
  col.axis = cols,
  las = 1, col.lab = cols,
  names = c("","","",""))
mtext(text = c("Brass band", Concert band", 
  "Organ", "Orchestra"),
  side = 1, cex = .6,
  at = c(1,2,3,4), line = 1)

# graph 3
pro = subset(Nimrod, level == "p")
am = subset(Nimrod, level == "a")
plot(density(pro$time), ylim = c(0,.028),
  main = "Professional vs. amateur groups",
  xlab = "Time in seconds",
  col = "navy", lwd = 2,
  bty = "n", xlim = c(100,350),
  family = "HersheyScript",
  cex.main = 1.4, cex.lab = 1.3)
lines(density(am$time), lty = "dotted",
  col = "lightblue4", lwd = 2)
legend("topright", c("Amateur","Professional"),
  cex = .8, text.col = c("lightblue4","navy"),
  bty = "n")

# graph 4
data2 = Nimrod[order(Nimrod$time),]
dotchart(data2$time,
  labels = data2$performer, cex = .34,
  main = "Performance Time by Performer",
  xlab = "Performance time", pch = 19,
  col = c("violetred1", "violetred4"),
  lcolor = "gray90",
  cex.main = 1.9, col.main = "violetred4",
  cex.lab = 1.4,
  family = "HersheySerif")
detach(Nimrod)
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset