List of Figures

Chapter 1. Introduction to R

Figure 1.1. Steps in a typical data analysis

Figure 1.2. Relationships between income, education, and prestige for blue-collar (bc), white-collar (wc), and professional jobs (prof). Source: car package (scatterplotMatrix function) written by John Fox. Graphs like this are difficult to create in other statistical programming languages but can be created with a line or two of code in R.

Figure 1.3. Example of the R interface on Windows

Figure 1.4. Scatter plot of infant weight (kg) by age (mo)

Figure 1.5. A sample of the graphs created with the demo() function

Figure 1.6. Input with the source() function and output with the sink() function

Figure 1.7. Output from listing 1.3 including (left to right) output from the arthritis example, general help, information on the vcd package, information on the Arthritis dataset, and a graph displaying the relationship between arthritis treatment and outcome

Chapter 2. Creating a dataset

Figure 2.1. R data structures

Figure 2.2. Sources of data that can be imported into R

Figure 2.3. Entering data via the built-in editor on a Windows platform

Figure 2.4. Stat/Transfer’s main dialog on Windows

Chapter 3. Getting started with graphs

Figure 3.1. Creating a graph

Figure 3.2. Line plot of dose vs. response for drug A

Figure 3.3. Line plot of dose vs. response for drug A with modified line type and symbol

Figure 3.4. Plotting symbols specified with the pch parameter

Figure 3.5. Line types specified with the lty parameter

Figure 3.6. Line plot of dose vs. response for drug A with modified line type, line width, symbol, and symbol width

Figure 3.7. Line plot of dose vs. response for both drug A and drug B

Figure 3.8. Line plot of dose versus response for drug A with title, subtitle, and modified axes

Figure 3.9. A demonstration of axis options

Figure 3.10. An annotated comparison of Drug A and Drug B

Figure 3.11. Example of a scatter plot (car weight vs. mileage) with labeled points (car make)

Figure 3.12. Examples of font families on a Windows platform

Figure 3.13. Partial results from demo(plotmath)

Figure 3.14. Graph combining four figures through par(mfrow=c(2,2))

Figure 3.15. Graph combining three figures through par(mfrow=c(3,1))

Figure 3.16. Graph combining three figures using the layout() function with default widths

Figure 3.17. Graph combining three figures using the layout() function with specified widths

Figure 3.18. A scatter plot with two box plots added to the margins

Figure 3.19. Specifying locations using the fig= graphical parameter

Chapter 4. Basic data management

Figure 4.1. Renaming variables interactively using the fix() function

Chapter 5. Advanced data management

Figure 5.1. Reshaping data with the melt() and cast() functions

Chapter 6. Basic graphs

Figure 6.1. Simple vertical and horizontal bar charts

Figure 6.2. Stacked and grouped bar plots

Figure 6.3. Bar plot of mean illiteracy rates for US regions sorted by rate

Figure 6.4. Horizontal bar plot with tweaked labels

Figure 6.5. Spinogram of arthritis treatment outcome

Figure 6.6. Pie chart examples

Figure 6.7. A fan plot of the country data

Figure 6.8. Histogram examples

Figure 6.9. Kernel density plots

Figure 6.10. Kernel density plots of mpg by number of cylinders

Figure 6.11. Box plot with annotations added by hand

Figure 6.12. Box plots of car mileage versus number of cylinders

Figure 6.13. Notched box plots for car mileage versus number of cylinders

Figure 6.14. Box plots for car mileage versus transmission type and number of cylinders

Figure 6.15. Violin plots of mpg versus number of cylinders

Figure 6.16. Dot plot of mpg for each car model

Figure 6.17. Dot plot of mpg for car models grouped by number of cylinders

Chapter 8. Regression

Figure 8.1. Scatter plot with regression line for weight predicted from height

Figure 8.2. Quadratic regression for weight predicted by height

Figure 8.3. Scatter plot of height by weight, with linear and smoothed fits, and marginal box plots

Figure 8.4. Scatter plot matrix of dependent and independent variables for the states data, including linear and smoothed fits, and marginal distributions (kernel density plots and rug plots)

Figure 8.5. Interaction plot for hp*wt. This plot displays the relationship between mpg and hp at 3 values of wt.

Figure 8.6. Diagnostic plots for the regression of weight on height

Figure 8.7. Diagnostic plots for the regression of weight on height and height-squared

Figure 8.8. Diagnostic plots for the regression of murder rate on state characteristics

Figure 8.9. Q-Q plot for studentized residuals

Figure 8.10. Distribution of studentized residuals using the residplot() function

Figure 8.11. Component plus residual plots for the regression of murder rate on state characteristics

Figure 8.12. Spread-level plot for assessing constant error variance

Figure 8.13. Index plot of hat values for assessing observations with high leverage

Figure 8.14. Cook’s D plot for identifying influential observations

Figure 8.15. Added-variable plots for assessing the impact of influential observations

Figure 8.16. Influence plot. States above +2 or below –2 on the vertical axis are considered outliers. States above 0.2 or 0.3 on the horizontal axis have high leverage (unusual combinations of predictor values). Circle size is proportional to influence. Observations depicted by large circles may have disproportionate influence on the parameter estimates of the model.

Figure 8.17. Best four models for each subset size based on Adjusted R-square

Figure 8.18. Best four models for each subset size based on the Mallows Cp statistic

Figure 8.19. Bar plot of relative weights for the states multiple regression problem

Chapter 9. Analysis of variance

Figure 9.1. Treatment group means with 95 percent confidence intervals for five cholesterol-reducing drug regimens

Figure 9.2. Plot of Tukey HSD pairwise mean comparisons

Figure 9.3. Tukey HSD tests provided by the multcomp package

Figure 9.4. Test of normality

Figure 9.5. Plot of the relationship between gestation time and birth weight for each of four drug treatment groups

Figure 9.6. Interaction between dose and delivery mechanism on tooth growth. The plot of means was created using the interaction.plot() function.

Figure 9.7. Interaction between dose and delivery mechanism on tooth growth. The mean plot with 95 percent confidence intervals was created by the plotmeans() function.

Figure 9.8. Main effects and two-way interaction for the ToothGrowth dataset. This plot was created by the interaction2way() function.

Figure 9.9. Interaction of ambient CO2 concentration and plant type on CO2 uptake. Graph produced by the interaction.plot() function.

Figure 9.10. Interaction of ambient CO2 concentration and plant type on CO2 uptake. Graph produced by the boxplot() function.

Figure 9.11. A Q-Q plot for assessing multivariate normality

Chapter 10. Power analysis

Figure 10.1. Four primary quantities considered in a study design power analysis. Given any three, you can calculate the fourth.

Figure 10.2. Sample size needed to detect various effect sizes in a one-way ANOVA with five groups (assuming a power of 0.90 and significance level of 0.05)

Figure 10.3. Sample size curves for detecting a significant correlation at various power levels

Figure 10.4. Sample dialog boxes from the piface program

Chapter 11. Intermediate graphs

Figure 11.1. Scatter plot of car mileage versus weight, with superimposed linear and lowess fit lines.

Figure 11.2. Scatter plot with subgroups and separately estimated fit lines

Figure 11.3. Scatter plot matrix created by the pairs() function

Figure 11.4. Scatter plot matrix created with the scatterplotMatrix() function. The graph includes kernel density and rug plots in the principal diagonal and linear and loess fit lines.

Figure 11.5. Scatter plot matrix produced by the scatterplotMatrix() function. The graph includes histograms in the principal diagonal and linear and loess fit lines. Additionally, subgroups (defined by number of cylinders) are indicated by symbol type and color.

Figure 11.6. Scatter plot matrix produced with the cpairs() function in the gclus package. Variables closer to the principal diagonal are more highly correlated.

Figure 11.7. Scatter plot with 10,000 observations and significant overlap of data points. Note that the overlap of data points makes it difficult to discern where the concentration of data is greatest.

Figure 11.8. Scatter plot using smoothScatter() to plot smoothed density estimates. Densities are easy to read from the graph.

Figure 11.9. Scatter plot using hexagonal binning to display the number of observations at each point. Data concentrations are easy to see and counts can be read from the legend.

Figure 11.10. Scatter plot of 10,000 observations, where density is indicated by color. The data concentrations are easily discernible.

Figure 11.11. 3D scatter plot of miles per gallon, auto weight, and displacement

Figure 11.12. 3D scatter plot with vertical lines and shading

Figure 11.13. 3D scatter plot with vertical lines, shading, and overlaid regression plane

Figure 11.14. Rotating 3D scatter plot produced by the plot3d() function in the rgl package

Figure 11.15. Spinning 3D scatter plot produced by the scatter3d() function in the Rcmdr package

Figure 11.16. Bubble plot of car weight versus mpg where point size is proportional to engine displacement

Figure 11.17. Comparison of a scatter plot and a line plot

Figure 11.18. type= options in the plot() and lines() functions

Figure 11.19. Line chart displaying the growth of five orange trees

Figure 11.20. Correlogram of the correlations among the variables in the mtcars data frame. Rows and columns have been reordered using principal components analysis.

Figure 11.21. Correlogram of the correlations among the variables in the mtcars data frame. The lower triangle contains smoothed best fit lines and confidence ellipses, and the upper triangle contains scatter plots. The diagonal panel contains minimum and maximum values. Rows and columns have been reordered using principal components analysis.

Figure 11.22. Correlogram of the correlations among the variables in the mtcars data frame. The lower triangle is shaded to represent the magnitude and direction of the correlations. The variables are plotted in their original order.

Figure 11.23. Mosaic plot describing Titanic survivors by class, sex, and age

Chapter 12. Resampling statistics and bootstrapping

Figure 12.1. Strip chart of the hypothetical treatment data in table 12.1

Figure 12.2. Distribution of bootstrapped R-squared values

Figure 12.3. Distribution of bootstrapped regression coefficients for car weight

Chapter 13. Generalized linear models

Figure 13.1. Distribution of post-treatment seizure counts (Source: Breslow seizure data)

Chapter 14. Principal components and factor analysis

Figure 14.1. Comparing principal components and factor analysis models. The diagrams show the observed variables (X1 to X5), the principal components (PC1, PC2), factors (F1, F2), and errors (e1 to e5).

Figure 14.2. Assessing the number of principal components to retain for the US Judge Rating example. A scree plot (the line with x’s), the eigenvalues-greater-than-1 criterion (horizontal line), and parallel analysis with 100 simulations (dashed line) suggest retaining a single component.

Figure 14.3. Assessing the number of principal components to retain for the Body Measurements example. The scree plot (line with x’s), the eigenvalues-greater-than-1 criterion (horizontal line), and parallel analysis with 100 simulations (dashed line) suggest retaining two components.

Figure 14.4. Assessing the number of factors to retain for the psychological tests example. Results for both PCA and EFA are presented. The PCA results suggest one or two components. The EFA results suggest two factors.

Figure 14.5. Two-factor plot for the psychological tests in ability.cov. Vocab and reading load on the first factor (PA1), while blocks, picture, and maze load on the second factor (PA2). The general intelligence test loads on both.

Figure 14.6. Diagram of the oblique two factor solution for the psychological test data in ability.cov

Figure 14.7. A principal components/exploratory factor analysis decision chart

Chapter 15. Advanced methods for missing data

Figure 15.1. Methods for handling incomplete data, along with the R packages that support them

Figure 15.2. Plot of missing-value patterns for the sleep dataset, produced by aggr()

Figure 15.3. Matrix plot of actual and missing values by case (row) for the sleep dataset. The matrix is sorted by BodyWgt.

Figure 15.4. Scatter plot between amount of dream sleep and length of gestation, with information about missing data in the margins

Figure 15.5. Steps in applying multiple imputation to missing data via the mice approach.

Chapter 16. Advanced graphics

Figure 16.1. Trellis graph of singer heights by voice pitch

Figure 16.2. Trellis plot of mpg versus car weight conditioned on engine displacement. Because engine displacement is a continuous variable, it has been converted to three nonoverlapping shingles with equal numbers of observations.

Figure 16.3. Trellis plot of mpg versus car weight conditioned on engine displacement. A custom panel function has been used to add regression lines, rug plots, and grid lines.

Figure 16.4. Trellis graph of mpg versus engine displacement conditioned on transmission type. Smoothed lines (loess), grids, and group mean levels have been added.

Figure 16.5. Kernel density plots for miles per gallon grouped by transmission type. Jittered points are provided on the horizontal axis.

Figure 16.6. Kernel density plots for miles per gallon grouped by transmission type. Graphical parameters have been modified and a customized legend has been added. The custom legend specifies color, shape, line type, character size, and title.

Figure 16.7. xyplot showing the impact of ambient carbon dioxide concentrations on carbon dioxide uptake for 12 plants in two treatment conditions and two types. Plant is the group variable and Treatment and Type are the conditioning variables.

Figure 16.8. Box plots of auto mileage by number of cylinders. Data points are superimposed and jittered.

Figure 16.9. Scatter plot between auto mileage and car weight, with separate regression lines and confidence bands by engine transmission type (manual, automatic)

Figure 16.10. Scatter plot between auto mileage and car weight, faceted by transmission type (manual, automatic) and number of cylinders (4, 6, or 8). Symbol size represents horsepower.

Figure 16.11. Faceted density plots for singer heights by voice part

Figure 16.12. The playwith window. The user can edit the graph using the mouse with this GTK+ GUI.

Figure 16.13. playwith window with latticist functionality. The user can create lattice and vcd graphs interactively.

Figure 16.14. An iplots demonstration created by listing 16.6. Only four of the six windows are displayed to save room. In these graphs, the user has clicked on the three-gear bar in the bar chart window.

Appendix A. Graphic user interfaces

Figure A.1. RStudio IDE

Figure A.2. R Commander GUI

Appendix D. Creating publication-quality output

Figure D.1. Process for generating a publication-quality report using Sweave

Figure D.2. Page 1 of the report created from the sample noweb file in listing D.1. The noweb file was processed through the Sweave() function in R and the resulting TeX file was processed through a LaTeX compiler to produce a PDF document.

Figure D.3. Page 2 of the report created from the sample noweb file in listing D.1.

Figure D.4. Initial noweb file (example.odt) to be processed through odfWeave

Figure D.5. Final report in ODF format (example-out.odt). Page 2 is similar to the second page of the Sweave output in figure D.3 and is omitted to save space.
