Index
A
alpha levels
changing 369–370
comparing independent groups 242–243
comparing multiple groups 293–295
estimating the means 184
ALPHA= option
MEANS procedure 183–184
MEANS statement (ANOVA) 293–294, 298
REG procedure 369–371
TABLES statement (FREQ) 490, 492
TTEST procedure 242–243
alternative hypothesis 156
analysis of variance
See ANOVA
Analysis of Variance table
checking model fit 379, 435–439
fitting curves 365, 367
reviewing 348–349
Anderson-Darling test 141
ANOVA (analysis of variance)
deciding whether to use 268
ODS table names 348–349
one-way 267, 268, 270–275
Welch 277–278
with unequal variances 275–278
ANOVA procedure
See also MEANS statement, ANOVA procedure
BY statement 279
Class Level table 273, 280
CLASS statement 271–272, 279
comparing multiple groups 271–279
Fit Statistics table 273–274, 280
general form 279, 298
identifying ODS table names 299
interactive usage 299
Model ANOVA table 273, 275, 280
MODEL statement 271, 279, 287
Number of Observations table 273, 280
Overall ANOVA table 273–275, 280
summary example 305
average
See sample average
B
balanced data
ANOVA procedure for 271
defined 268
bar charts
checking for errors in variables 113
creating for variables 101–111
creating ordered 110–111
for contingency tables 499
high-resolution 499
horizontal 107–109
line printer 499
side-by-side 227–229, 261
vertical 102–107
Basic Statistical Measures table
checking data for errors 113
comparing independent groups 222
comparing paired groups 196
reviewing 80–81, 124
summarizing continuous variables 76, 82
BETWEEN-AND operator 155
blank lines, adding between observations 38
BON option, MEANS statement (ANOVA) 289, 298
Bonferroni approach 288–291
box plots
checking normality of errors 441–443
overview 89–90
side-by-side 229–231, 261, 264–266
testing for normality 143
BOXPLOT procedure
general form 231, 266
PLOT statement 230–231, 264–266
summarizing data 229–231, 261
BOXWIDTHSCALE= option, PLOT statement (BOXPLOT) 264–266
BWSLEGEND option, PLOT statement (BOXPLOT) 264–266
BY statement
ANOVA procedure 279
NPAR1WAY procedure 249, 283
SORT procedure 39–41
TTEST procedure 241
C
CARDS statement 33
cell frequency
defined 467
expected 476, 480–482
Fisher’s exact test 481
cells, defined 467
Central Limit Theorem
confidence intervals and 180–181
Empirical Rule and 179–180
normal distribution and 175–180
CHART procedure
checking for errors in variables 113
creating line printer bar charts 499
functionality 101
general form 103–104, 229
HBAR statement 109–111, 124
identifying ODS tables 124
summarizing data 227–229, 261
VBAR statement 104, 107, 124, 227–229
vertical bar charts 103–104
chi-square test
defined 476
output tables 493
technical details 482
understanding results 478–481
CHISQ option, TABLES statement (FREQ) 478, 492
CL option, TABLES statement (FREQ) 492
Class Level table 273, 280
CLASS statement
ANOVA procedure 271–272, 279
MEANS procedure 219–221, 261
NPAR1WAY procedure 281, 283
TTEST procedure 235
UNIVARIATE procedure 221–225, 261
classification variables
creating measures of association 482–491
summarizing data in tables 467–476
testing for independence 476–482
CLDIFF option, MEANS statement (ANOVA) 297–298
CLI option
MODEL statement, REG procedure 352, 367
PRINT statement, REG procedure 352
CLINE= option, PLOT statement (REG) 343
CLM option
MODEL statement, REG procedure 352, 367
PRINT statement, REG procedure 352
CLM statistics keyword 183
COLOR= option
HISTOGRAM statement (UNIVARIATE) 143
SYMBOL statements 359
columns
See variables
command bar 520
COMMAw.d format 48
comparisonwise error rate 284
CONF option, PLOT statement (REG) 359, 373
confidence intervals
changing 183–184
changing alpha levels 184, 242–243
defined 180
for the mean 351
multiple comparison procedures 297
confidence limits
defining 350
defining for mean 351, 357–358
measures of association and 490–491
on individual values 350
Confidence Limits table
comparing independent groups 236
reviewing 208–210, 239–240
CONTAINS operator 155
contingency tables
bar charts for 499
creating for multiple variables 475–476
creating from summary tables 473–475
creating measures of association 482–491
defined 467
ODS Statistical Graphics 499–504
summarizing analyses 492–493
summarizing raw data in 468–473
tests for independence 476–482
continuous scale 75
continuous variables
See also testing for normality
calculating correlation coefficients 331–338
checking data for errors 329
creating histograms for 90–92
creating line printer plots 87–90
creating scatter plot matrix 326–329
creating scatter plots 323–325
defined 74
error checking in 112–113
fitting a straight line 343–360
fitting curves 360–375
Kruskal-Wallis test 280
measuring 75
multiple regression for 375–380
ODS Statistical Graphics support 325–326
performing straight-line regression 338–343
summarizing 76–86
summarizing multiple 322–331
Cook’s D statistic 428, 460
copying 523
CORR procedure
calculating correlation coefficients 331–338
creating scatter plot matrix 326–329
creating scatter plots 323–325, 363
ELLIPSE=NONE option 324
error checking with 329
general form 330, 337
HISTOGRAM option 326–327
Matrix Plot table 331
missing values and 334–336
NOMISS option 334–336
NVAR=ALL option 326
Pearson Correlation Coefficients table 332–337
PLOTS= option 324, 326
Scatter Plot table 331
Simple Statistics table 329, 331–332, 335, 337
VAR statement 324, 326, 329, 332
Variable Information table 332, 337
CORRECT= option, NPAR1WAY procedure 247
correlation
calculating correlation coefficients 331–337
cautions using 337–338
unanswered questions 338
correlation coefficients
calculating 331–338
finding equations for straight lines 346
linear relationships and 338–339
missing values with 334–336
overview of tests for 333–334
Spearman’s rank correlation coefficient 483–484, 487–488, 493
unanswered questions 338
crosstabulations
See contingency tables
curves
checking lack of fit 438–439
fitting 360–375
fitting with polynomial regression 362–366
fitting with REG procedure 361–367
fitting with residuals plots 422–427
plotting predicted values and limits 371–375
printing predicted values and limits 367–370
understanding results 366–367
cutting 523
CVREF= option, PLOT statement (REG) 428, 456
D
data sets
assigning names 30
creating 194
defined 27
opening 59–60
printing 35–39
sorting 39–41
temporary 59
DATA statement 29, 31, 61
DATA step
adding program statements 192
creating contingency tables 475
creating variables in 194–195
overview 27–29
DATALINES statement
creating variables 194–195
functionality 29, 33–34
INFILE statement and 61
DBMS= option, IMPORT procedure 62–63
DDE (Dynamic Data Exchange) 68
degrees of freedom 181
dependent (response) variables 328, 340
DESCENDING option
BY statement (SORT) 40–41
HBAR statement (CHART) 110–111
HBAR statement (GCHART) 110–111
diagnostics panel of plots 458–462
DISCRETE option, VBAR statement (GCHART) 107
discrete scale 75
DOLLAR format 46
DOLLAR10.2 format 47
DOLLARw.d format 48
DOUBLE option, PRINT procedure 38
DUNNETT option, MEANS statement (ANOVA) 295–296, 298
Dunnett’s test 295–296
Dynamic Data Exchange (DDE) 68
E
Editor window 519–522
ELLIPSE=NONE option, CORR procedure 324
Empirical Rule (normal distribution) 137
Central Limit Theorem and 179–180
confidence intervals and 180–181
Enhanced Editor window 519
equal variances
assumption of 235–236
testing for 276–277
Equality of Variances table 235–236, 240–241
error checking
in continuous variables 112–113, 329
in nominal variables 113
in ordinal variables 113
process overview 112
summary statistics support 329
testing regression assumptions 440–444
error rates 284, 288
estimating the mean
confidence intervals for the mean 180–186
distribution of sample averages 175–180
effect of population variance 173–175
effect of sample size 169–172
point estimates 169
Ew format 48
Excel (Microsoft)
importing 62–63
pivot tables in 467
exiting SAS 19
expected cell frequencies 476, 480–482
EXPECTED option, TABLES statement (FREQ) 478–482, 487, 492
experimentwise error rate 284, 288
Explorer window 519, 525
Extreme Observations table
checking data for errors 113
comparing paired groups 196
reviewing 82, 124
summarizing continuous variables 76, 83–84
Extreme Values table 83–84, 124
F
F test 284
FISHER option, TABLES statement (FREQ) 481, 492
Fisher’s exact test 480–482, 493
Fisher’s Exact Test table 481
Fit Statistics table
ANOVA procedure 273–274, 280
REG procedure 347, 367, 379
FOOTNOTE statement 21
FORMAT procedure 50–51
FORMAT statement
combining labeling/formatting 52–53
formatting values of variables 46–48
formats
combining with labeling 52–53
creating 50–51
user-defined 68
values of variables 46–50
FORMCHAR= system option 393, 456
FRACTw format 48
FREQ option, UNIVARIATE procedure 95–96, 198
FREQ procedure
See also TABLES statement, FREQ procedure
checking for errors in variables 112–113
creating frequency tables 96–98
creating measures of association 484–486
general form 98, 100, 472, 475, 492, 504
Kendall’s Tau-b test table 487–488, 493
Measures of Association table 485–486, 488
missing values in 98–100
ODS Statistical Graphics for 499–504
ODS table names 124, 473, 493
PAGE option 476
PLOTS= option 500–504
Spearman Correlation Coefficient test table 487–488, 493
Statistics for Table of ... table 479–480, 488–491
summarizing analyses 492–493
TEST statement 484, 486, 492–493
WEIGHT statement 474–475, 478, 484
Frequency Counts table 95–96, 124
frequency tables, creating 92–100, 198
functions, formatting variable values 48–49
G
GCHART procedure
checking for errors in variables 113
functionality 101
general form 103, 107–109, 111
GOPTIONS statement 101
HBAR statement 108–111
high-resolution bar charts 499
horizontal bar charts 107–108
PATTERN statement 101
VBAR statement 103, 107
vertical bar charts 102–107
GLM procedure 271
Goodness-of-Fit Tests for Normal Distribution table 145, 150
GOPTIONS statement, GCHART procedure 101
graphs, saving and printing 526
GROUP= option, VBAR statement (CHART) 227–229
H
HBAR statement
CHART procedure 109–111, 124
GCHART procedure 108–111
HISTOGRAM option, CORR procedure 326–327
HISTOGRAM statement, UNIVARIATE procedure
checking for normality 143–146
COLOR= option 143
creating histograms 90, 92, 225–227
NOPRINT option 146
NORMAL option 143–146
NOROWS= option 262–264
NROWS= option 262–264
rechecking data 151
summarizing data 261
histograms
checking normality of errors 441–443
creating for continuous variables 90–92
creating for multiple groups 262–264
plotting sample averages 170
scatter plot matrix example 327–328
testing for normality 143–146
horizontal bar charts 107–109
HOVTEST option, MEANS statement (ANOVA) 276, 279
hypothesis of independence 476–482
hypothesis tests
comparing independent groups 231–233
comparing multiple groups 266–269
comparing paired groups 198–201
performing 156–158
testing for normality and 156–157
I
ID statement
REG procedure 351–352
UNIVARIATE procedure 76
IMPORT procedure
DBMS= option 62–63
general form 63
importing spreadsheets 62
OUT= option 62–63
reading data from text files 61
Import Wizard 63–68
importing data
Import Wizard support 63–68
importing spreadsheets 62–63
opening data sets 59–60
reading data from text files 61–62
including programs 524–525
independent groups of data
building hypothesis tests 231–233
deciding which tests to use 199–200, 232
defined 191
steps for analyzing 232–233
summarizing with BOXPLOT procedure 229–231
summarizing with CHART procedure 227–229
summarizing with MEANS procedure 219–221
summarizing with UNIVARIATE procedure 221–227
two-sample t-test 233–243
Wilcoxon Rank Sum test 243–249
independent (regressor) variables
defined 340
naming in MODEL statement (REG) 364
plotting residuals against 404–407, 419
regression for 375–380
INFILE statement
creating variables 195
functionality 34
general form 62
reading data from text files 61
sample statements 61
INPUT statement
functionality 29, 31–32, 61
variable print order 35
intercept of a line 340, 348
interval estimates 180
interval variables 74, 76
IS MISSING operator 155
K
Kendall’s tau-b 483–484, 487–488, 493
Kendall’s Tau-b test table 487–488, 493
KENTB option, TEST statement (FREQ) 492
Kolmogorov-Smirnov test 141
Kruskal-Wallis test
deciding whether to use 268
performing 280–283
summary example 306
Kruskal-Wallis Test table
comparing independent groups 245
reviewing 281, 283
kurtosis 136, 141–142
L
LABEL option, PRINT procedure 45
LABEL statement 45, 52–53
labels
assigning with MODEL statement (REG) 349
combining with formatting 52–53
variables 45–46
lack of fit
checking with multiple regression 436–437
checking with straight-line regression 434–438
fitting a curve and 438–439
overview 433–434
LACKFIT option, MODEL statement (REG) 434–436
least squares regression
assumptions 341–342
defined 339–340
for curves 361
regression equations 340–341
testing assumption for errors 440
levels of measurement 73–76
LIBNAME statement 59–60
libref, defined 60
likelihood ratio chi-square test 480
likelihood ratio test 478
LINE= option, SYMBOL statements 359
line printer plots
creating diagnostic plots 454–457
for continuous variables 87–90
overview 393–398
producing with REG procedure 355
linear regression
See straight-line regression
LINEPRINTER option, REG procedure 393, 398, 453, 456
lines
See straight lines
LINES option, MEANS statement (ANOVA) 297–298
LINESIZE= system option 86, 476
LOG function 49
Log window 519–520
LOWCASE function 49
M
Mann-Whitney U test
See Wilcoxon Rank Sum test
Matrix Plot table 331
MAX statistics keyword 86
MAXDEC= option, MEANS procedure 183–184
mean
See population mean
MEAN statistics keyword 86, 183
MEANS procedure
ALPHA= option 183–184
CLASS statement 219–221, 261
general form 86, 184, 221
getting confidence intervals 181–183
identifying ODS tables 124, 185
MAXDEC= option 183–184
statistics keywords 86, 183
summarizing continuous variables 76, 85–86
summarizing data 219–221
MEANS statement, ANOVA procedure
ALPHA= option 293–294, 298
BON option 289, 298
CLDIFF option 297–298
DUNNETT option 295–296, 298
general form 298
HOVTEST option 276, 279
LINES option 297–298
T option 285–286, 298
TUKEY option 291–294, 298
WELCH option 277–278
measurement, levels of 73–76
measures of association
changing confidence level 490–491
creating 482–486
output tables 493
understanding results 487–490
Measures of Association table 485–486, 488
MEASURES option, TABLES statement (FREQ) 484, 492
median, defined 136
MEDIAN statistics keyword 86
Microsoft Excel
importing 62–63
pivot tables in 467
MIN statistics keyword 86
missing values
correlation coefficients with 334–336
identifying 32
in CORR procedure 334–336
in FREQ procedure 98–100
Missing Values table
checking data for errors 113
reviewing 82, 124
summarizing continuous variables 76, 86
MISSPRINT option, TABLES statement (FREQ) 100
mode, defined 136
Model ANOVA table 273, 275, 280
MODEL statement, ANOVA procedure 271, 279, 287
MODEL statement, REG procedure
assigning labels 349
CLI option 352, 367
CLM option 352, 367
fitting multiple regression models 376–377
fitting straight lines 343, 358, 365
LACKFIT option 434–436
naming independent variables in 363–364
P option 352, 367
printing predicted values and limits 351–352
R option 428
residuals plots for straight-line regression 416
Moments table
creating line printer plots 89
reviewing 78–80, 124, 150
summarizing continuous variables 76–77, 81
testing for normality 141–144, 151
MU= option, PROBPLOT statement (UNIVARIATE) 147
multiple comparison procedures
Bonferroni approach 288–291
changing alpha level 293–295
defined 269, 283–284
Dunnett’s test 295–296
ODS table names 299
performing pairwise comparisons 284–288
recommendations 297
summarizing 297–307
Tukey-Kramer test 291–293
multiple continuous variables, summarizing 322–331
multiple groups
ANOVA with unequal variances 275–278
building hypothesis tests 266–269
comparing with ANOVA procedure 271–279
creating comparative histograms 262–264
creating side-by-side box plots 264–266
multiple comparison procedures 283–307
performing Kruskal-Wallis test 280–283
performing one-way ANOVA 269–275
summarizing data from 259–266
summary example 300–307
multiple regression
checking lack of fit 436–437
creating diagnostic plots 454–456
defined 375
fitting models 376–377
overview 375–376
printing predicted values and limits 379
residuals plots for 411–416
summarizing 379–380
understanding results 378–379
multiple variables, tables summarizing 467
N
N statistics keyword 86, 183
NEXTROBS=0 option, UNIVARIATE procedure 83
NEXTRVAL= option, UNIVARIATE procedure 83
NMISS statistics keyword 86
NOCLI option, REG procedure 357
NOFREQ option, TABLES statement (FREQ) 471–472
nominal variables
classifying data with 467
defined 73
error checking in 113
measuring 75
NOMISS option, CORR procedure 334–336
NOMODEL option, PLOT statement (REG) 407, 409
nonparametric tests
deciding whether to use 199–200, 232
defined 138
for multiple groups 268
Kruskal-Wallis test 280–283
Wilcoxon Rank Sum test 243–249
Wilcoxon Signed Rank test 210–211
NOOBS option, PRINT procedure 37
NOPRINT option
HISTOGRAM statement (UNIVARIATE) 146
TABLES statement (FREQ) 490, 492
UNIVARIATE procedure 90, 225, 262–264
normal distribution
Central Limit Theorem and 175–180
checking normality of errors 440–444
confidence intervals and 181
defined 135
Empirical Rule 137
mean and 136
properties 136
standard deviation and 136
Normal Distribution table 144
NORMAL option
HISTOGRAM statement (UNIVARIATE) 143–146
PROBPLOT statement (UNIVARIATE) 147
UNIVARIATE procedure 139
normal probability plots 146–149
normal quantile plots 149
normality, testing for
See testing for normality
NOROWS= option, HISTOGRAM statement (UNIVARIATE) 262–264
NOSTAT option
HBAR statement (CHART) 109–111
HBAR statement (GCHART) 109–111
PLOT statement (REG) 343, 407–409, 428, 455–456
NPAR1WAY procedure
BY statement 249, 283
CLASS statement 281, 283
CORRECT= option 247
general form 249, 283
Kruskal-Wallis test 281–282
Kruskal-Wallis Test table 245, 281, 283
Scores table 245, 248, 283
VAR statement 281, 283
WILCOXON option 245
Wilcoxon Rank Sum test 245–249
Wilcoxon Two-Sample Test table 245–249
NQQ. statistical keyword 453, 456
NROWS= option, HISTOGRAM statement (UNIVARIATE) 262–264
null hypothesis 156–157
null statement 34, 61
Number of Observations table
ANOVA procedure 273, 280
REG procedure 349
NVAR=ALL option, CORR procedure 326
O
OBS. statistical keyword 409, 428, 456
observation numbers 35, 37
observations
See also Extreme Observations table
adding blank lines between 38
defined 27
Number of Observations table 273, 280, 349
putting multiple on one line 33
ODS (Output Delivery System) 123–126
ODS GRAPHICS statement
automatic 458–462
creating scatter plots 324
general form 326
plotting predicted values and limits 356–358
ODS SELECT statement 123, 222
ODS statement 324–325
ODS Statistical Graphics 325–326, 405, 499–504
ODS tables
identifying 124, 150, 151, 185, 329
names of 299, 348–349
ODS TRACE statement 125
one-way ANOVA
assumptions 270–271
deciding whether to use 268
defined 267
performing 271–275
understanding results 272–275
OPTIONS statement
FORMCHAR= option 393, 456
LINESIZE= option 86, 476
overview 22–23
VALIDVARNAME= option 30
ordinal variables
classifying data with 467
creating measures of association with 482–491
defined 73
error checking in 113
Kruskal-Wallis test 280
measuring 75
OUT= option, IMPORT procedure 62–63
outliers
defined 89
finding with scatter plot matrix 329
looking for 427–433
when plotting residuals 401–403, 427–433
output, printing and saving 525–526
Output Delivery System (ODS) 123–126
Output Statistics table
fitting curves 365, 368–370
fitting straight lines 352–353
predicting values and limits 380
reviewing 354–355, 428–429
output tables 123–126, 222, 493
See also specific table names
Output window 519–520, 525–526
Overall ANOVA table 273–275, 280
OVERLAY option, PLOT statement (REG) 395–396
P
P option
MODEL statement, REG procedure 352, 367
PRINT statement, REG procedure 352
P. statistical keyword 456
p-value
correlation coefficients displaying 333
defined 139
finding for contingency tables 487
finding with ANOVA procedure 272
finding with NPAR1WAY procedure 246–247, 282
finding with TTEST procedure 207–209, 237–238
finding with UNIVARIATE procedure 205–206, 211
hypothesis testing and 156–157
Shapiro-Wilk test and 140–141
statistical significance and 158–159, 200–201, 233, 268–269
PAGE option, FREQ procedure 476
paired-difference t-test
assumptions 201–203
deciding whether to use 200, 232
performing 201–210
technical details 205
testing with TTEST procedure 206–209
testing with UNIVARIATE procedure 204–206
paired groups of data
building hypothesis tests 198–201
deciding which tests to use 199–200, 232
defined 191
finding differences between 192–195
performing paired-difference t-test 201–210
performing Wilcoxon Signed Rank test 210–211
steps for analyzing 200
summarizing data 192–198
parameter estimates 340, 346–347
Parameter Estimates table
REG procedure 346–347, 365–366, 378
UNIVARIATE procedure 150
parameters (population) 133
Parameters for Normal Distribution table 144
parametric tests
deciding whether to use 199–200, 232
defined 138
for multiple groups 268
performing paired-difference t-test 201–210
performing two-sample t-test 233–243
pasting 523
PATTERN statement, GCHART procedure 101
Pearson correlation coefficients
See correlation coefficients
Pearson Correlation Coefficients table 332–337
Pearson test
See chi-square test
PERCENTw.d format 48
permanent data sets
See data sets
pivot tables
See contingency tables
PLOT option, UNIVARIATE procedure
checking for normality 142–143, 146
line printer plots 87
summarizing data 198, 222, 261
PLOT statement, BOXPLOT procedure
BOXWIDTHSCALE= option 264–266
BWSLEGEND option 264–266
comparing multiple groups 230–231
PLOT statement, REG procedure
CLINE= option 343
CONF option 359, 373
creating diagnostic plots 454–456
CVREF= option 428, 456
line printer plots 393, 456
NOMODEL option 407, 409
NOSTAT option 343, 407–409, 428, 455–456
NQQ. statistical keyword 456
OBS. statistical keyword 409, 428, 456
OVERLAY option 395–396
P. statistical keyword 456
plotting predicted values and limits 355, 358–360
plotting residuals against independent variables 405–406
plotting residuals in time sequence 414, 420–421, 425–426
PRED option 359, 373
R. statistical keyword 406, 409, 456
STUDENT. statistical keyword 428, 456
VREF= option 428, 456
PLOTS=DIAGNOSTICS (STATS=NONE) option, REG procedure 453, 458, 461
PLOTS= option, CORR procedure 324, 326
PLOTS= option, FREQ procedure 500–504
PLOTS= option, REG procedure
checking normality of errors 441
diagnostics panel of plots 458–462
in regression diagnostics 411, 413–414
plotting predicted values and limits 357–358, 371–373
plotting residuals against independent variables 405, 419, 422–425
plotting residuals against predicted values 413–414, 420
plotting predicted values and limits 355–360, 371–375
plotting residuals
See residuals plots
point estimates 169
polynomial regression
checking assumptions 365
fitting curves 362–367
overview 360–361
performing analysis 365–366
plotting predicted values and limits 371–375
printing predicted values and limits 367–370
summarizing 375
population
defined 131
hypothesis tests and 156–157
normal distribution for 136
parameters for 133
symbols used 133
population mean
See also estimating the mean
confidence intervals for 180–186
deciding which means differ 285–287, 289–290, 293, 296
defined 133
defining confidence limits 351, 357–358
normal distribution and 136
standard deviation and 135
standard error of the mean 179
statistical notation for 133
population variance
effect of 173–175
point estimate and 169
statistical notation for 133
practical significance 160–161
PRED option, PLOT statement (REG) 359, 373
predicted values
plotting 355–360, 371–375
plotting residuals against 401–403, 413–414, 420, 424–425
printing 350–355, 367–370, 379
prediction limits
defined 350
plotting 355–360, 371–375
printing 350–355, 367–370, 379
PRINT procedure
DOUBLE option 38
functionality 35–39
general form 39
LABEL option 45
NOOBS option 37
VAR statement 37
PRINT statement, REG procedure 351–352
printing
data sets 35–39
graphs 526
one table per page 476
output 525–526
predicted values and limits 350–355, 367–370, 379
programs 524–525
selected variables 37
probability plots, normal 146–149
probability, reference
defined 158
p-value considerations 200–201, 233, 268–269
PROBPLOT statement, UNIVARIATE procedure 147, 151
Program Editor window 519, 522
program statements
adding to DATA steps 192
creating variables 194–195
example 363
technical details 195
programs
See SAS programs
pure error 434
Q
QRANGE statistics keyword 86
quadratic polynomials 360–361
quadratic term 361
quantile plots, normal 149
Quantiles for Normal Distribution table 145, 150
Quantiles table
reviewing 81–82, 124
summarizing continuous variables 76
QUIT statement 350
R
R option, MODEL statement (REG) 428
R. statistical keyword 406, 409, 456
random samples 132–133
RANGE statistics keyword 86
ratio variables 74, 76
reading data from text files 61–62
recalling submitted programs 524
reference probability
defined 158
p-value considerations 200–201, 233, 268–269
REG procedure
See also MODEL statement, REG procedure
See also PLOT statement, REG procedure
See also PLOTS= option, REG procedure
ALPHA= option 369–371
Analysis of Variance table 348–349, 365, 367, 379, 435–439
Fit Statistics table 347, 367, 379
fitting curves 361–367
fitting straight lines 343–360
general form 398
ID statement 351–352
interactive usage 349–350
LINEPRINTER option 393, 456
NOCLI option 357
Number of Observations table 349
Output Statistics table 352–355, 365, 368–370, 380, 428–429
Parameter Estimates table 346–347, 365–366, 378
plotting predicted values and limits 355–360, 371–375
PRINT statement 351–352
printing predicted values and limits 350–355, 367–370, 379
QUIT statement and 350
regression diagnostic support 405
RUN statement and 349–350
studentized residuals 427
VAR statement 406
regression
See also least squares regression
See also multiple regression
See also straight-line regression
checking assumptions 365
fitting curves 360–375
fitting straight lines 343–360
for multiple independent variables 375–380
polynomial 360–375
regression diagnostics
creating diagnostic plots 454–457
investigating lack of fit 433–439
outliers in data 427–433
plotting residuals 401–404
residual plots for fitting curves 422–427
residual plots for multiple regression 411–416
residual plots for straight-line regression 405–411, 416–421
testing assumptions for errors 440–444
regression equations
defined 340–341
finding for curves 360–361
finding for multiple regression 378
finding for straight lines 345–346
for quadratic polynomials 360–361
regressor variables
See independent (regressor) variables
residuals
creating plots for data 404–427
defined 355, 401
plotting 401–404
R statistics keyword 406
standardized 427
studentized 427
residuals plots
against independent variables 403, 405–407, 419
against predicted values 401–403, 413–414, 420, 424–425
checking normality of errors 440–444
data without outliers 431–433
defined 401
diagnostics panel of 458–462
fitting curves 422–427
for straight-line regression 405–411, 416–421
from data 404–427
in time sequence 403–404, 409–410, 414–415, 420–421
looking for outliers in data 427–433
response scales 75–76
response (dependent) variables 328, 340
Results window 519, 525–526
ROUND function 49
rows
See observations
RSTUDENT. statistical keyword 459, 460, 462
RUN statement
functionality 29, 34
general form 22
REG procedure and 349–350
S
sample
defined 131
statistics for 133
symbols used 133
sample average
as point estimate 169
defined 133
distribution of 175–180
plotting on histogram 170
statistical notation for 133
sample size
effect of 169–172
point estimate and 169
reducing 171–172
sample variance
statistical notation for 133
technical details 134
SAS
exiting 19
starting 15
SAS data sets
See data sets
SAS Enterprise Guide 527–529
SAS/GRAPH software 325n, 358, 405n
SAS programs
creating 520–522
recalling submitted 524
saving, including, printing 524–525
submitting 523–524
SAS windowing environment
command bar 520
copying, cutting, pasting 523
creating SAS programs 520–522
Editor window 519–522
Enhanced Editor window 519
Explorer window 519, 525
Log window 519–520
Output window 519–520, 525–526
printing and saving output 525–526
Program Editor window 519, 522
recalling submitted programs 524
Results window 519, 525–526
saving, including, printing programs 524–525
saving and printing graphs 526
saving tables 526
submitting programs 523–524
toolbar 520
viewing initial windows 519–520
saving
graphs 526
output 525–526
programs 524–525
tables 526
scatter plot matrix
benefits 328–329
creating 326–329
defined 326
finding outlier points 329
sample structure 328
Scatter Plot table 331
scatter plots
creating 323–325, 358–359, 363
defined 322
response variables 328
scatter plot matrix example 327
Scores table
comparing independent groups 245
reviewing 248, 283
SCORR option, TEST statement (FREQ) 492
Shapiro-Wilk test 140–141, 145
side-by-side bar charts 227–229, 261
side-by-side box plots 229–231, 261, 264–266
SIGMA= option, PROBPLOT statement (UNIVARIATE) 147
significance, statistical
See statistical significance
simple random sample 132
Simple Statistics table
calculating correlation coefficients 332, 335
checking data for errors 329–330
reviewing 331, 337
skewness, testing for normality 141–142
slope of a line 340, 348
SORT procedure 39–41
sorting data sets 39–41
Spearman Correlation Coefficient test table 487–488, 493
Spearman’s rank correlation coefficient 483–484, 487–488, 493
spreadsheets, importing 62–63
SS (sum of squares) 348
standard deviation
defined 133
estimating the mean and 173–175
mean and 135
normal distribution and 136
standard error of the mean 179
statistical notation for 133
standard error of the mean 179
standardized residuals 427
starting SAS 15
statistical notation 133
statistical significance
choosing significance level 159
correlation coefficients and 333–334
defined 158
example 161
p-value and 158–159, 200–201, 233, 268–269
statistical tests
See also specific tests
deciding which to use 199–200, 232–233
understanding significance 200–201, 233
statistics (sample) 133
Statistics for Table of ... table 479–480, 488–491
Statistics table
comparing independent groups 236
reviewing 208–210, 239–240
STDDEV statistics keyword 86, 183
stem-and-leaf plots 88–89, 142
STNAMEL function 48–49
storing temporary data sets 59
straight-line regression
checking lack of fit 434–438
finding equation for 345–346
fitting with REG procedure 343–360
least squares regression 339–342
performing 338–342
residuals plots for 405–411, 416–421
steps for performing 342
summarizing 360
straight lines
finding equations for 345–346
fitting with PLOT statement 358–359
fitting with REG procedure 343–360
intercept of 340, 348
slope of 340, 348
stratified random sample 132–133
STUDENT. statistical keyword 428, 456
studentized residuals 427
Student’s t-test
See paired-difference t-test
submitting programs 523–524
sum of squares (SS) 348
summarizing data
checking data for errors 112–113
creating bar charts 101–111
creating frequency tables 92–100
creating histograms 90–92
creating line printer plots 87–90
levels of measurement 73–76
summarizing continuous variables 76–86
types of response scales 75–76
types of variables 73–76
summary statistics
See also specific types of statistics
checking data for errors 329
identifying ODS tables 124, 151, 185, 329
summary tables
See contingency tables
SUMVAR= option
HBAR statement (CHART) 110–111
HBAR statement (GCHART) 110–111
SYMBOL statements 358–360, 373–375, 455
COLOR= option 359
LINE= option 359
VALUE= option 373–374
T
T option, MEANS statement (ANOVA) 285–286, 298
t-tests
See specific t-tests
T Tests table
comparing independent groups 236
comparing paired groups 207
reviewing 210, 237–240
t-value 181
Table of ... table 470–471
tables
See also contingency tables
frequency tables 92–100, 198
identifying ODS tables 124, 150, 151, 185, 329
output tables 123–126, 222
printing one per page 476
saving 526
summarizing data in 467–476
three-way 467
two-way 467
TABLES statement, FREQ procedure
ALPHA= option 490, 492
CHISQ option 478, 492
CL option 492
creating contingency tables 469–470, 475
creating frequency tables 98
creating measures of association 484–485
EXPECTED option 478–482, 487, 492
FISHER option 481, 492
handling missing values 99–100
MEASURES option 484, 492
MISSPRINT option 100
NOFREQ option 471–472
NOPRINT option 490, 492
temporary data sets, storing 59
TEST statement, FREQ procedure 484, 486, 492–493
test statistic 138, 156–157
testing errors assumption 440–444
testing for normality
defined 138
identifying ODS tables 150
other methods of 141–149
rechecking data 151–155
statistical test for normality 138–141
tests for independence 476–482
Tests for Location table
comparing paired groups 204–206, 211
reviewing 83, 210
summarizing continuous variables 76
Tests for Normality table
reviewing 140, 150
testing for normality 151
text files, reading data from 61–62
three-way tables 467
time sequence, plotting residuals in
fitting curves 425–426
in multiple regression 414–415
in straight-line regression 409–410, 420–421
overview 403–404
TITLE statement 20–21, 35
toolbar 520
TTEST procedure
ALPHA= option 242–243
BY statement 241
CLASS statement 235
Confidence Limits table 208–210, 236, 239–240
Equality of Variances table 235–236, 240–241
general form 209, 241, 243
performing paired-difference t-test 203, 206–209
performing two-sample t-test 234–242
Statistics table 208–210, 236, 239–240
T Tests table 207, 210, 236–240
VAR statement 235
Tukey-Kramer test 288, 291–293, 307
TUKEY option, MEANS statement (ANOVA) 291–294, 298
two-sample t-test
assumptions 234
deciding whether to use 200, 232
performing 233–243
technical details 242
testing for equal variances 235–236
testing with TTEST procedure 234–242
two-way tables 467
2×2 table 467
Type I error 158
Type II error 159–160
U
unbalanced data 268, 271
UNIVARIATE procedure
See also HISTOGRAM statement, UNIVARIATE procedure
See also PLOT option, UNIVARIATE procedure
Basic Statistical Measures table 76, 80–82, 113, 124, 196, 222
CLASS statement 221–225, 261
creating comparative histograms 262–264
creating frequency tables 93–96
error checking with 112–113, 329
Extreme Observations table 76, 82–84, 113, 124, 196
Extreme Values table 83–84, 124
FREQ option 95–96, 198
Frequency Counts table 95–96, 124
general form 83–84, 90, 92, 96, 141, 146, 148, 206, 225, 227, 264
Goodness-of-Fit Tests for Normal Distribution table 145, 150
ID statement 76
Missing Values table 76, 82, 86, 113, 124
Moments table 76–81, 89, 124, 141–144, 150–151
NEXTROBS=0 option 83
NEXTRVAL= option 83
NOPRINT option 90, 225, 262–264
Normal Distribution table 144
NORMAL option 139
paired-difference t-test 202–206
Parameter Estimates table 150
Parameters for Normal Distribution table 144
plotting histogram of sample averages 170
PROBPLOT statement 147, 151
Quantiles for Normal Distribution table 145, 150
Quantiles table 76, 81–82, 124
scatter plot matrix comparison 328–329
summarizing continuous variables 76–78
summarizing data 221–227
summarizing differences 196–198
tests for independence 477
Tests for Location table 76, 83, 204–206, 210–211
Tests for Normality table 140, 150–151
VAR statement 76
Wilcoxon Rank Sum test 244–245
Wilcoxon Signed Rank test 211
UPCASE function 49
user-defined formats 68
V
VALIDVARNAME= system option 30
VALUE= option, SYMBOL statements 373–374
VALUE statement, FORMAT procedure 50–51
VAR statement
CORR procedure 324, 326, 329, 332
NPAR1WAY procedure 281, 283
PRINT procedure 37
REG procedure 406
TTEST procedure 235
UNIVARIATE procedure 76
Variable Information table 332, 337
variables
See also classification variables
See also continuous variables
See also independent (regressor) variables
See also ordinal variables
creating bar charts for 101–111
creating contingency tables for 475–476
creating frequency tables for 92–100
creating with program statements 194–195
defined 27
dependent (response) variables 328, 340
formatting values 46–51
interval variables 74, 76
labeling 45–46
measures of association 482–486
nominal variables 73, 75, 113, 467
omitting column location 32–33
print order 35
printing selected 37
ratio variables 74, 76
summarizing multiple continuous 322–331
tables summarizing multiple 467
variance
See also population variance
ANOVA with unequal variance 275–278
assumption of equal variances 235–236
defined 133
sample variance 133, 134
statistical notation for 133
testing for equal variances 276–277
VBAR statement
CHART procedure 104, 107, 124, 227–229
GCHART procedure 103, 107
vertical bar charts
CHART procedure 103–104
GCHART procedure 102–107
VREF= option, PLOT statement (REG) 428, 456
W
WEIGHT statement, FREQ procedure 474–475, 478, 484
Welch ANOVA 277–278
WELCH option, MEANS statement (ANOVA) 277–278
WHERE statement
functionality 154–155
general form 155
rechecking data 151
whiskers, defined 89
WILCOXON option, NPAR1WAY procedure 245
Wilcoxon Rank Sum test
deciding whether to use 200, 232
performing 243–249
testing with NPAR1WAY procedure 245–249
testing with UNIVARIATE procedure 244–245
Wilcoxon Signed Rank test
deciding whether to use 200, 232
performing 210–211
Wilcoxon Two-Sample Test table 245–249
windowing environment
See SAS windowing environment
Work library 59
X
x-axis, and scatter plots 322
x variables
See independent (regressor) variables
Y
y-axis
residuals plots 401
response variables 328
scatter plots and 322