Index
A
acceptance-rejection technique 126–128
addition (+) operator 329
Akaike information criterion 249
ALL function 326
alternative parameterizations 216, 218
annotation facility, SGPLOT procedure 301
ANY function 326
approximate sampling distribution
See ASD (approximate sampling distribution)
AR(1) model
approximating sampling distributions for parameters 254–256
generating covariance matrix with known structure 183
generating multivariate binary variates 155
simulating data in DATA step 252–254
simulating data in SAS/IML software 258–260
ARIMA procedure
about 252
BY statement 254
ESTIMATE statement 254
estimating AR and MA model parameters 258
ARMA models
about 252
approximating sampling distributions for AR(1) parameters 254–256
simple AR model 252
simulating AR and MA data in DATA step 256–258
simulating AR(1) data in DATA step 252–254
simulating AR(1) data in SAS/IML software 258–260
ARMACOV function 258
ARMALIK function 258
ARRAY statement 284
arrays, holding explanatory variables 201
ASD (approximate sampling distribution)
number of samples and 96
sampling distribution of Pearson correlations 70–71
sampling distribution of statistics for normal data 60–61
sampling distribution of the mean 57–59, 68
simple regression model example 202–203
at (@) symbol 233
autocorrelated data 251
autoregressive and moving average models
See ARMA models
autoregressive model 252, 256–258
B
BARCHART statement, TEMPLATE procedure 38–39
BarPMF template 39
baseline hazard function 242–243
BC (bias-corrected) confidence intervals 295
Bernoulli distribution
logistic regression and 226–227
parameters for 27
simulating data from inhomogeneous Poisson process 275
bias-corrected (BC) confidence intervals 295
binomial distribution
parameters for 27
BINOMIAL option, TABLES statement (FREQ) 81, 86
BISECTION module 118, 156, 334–335
block-diagonal matrices 232–233
%BOOT macro 295
%BOOTCI macro 295
bootstrap confidence intervals 283, 295
bootstrap distribution
computing standard deviation of 294–295
for skewness and kurtosis 285–289
bootstrap methods
about 281
computing bootstrap confidence intervals 283, 295
computing bootstrap standard error 294–295
plotting estimates of standard errors 305–306
resampling with DATA step 262–266
resampling with SAS/IML software 282, 288–291
resampling with SURVEYSELECT procedure 282, 286–288
bootstrap standard error 283, 294–295, 305–306
BY-group technique
about 55
computing p-values 90
macro usage considerations 101
resampling example 285
suppressing output and graphics 97–100
writing efficient simulations 96–97, 99
BY statement
ARIMA procedure 254
BY-group technique and 55
MEANS procedure 64
simulating data with DATA step and procedures 56
writing efficient simulations 97
C
CALIS procedure 180
case resampling 284
case sensitivity 6
Cauchy distribution 28
CDF function
checking correctness of simulated data 35
parameter considerations 110
working with statistical distributions 30–32
censored observations 124–125, 244
central limit theorem (CLT) 57–58
chi-square distribution 28, 62–63
chi-square statistic 89
CHISQ option, TABLES statement (FREQ) 90
Cholesky transformation 146–150
CHOOSE function 326
CL option, MIXED procedure 235
CLASS statement
LOGISTIC procedure 218
MEANS procedure 64
classification variables
explanatory variables and 199
linear regression models with 208–211
CLB option, MODEL statement (REG) 221
CLOSE statement 331
CLT (central limit theorem) 57–58
coefficient of excess 299
COLNAME= option
FROM clause, APPEND statement 331
FROM clause, CREATE statement 331
PRINT statement 328
colon (:) operator
See mean (:) operator
comparison (<=) operator 6
complete spatial randomness 273
components (subpopulations) 119–121
compound symmetry model 184
conditional distribution technique 145
conditional distributions 142–144
conditional simulations
about 264
of one-dimensional data 270–271
of two-dimensional data 272–273
CONDMV function 270
CONDMVN function 270
CONDMVNMEANCOV function 144–145
confidence intervals
about 74
bias-corrected 295
computing coverage in SAS/IML language 77–78
coverage for nonnormal data 76–77
coverage for normal data 74–76
MEANS procedure computing 315
CONSTANT function 178
contaminated normal distribution
CONTENTS procedure 46
continuous distributions
CDF function and 31
exponential distribution 22–24, 28
moment-ratio diagram for 300–301
parameters for 28
PDF function and 30
simulating in SAS/IML software 26–27
skewness and kurtosis for 300–301
continuous mixture distribution 122
continuous variables
explanatory variables and 199
linear regression models with 200–203, 210–211
COORDINATES statement, SIM2D procedure 272
COPULA procedure
SIMULATE statement 170
copula technique
fitting and simulating data 169–173
CORR procedure
computing sample moments 319
COV option 180
estimating covariance matrix from data 180
fitting and simulating data from copula model 171, 173
generating data from copulas 168
POLYCHORIC option 190
simple linear regression models 203
simulating data from multivariate distributions 135–140
correlated random errors 232–236
correlation matrices
converting between covariance and 176–177
COUNTN function 326
COV option, CORR procedure 180
covariance matrices
converting between correlation and 176–177
generating diagonally dominant 181–182
generating from Wishart distribution 186–187
generating with known structure 183–186
with compound symmetry 184
with diagonal structure 183–184
with Toeplitz structure 185
CREATE statement
FROM clause 331
row-major order for matrices 260
VAR clause 331
cumulative distributions
See CDF function
CUPROD function 326
CUSUM function 326
CUTVAL. format 60
D
data sets
creating from ODS tables 45–46, 57
creating matrices from 330
macro usage considerations 101
data simulation
See simulating data
DATA step
See also specific techniques
observations and 6
SAS/IML language comparison 6
simulating AR(1) data in 252–254
densities, computing
See also PDF function
finite mixture distribution and 120
overlaying theoretical density on histograms 40–41
overlaying theoretical PMF on frequency plots 38
design matrices
creating for fixed and random effects 238–239
for alternative parameterizations 218
with GLM parameterization 216–218
design of simulation studies
disadvantages of simulation 105
effect of number of samples 95–96
moment-ratio diagram as tool for 315–317
writing efficient simulations 96–105
DIAG function 326
DIAGONAL= option, MATRIX statement (SGSCATTER) 291
diagonally dominant covariance matrices 181–184
discrete distributions
about 14
Bernoulli distribution 14–15, 27
binomial distribution 15–16, 27
CDF function and 31
discrete uniform distribution 17–18
geometric distribution 16–17, 27
parameters for 27
PDF function and 30
Poisson distribution 19–20, 27
simulating in SAS/IML software 24–25
discrete uniform distribution 17–18
dispersion constant 226
DISTANCE function
simulating data from Gaussian random field 265
simulating data from regular process 277
DO function 326
DO loops
as simulation loops 55
effect of sample size on sampling distribution 64
matrix arithmetic versus 155
sampling distribution of Pearson correlations 70
simulating fixed effect by reversing order of 202–203
tips for shortening simulation times 104
using multivariate data 55, 97
using univariate data 13–14, 21–22, 24
writing efficient simulations 97
DYNAMIC statement, SGRENDER procedure 39, 41
E
effect parameterization 218
eigenvalue decomposition 150–151
eigenvalues (spectrum) 187–189
elliptical distributions 169
Emrich-Piedmonte algorithm 154, 158
equality (=) operator 6
Erlang distribution 28
ESTIMATE statement, ARIMA procedure
NOPRINT option 254
OUTEST= option 254
WHERE clause 254
EUCLIDEANDISTANCE module 265, 277, 333–334
evaluating power of t test 84–86
evaluating statistical techniques
assessing two-sample t test for equality of means 78–84
confidence interval for a mean 74–78
effect of sample size on power of t test 87–88
evaluating power of t test 84–86
using simulation to compute p-values 88–90
excess kurtosis
about 299
for continuous distributions 300–301
EXP function 326
EXPAND2GRID function 315
explanatory variables
arrays holding 201
classification variables and 199
continuous variables and 199
exponential distribution
confidence interval for a mean 76–77
goodness-of-fit tests 117
inverse transformation algorithm and 117–118
plotting PDF of 31
proportional hazards model and 242
shape parameters and 301
F
factor pattern matrix 133, 150–151
feasible region 300
final weighted least squares (FWLS) estimate 220–221
FINISH statement 326
finite mixture distributions
contaminated normal distribution 121–122
FISHER option, CORR procedure 168, 173
Fisher's z transformation 290
FITFLEISHMAN module 312
fixed effects
creating design matrices for 238–239
generating variables for 201
simulating by reversing order of DO loops 202–203
simulating random effects components 236–242
simulating with arrays 201
Fleishman's method 115, 298, 311–314
FORM= option, SIMULATE statement (SIM2D) 267–268
FORMAT= option, PRINT statement 328
FORMAT procedure 60
FREE statement 327
FREQ procedure
BY-group processing 90
chi-square tests and 89
confidence interval for a mean 75
design of simulation studies and 315
simulating multivariate ordinal variates 161
simulating univariate data 15, 17–18
TABLES statement 56, 81, 86, 90
usage examples involving tables 44–45
FROM clause
APPEND statement 331
CREATE statement 331
FROOT function
finding intermediate correlations 156
functions
See also specific functions
parameters and 110
SAS/IML language supported 326–328
SAS/IML modules replicating 333–336
FWLS (final weighted least squares) estimate 220–221
FWLS option, ROBUSTREG procedure 220–221
G
GAM procedure 247
GAMINV function 32
gamma distribution
checking correctness of simulated data 35–37
chi-square distribution and 62–63
shape parameters and 301
GAUSS functions 188
Gaussian random field
conditional simulation of one-dimensional data 270–271
conditional simulation of two-dimensional data 272–273
unconditional simulation of one-dimensional data 264–267
unconditional simulation of two-dimensional data 267–269
generalized Pareto distribution 113
GENMOD procedure 229
geometric distribution
drawing random sample from 37–39
parameters for 27
Givens rotations 187
GLIMMIX procedure
estimating covariance matrix 181
reading design matrices 239
simulating random effects components 238
GLM procedure
design matrices with GLM parameterization 217
OUTSTAT= option 97
simple linear regression models and 199, 211
GLMMOD procedure
creating design matrices for fixed and random effects 238–239
GLM parameterization and 216–217
simulating random effects components 238
goodness-of-fit tests 35–37, 117
Graph Template Language (GTL)
defining contour plots 268
overlaying theoretical density on histograms 40–41
overlaying theoretical PMF on frequency plots 37–39
Grid Manager, SAS 102
grid of values, creating 332–333
GRID statement, SIM2D procedure 267
GTL (Graph Template Language)
defining contour plots 268
overlaying theoretical density on histograms 40–41
overlaying theoretical PMF on frequency plots 37–39
Gumbel distribution 111–112, 301, 305
H
hazard function 242
hazard rate 123
high-leverage points 219, 221–224
HISTOGRAM statement, UNIVARIATE procedure
overlaying theoretical density on histograms 40
plotting bootstrap estimates of standard errors 306
sampling distribution for AR(1) parameters 255
sampling distribution of the variance 62
HistPDF template 41
homogeneous Poisson process
about 273
HOMOGPOISSONPROCESS function 277
hypergeometric distribution 27
hypothesis testing, computing p-values for 32, 88–90
I
I function 327
IDENTIFY statement, ARIMA procedure
NOPRINT option 254
VAR= option 253
IF-THEN/ELSE control statement 326
Iman-Conover method 161–164, 176
IML (interactive matrix language) 5
See also SAS/IML language
IML procedure
Cholesky transformation and 148
DATA step function support 227
estimating covariance matrix from data 180
matrix multiplication and 217
multivariate normal distributions and 133
sampling distribution of Pearson correlations 70
simulating ARMA samples 259
simulating Gaussian random fields 267
simulating univariate data 24
t tests and 84
index of maximum (<:>) operator 329
index of minimum (>:<) operator 329
inequality (^=) operator 6
inhomogeneous Poisson process 273, 275–276
INSET statement
SGPLOT procedure 90
UNIVARIATE procedure 306
instrumental distribution 126–128
interactive matrix language (IML) 5
See also SAS/IML language
intermediate correlation 155–156, 167
INTO clause, READ statement 330
INV function 327
inverse CDF function
See QUANTILE function
inverse Gaussian distribution 28, 112
inverse transformation algorithm 117–119
iterative DO statement 326
J
J function
about 327
sampling distribution of the mean 68
simulating univariate data 24
writing efficient simulations 97
jackknife methods 287
jitter technique 132
Johnson system of distributions 114–116, 302, 308–311
K
KDE (kernel density estimate) 120–121
KEEP statement 284
kernel density estimate (KDE) 120–121
Kronecker product matrix operator 233
kurtosis
checking correctness of simulated data 36
design of simulation studies and 315–316
estimate bias in small samples 65–67
Fleishman distribution and 115, 311
for gamma distribution 306–308
Johnson system of distributions 310–311
moment matching and 303
plotting variations on moment-ratio diagram 303–306
KURTOSIS module 288
KURTOSIS= option, OUTPUT statement (MEANS) 66
L
LABEL= option, PRINT statement 328
Laplace distribution 28
LCLM= option, OUTPUT statement (MEANS) 74–75
least trimmed squares (LTS) estimate 220–223
LEVERAGE option, MODEL statement (ROBUSTREG) 222
LEVERAGE= option, OUTPUT statement (ROBUSTREG) 222
LIFETEST procedure 123–124, 245–246
linear mixed models
repeated measures model with random effect 231–232
simulating correlated random errors 232–236
with random effects 226, 230–232
linear predictor
about 226
in generalized linear models 226
in proportional hazards model 243
linear regression models
about 199
with classification and continuous variables 210–211
with interaction and polynomial effects 215–218
with single classification variable 208–210
with single continuous variable 200–203
LINEPARM statement, SGPLOT procedure 219–220
link functions 226
listwise deletion 190
LOAD statement, IML procedure 159, 188, 327
LOESS procedure
about 247
MODEL statement 249
logistic distribution 28
LOGISTIC procedure
alternative parameterizations 216, 218
CLASS statement 218
logistic regression example 228
OUTDESIGN= option 218
OUTDESIGNONLY option 218
logistic regression model 226–229
lognormal distribution
plotting bootstrap estimates of standard errors 305
shape parameters and 301
LTS (least trimmed squares) estimate 220–223
M
machine epsilon 178
machine precision 178
macros
Matérn model II 277
MATLAB functions 188
matrices
See also correlation matrices
See also covariance matrices
checking if symmetric 178
constructing 240
creating data sets from 331
creating from data sets 330
efficiency of 6
reshaping 69
row-major order for 260
SAS/IML language and 6
subscript reduction operators for 328–329
tips for shortening simulation times 103
MATRIX statement, SGSCATTER procedure 291
maximum likelihood estimate
checking correctness of simulated data 36
fitting gamma distribution to data 306
suppressing notes to SAS log 99–100
maximum (<>) operator 329
MCD subroutine 140
MCMC procedure
about 9
Gibbs sampling and 145
parameter considerations 110
mean
assessing two-sample t test 78–84
confidence interval for 74–78, 295
sampling distribution of 57–59, 68–69
MEAN function
computing confidence interval for a mean 77
subscript reduction operator equivalent 329
writing efficient simulations 97
mean square error 54
MEAN statement, SIM2D procedure 267–268
MEANS procedure
approximating sampling distribution 55
BY statement 64
CLASS statement 64
computing point estimates 282
computing sample kurtosis 66
computing sample moments 319
computing variances 61
design of simulation studies and 315
displaying descriptive statistics 255
OUTPUT statement 56, 66, 74–75
sampling distribution of the mean 57–59
unconditional simulation of one-dimensional data 266–267
VARDEF= option 295
median, computing variances of 61–62
Mersenne-Twister algorithm 32–33
METHOD= option
SURVEYSELECT procedure 287
minimum (><) operator 329
mixed models
See linear mixed models
MIXED procedure
CL option 235
covariance structures supported 183
estimating covariance matrices 181
repeated measures model with random effect 231–232
mixing probabilities 120
mixture distributions
contaminated normal distribution 121–122
MODEL procedure
parametric bootstrap method 291
simulating data from copula model 169
MODEL statement
LOESS procedure 249
REG procedure 221
ROBUSTREG procedure 222
moment matching
as tool for designing simulation studies 315–317
moment-ratio diagram
as tool for designing simulation studies 315–317
comparing simulations and choosing models 314
extensions to multivariate data 318–331
fitting gamma distribution to data 306–308
for continuous distributions 300–301
Johnson system of distributions 308–311
plotting variation of skewness and kurtosis on 303–306
MOMENTS module 312
moments of a distribution 299–301
Monte Carlo estimates
about 54
bias of kurtosis estimates in small samples 67
effect of sample size on sampling distribution 63–64
MCMC procedure and 9
number of samples and 96
sampling distribution of the mean 58
simple regression model example 202–203
Monte Carlo standard error 96
multinomial distribution
generating random samples from 89
tabulated distributions and 19, 130
multiplication (#) operator 215, 329
multivariate ARMA models 260–261
multivariate contaminated normal distribution 138–140
multivariate distributions
See also multinomial distribution
See also MVN (multivariate normal distributions)
advanced techniques for simulating data 153–174
basic technique for simulating data 129–152
Cholesky transformation and 146–150
constructing with Fleishman distribution 115
generating data from 137
generating data from copulas 164–173
generating multivariate binary variates 154–157
generating multivariate ordinal variates 158–161
methods for generating data from 144–146
reordering multivariate data 161–164
resampling with SAS/IML software 289–291
simulating data from 129, 153–154
simulating data in time series 251
simulating data with given moments 298
spectral decomposition and 150–151
multivariate normal distributions (MVN)
about 133
estimating covariance matrix from data 180
simulating in SAS/IML software 133–136
simulating in SAS/STAT software 136
MVN (multivariate normal distributions)
about 133
estimating covariance matrix from data 180
simulating in SAS/IML software 133–136
simulating in SAS/STAT software 136
MYSQRVECH function 335
N
naive bootstrap
See bootstrap methods
NARROW option, SIM2D procedure 267
NCOL function 327
nearest correlation matrix 191–193
negative binomial distribution 27, 39
NLIN procedure 291
NOMISS option, CORR procedure 180, 190
NONOTES system option 101, 116
nonsingular parameterizations 218
NOPRINT option
ESTIMATE statement, ARIMA procedure 254
IDENTIFY statement, ARIMA procedure 254
normal distribution
computing p-values 32
computing quantiles for 156
confidence interval for a mean 74–77
parameters for 28
shape parameters and 301
simulating data from 12–13, 21–22
normal mixture distribution 28
notes, suppressing to SAS log 99–100
NOTES system option 101
NROW function 327
number of samples (repetitions) 95–96
NUMREAL= option, SIMULATE statement (SIM2D) 267
O
observations
correlating 181
DATA step and 6
high-leverage points 219
ODS EXCLUDE ALL statement 97
ODS GRAPHICS statement 46
ODS OUTPUT statement
creating data sets from tables 45–46, 57
usage example 80
ODS statements, controlling output with 44–46, 97–99
ODS TRACE statement 44
%ODSOFF macro 80, 98, 228, 236
%ODSON macro 99
OF operator 19
OLS (ordinary least squares) 200
one-dimensional data
conditional simulation of 270–271
unconditional simulation of 264–267
OUT= option
OUTPUT statement, FREQ procedure 97
TABLES statement, FREQ procedure 56
OUTDESIGN= option
LOGISTIC procedure 218
OUTDESIGNONLY option, LOGISTIC procedure 218
OUTEST= option
ESTIMATE statement, ARIMA procedure 254
OUTHITS option, SURVEYSELECT procedure 287–288
OUTP= option, CORR procedure 97, 180
output, controlling with ODS statements 44–46, 97–99
OUTPUT statement
REG procedure 205
ROBUSTREG procedure 222
OUTSTAT= option, GLM procedure 97
P
P= option, OUTPUT statement, REG procedure 205
P5 option, MEANS procedure 58, 285
P95 option, MEANS procedure 58, 285
p-values, computing for hypothesis testing 32, 88–90
pairwise correlations 190
PARAM= option, CLASS statement (LOGISTIC) 218
parameter estimates
parameters
for Bernoulli distribution 27
for binomial distribution 27
for continuous distributions 28
for discrete distributions 27
for Emrich-Piedmonte algorithm 155
for exponential distribution 28, 110
for gamma distribution 28, 110
for geometric distribution 27
for logistic distribution 28
for lognormal distribution 28, 111
for normal distribution 28
for Poisson distribution 27
for standard normal distribution 28
for tabulated distributions 27
for uniform distribution 28, 111
for univariate distributions 109–111
for Weibull distribution 28
rate 22
using parameter estimates as 207–208
parametric bootstrap method 281, 291
Pareto distribution 28, 112–113
PD (positive definite)
about 177
generating covariance or correlation matrix 179–180
generating diagonally dominant covariance matrix 181–182
problems with covariance matrices 190
testing covariance matrices 177
PDF function
about 326
checking correctness of simulated data 35
finite mixture distribution and 120
overlaying theoretical density on histograms 40–41
overlaying theoretical PMF on frequency plots 38
parameter considerations 110
simulating data from continuous distributions 21, 23
working with statistical distributions 30–31
Pearson correlations
bootstrap resampling 290
correlation matrices and 176
sampling distribution of 69–71
simple regression model example 202–203
Pearson system of distributions 302
PHREG procedure 244
PLCORR option, OUTPUT statement (FREQ) 190
PLOTS= option, SIM2D procedure 268
PMF function
checking correctness of simulated data 35
generating multivariate ordinal variates 158–161
overlaying on frequency plot 37–39
working with statistical distributions 30–31
POINT= option, SET statement 205, 282–284
Poisson distribution 19–20, 27, 229
Poisson process
about 273
Poisson regression model 226, 229–230
%POLYCHOR macro 190
POLYCHORIC option, CORR procedure 190
polynomial effects, linear models 215–218
POLYROOT function 327
pooled variance t test
about 78
assessing in SAS/IML software 83–84
effect of sample size on power of 87–88
robustness to nonnormal populations 81–82
robustness to unequal variances 78–81
positive definite (PD)
about 177
generating diagonally dominant covariance matrix 181–182
problems with covariance matrices 190
testing covariance matrices 177
positive semidefinite (PSD) 177–179
power function distribution 113
power of regression tests 211–215
power of t test
effect of sample size on 87–88
POWER procedure
evaluating power of t test 84–86
PRINT statement
about 328
COLNAME= option 328
FORMAT= option 328
LABEL= option 328
ROWNAME= option 328
sampling distribution example 69
probability distributions
See continuous distributions
See discrete distributions
probability mass function
See PMF function
PROBBNRM function 155
PROBGAM function 32
PROBIT function 32
PROBNORM function 32
procedures
BY statement in 55
suppressing notes to SAS log 99–100
PROJS function 191
PROJU function 191
proportional hazards model 242–245
PSD (positive semidefinite) 177–179
pseudorandom numbers 33
%PUT statement 35
Q
Q-Q (quantile-quantile) plot 41–44
QQPLOT statement, UNIVARIATE procedure 41–44
QUANTILE (inverse CDF) function
about 326
acceptance-rejection technique and 126–127
computing confidence interval for a mean 77
computing quantile of normal distribution 156
fitting and simulating data from copula model 170–171
generating data from copulas 165
parameter considerations 110
univariate distribution support 112–113
working with statistical distributions 30, 32
quantile-quantile (Q-Q) plot 41–44
quantiles
See also QUANTILE function
about 32
checking correctness of simulated data 35
computing for normal distributions 156
R
RAND function
finite mixture distribution and 120
linear regression model and 227
logistic regression model and 227
overlaying theoretical PMF on frequency plots 38
parameter considerations 110–111
Poisson regression model and 229
simulating data from inhomogeneous Poisson process 275
simulating univariate data 13–14, 18, 23–24, 27
univariate distribution support 111–112, 114
working with statistical distributions 30
RANDDIRICHLET function 137
RANDFLEISHMAN module 312
RANDGEN subroutine
computing confidence interval for a mean 77
distributions supported by 112
J function and 97
overlaying theoretical PMF on frequency plots 38
sampling distribution of the mean 68
simulating data from homogeneous Poisson process 274
simulating univariate data 12, 18, 24–27
two-sample pooled variance t test 83
working with statistical distributions 30
RANDMULTINOMIAL function 89, 130, 327
RANDMVBINARY function 157
RANDMVORDINAL function 159–160
RANDNORMAL function
Cholesky transformation 146
conditional simulations 143
simulating data from multivariate distributions 70, 133, 138, 145
unconditional simulation of one-dimensional data 265
random correlation matrices 187–189
random effects
about 226
creating design matrices for 238–239
generating variables for 201
linear mixed models with 226, 230–232
repeated measures model with 231–232
random number generation
ARMASIM function and 259
Mersenne-Twister algorithm 32–33
RANDSEED subroutine and 259
random values for distributions
See RAND function
random variates 12
RANDSEED subroutine
random number generation and 259
sampling distribution of the mean 68
simulating univariate data 24, 26
RANDVALEMAURELLI function 318–319
RANDWISHART function 137, 186, 327
RANGAM function 32
RANGE= option, SIMULATE statement (SIM2D) 267
rank (Spearman) correlations 169, 176
RANK function 327
RANNOR function 32
rate parameter 22
Rayleigh distribution 114
READ statement
about 329
INTO clause 330
WHERE clause 55
reading data from data sets 329–330
reference parameterization 218
REFLINE statement, SGPLOT procedure 90
REG procedure
MODEL statement 221
OUTPUT statement 205
simple linear regression models and 199–200, 204–205
regression models
about 197
linear 199–211, 215–218, 226–230
linear mixed models 226, 230–242
logistic regression model 226–229
Poisson regression model 226, 229–230
power of regression tests 211–215
survival analysis models 123–125, 242–247
repeated measures model with random effect 231–232
repetitions (number of samples) 95–96
REPS= option, SURVEYSELECT procedure 287
resampling
case 284
with SURVEYSELECT procedure 282, 286–288
reshaping matrices 69
response variables
in logistic regression 226
RETURN statement 327
RMSE (root mean square error)
about 199
linear model based on real data 204
linear model with continuous variable 201
nonparametric models 248
RANDMVT function 327
ROBUSTREG procedure
MODEL statement 222
OUTPUT statement 222
ROBUSTREG routine 140
ROOT function
about 327
checking if matrix is PD 182
checking if matrix is PSD 179
Cholesky transformation and 147
root mean square error (RMSE)
about 199
linear model based on real data 204
linear model with continuous variable 201
nonparametric models 248
row-major order for matrices 260
ROWNAME= option, PRINT statement 328
ROWVEC function 327
RSREG procedure 316
S
SAMPLE function
simulating univariate data 18, 25–26
sample moments
checking correctness of simulated data 35–37
sample size
bias of kurtosis estimates and 65–67
effect of on power of t test 87–88
effect of on sampling distribution 63–65
standard error and 96
sampling distribution
approximating for AR(1) parameters 254–256
bias of kurtosis estimates 65–67
effect of sample size on 63–65
estimating probability with 59–60
evaluating statistical techniques for 73–91
Monte Carlo estimates 54
of statistics for normal data 60–63
simulating data using SAS/IML language 67–71
simulating data with DATA step and procedures 55–67
sampling variation 16
SAMPRATE= option, SURVEYSELECT procedure 287
SAS Grid Manager 102
SAS/IML language
computing confidence interval for a mean 77–78
constructing block-diagonal matrix 232–233
creating grid of values 332–333
DATA step comparison 6
design of simulation studies and 315
Fleishman's method and 312–313
generating symmetric matrices 181
Iman-Conover method 162
matrices and 6
modules for sample moments 336–337
modules replicating functions 333–336
obtaining programs used in book 8
PRINT statement 328
reading data from data sets 329–330
reading design matrices into 239
resampling support 282, 288–291
simulating data from regression models 206–207
simulating multivariate normal data 133–136, 140
subscript reduction operators 328–329
writing data to data sets 330–331
SAS log, suppressing notes to 99–100
SAS Simulation Studio 9
SAS/STAT software 133, 136, 140
SASFILE statement 283
SCALE= option, SIMULATE statement (SIM2D) 267
scatter matrix 186
SCATTER statement, SGPLOT procedure
YERRORLOWER= option 86
YERRORUPPER= option 86
SDF (survival distribution function) 245
SEED= option, SURVEYSELECT procedure 287–288
seed value
for random number generation 33–35
for sampling distribution examples 55
semicolon (;) 6
SETDIF function 327
SGPLOT procedure
annotation facility 301
bias of kurtosis estimates in small samples 66
conditional distributions 144
conditional simulations 271
generating data from copulas 168
INSET statement 90
jitter technique and 132
nonparametric models example 248–249
plotting PDF 31
REFLINE statement 90
SCATTER statement 86
visualizing stationary time series 257–258
SGRENDER procedure
conditional simulations 272
overlaying theoretical density on histograms 40–41
overlaying theoretical PMF on frequency plots 38
SGSCATTER procedure 135, 141, 291
SHAPE function
about 327
generating ID variables 331
generating matrices from Wishart distribution 187
reshaping matrices 69
shape parameters 110, 299, 301
SIM2D procedure
about 12
COORDINATES statement 272
GRID statement 267
NARROW option 267
PLOTS= option 268
producing contour plots 269
SIMULATE statement 267–268, 272
simulating data from Gaussian field 263–264, 267, 272
SIMNORMAL procedure
conditional simulations 142
simulating MVN distributions 136
simple bootstrap
See bootstrap methods
SIMULATE statement
COPULA procedure 170
simulating data
See also under specific techniques
advanced techniques for multivariate data 153–174
advanced techniques for univariate data 109–128
building correlation and covariance matrices 175–194
disadvantages of 105
for advanced regression models 225–249
for basic regression models 197–224
from basic multivariate distributions 129–152
from common univariate distributions 11–28
from time series models 251–261
moment matching and moment-ratio diagram 297–322
preliminary and background information 29–47
resampling and bootstrap methods 281–295
shortening simulation times 103–105
to estimate sampling distributions 51–71
to evaluate statistical techniques 73–91
using DATA step and procedures 55–67
simulation loop 55
Simulation Studio 9
singular parameterization 216
skewness
checking correctness of simulated data 36
design of simulation studies and 315–316
Fleishman distribution and 115, 311
for gamma distribution 306–308
Johnson system of distributions 310–311
moment matching and 303
plotting variations on moment-ratio diagram 303–306
sampling distribution example 65–67
SKEWNESS module 288
Sklar's theorem 169
smooth bootstrap method 281, 292–294
SMOOTH= option, MODEL statement (LOESS) 249
SMOOTHBOOTSTRAP module 294
SOLVE function 327
SORT call 327
SORT procedure 56
spatial models
about 263
simulating data from a regular process 276–278
simulating data from Gaussian random field 263–273
simulating data from homogeneous Poisson process 274–275
simulating data from inhomogeneous Poisson process 275–276
simulating data from spatial point process 273–274
simulating data using other techniques 278–279
spatial point processes 273–274
Spearman (rank) correlations 169, 176
spectral decomposition 150–151
spectrum (eigenvalues) 187–189
SQRT function 326
SQRVECH function
generating symmetric matrices 181
multivariate normal distributions and 140–141
standard errors
about 53
bootstrap 283, 294–295, 305–306
Monte Carlo 96
plotting bootstrap estimates of 305–306
sample size and 96
standard normal distribution
computing p-values 32
computing quantiles for 156
parameters for 28
simulating data from 12–13, 21–22
standardized uniform distribution 22
START statement 327
STAT= option, BARCHART statement (TEMPLATE) 38–39
STATESPACE procedure 261
statistic
sampling distribution of 51–53
standard error of 53
statistical distributions
checking correctness of simulated data 35–44
essential functions for working with 30–33
STORE statement 328
STREAMINIT function
linear regression example 227
macro-loop technique and 101
simulating univariate data 13–14
Student's t distribution 137
subpopulations (components) 119–121
subscript reduction operators
assessing t test 84
sampling distribution of the mean 68
writing efficient simulations 97
sum of squares (##) operator 329
SURVEYSELECT procedure
about 282
METHOD= option 287
REPS= option 287
SAMPRATE= option 287
survival analysis models
about 125
proportional hazards model 242–245
simulating data from multiple survivor functions 245–247
survival distribution function (SDF) 245
SYMCHECK function 178
symmetric matrices 181
SYMPUTX subroutine 90
%SYSEVALF macro 172
SYSRANDOM macro variable 34–35
system time, setting seed value from 34–35
T
T function 328
t test
assessing for equality of means 78–84
effect of sample size on 87–88
tables
creating data sets from 45–46, 57
excluding 45
selecting 45
TABLES statement, FREQ procedure
CHISQ option 90
OUT= option 56
TABULATE call 328
tabulated distributions
finite mixture distribution and 120
multinomial distribution and 19, 130
parameters for 27
sampling from finite sets and 26
TEMPLATE procedure
overlaying theoretical density on histograms 40
templates for simulating data
defining contour plots 268
macro-loop technique and 100–101
overlaying theoretical densities on histograms 40–41
overlaying theoretical PMF on frequency plots 37–38
univariate distributions 13–14
with DATA step and procedures 55–57
_TEMPORARY_ keyword 19
TEST statement, REG procedure 211–213
testing for covariance matrices 177–179
thinning algorithms 275, 278–279
time series models
about 251
simulating data from ARMA models 252–261
using arrays to hold explanatory variables 201
visualizing stationary time series with SGPLOT 257–258
Toeplitz matrix 185
TPSPLINE procedure 247
transformation technique 146
TRANSPOSE procedure 66
triangle distribution 28
truncated distribution 121, 126
TTEST procedure
simulated power analysis 85
two-sample pooled variance t test 80, 83, 85
two-dimensional data
conditional simulation of 272–273
unconditional simulation of 267–269
two-sample t test
about 78
assessing in SAS/IML software 83–84
effect of sample size on power of 87–88
robustness to nonnormal populations 81–82
robustness to unequal variances 78–81
Type I extreme value distribution 111–112
U
UCLM= option, OUTPUT statement (MEANS) 74–75
unconditional simulations
about 264
of one-dimensional data 264–267
of two-dimensional data 267–269
UNCONDSIMGRF function 269
unequal variances, robustness of t test to 78–81
uniform correlation structure 184
uniform distribution
continuous 22
linear regression model example 210–211
UNION function 328
UNIQUE function 328
univariate distributions
acceptance-rejection technique 126–128
adding location and scale parameters 109–111
finite mixture distributions 119–122
resampling with SAS/IML software 288–289
SAS software support for 27–28
simulating data from continuous distributions 20–24, 28
simulating data from discrete distributions 14–20, 27
simulating data from standard normal distribution 12–13
simulating data in DATA step 11–14
simulating data in SAS/IML software 24–27
simulating data in time series 251
simulating data with given moments 297
simulating from less common 111–116
simulating survival data 123–125
UNIVARIATE procedure
approximating sampling distribution 55
bootstrap resampling 285
checking correctness of simulated data 35–37
distributions supported by 111–112, 114–116
fitting gamma distribution to data 306
HISTOGRAM statement 40, 62, 255, 306
INSET statement 306
inverse transformation algorithm 117
Johnson system of distributions and 309–310
parametric bootstrap method 291
sampling distribution of Pearson correlations 70
sampling distribution of the mean 57–59
sampling distribution of the variance 62
VARDEF= option 295
USE statement 329
V
Vale-Maurelli algorithm 318–321
VAR clause, CREATE statement 331
VAR= option, IDENTIFY statement (ARIMA) 253
VARDEF= option
MEANS procedure 295
UNIVARIATE procedure 295
variance components model 183–184
variance function 226
variance reduction techniques 96
variances
robustness of t test to unequal 78–81
sampling distribution of 62–63
VARIOGRAM procedure 264, 268, 272
VARMASIM subroutine 251
VECDIAG function 328
VECH function 181
vectors
creating data sets from 330–331
creating grid of values 332–333
efficiency of 6
reading data into 329
tips for shortening simulation times 103
W
Weibull distribution
parameters for 28
proportional hazards model and 242
Rayleigh distribution and 114
WHERE clause
ESTIMATE statement, ARIMA procedure 254
READ statement 55
Wishart distribution 176, 186–187
writing data to data sets 330–331
writing efficient simulations
basic structure of efficient simulations 96–97
disadvantages of simulations 105
macro usage considerations 101–102
profiling SAS/IML simulation 102–103
shortening simulation times 103–105
suppressing notes to SAS log 99–100
suppressing ODS output and graphics 97–99
X
XSECT function 328
Y
YERRORLOWER= option, SCATTER statement (SGPLOT) 86
YERRORUPPER= option, SCATTER statement (SGPLOT) 86
Symbols
+ (addition) operator 329
@ (at) symbol 233
<= (comparison) operator 6
= (equality) operator 6
<:> (index of maximum) operator 329
>:< (index of minimum) operator 329
^= (inequality) operator 6
<> (maximum) operator 329
>< (minimum) operator 329
# (multiplication) operator 215, 329
; (semicolon) 6
## (sum of squares) operator 329