Index
A
accuracy
based on misclassification rate 6
sensitivity and specificity 7–8
area under ROC curves (AUC)
binormal curves 27–28
binormal curves, comparing 42–46, 65–66
bootstrap confidence intervals for 33–35
bootstrap-validated estimates of ROC curves 107
comparison confidence intervals 48
computing with FREQ procedure 23–24
computing with LOGISTIC procedure 22–23
computing with %ROC macro 24–25
cross-validation estimates of ROC curves 103–104
empirical curves 21–25
empirical curves, comparing 39–42, 64–65
empirical curves, ordinal markers 54–55
Lehmann family of ROC curves 70
ordinal markers 63–66
ROC curves with censored data 86–88
split-sample estimates of ROC curves 100–102
asymptotic standard error (ASE) 12
AUC
See area under ROC curves
AXIS statement 55
B
binary predictor 5–14
for reporting on continuous predictor 25–26
frost forecast (example) 5–6
reasons for emphasizing 95
BINOMIAL keyword (FREQ procedure) 10, 12
binormal parameters 27
binormal ROC curves 26–30, 96
area under 27–28
compared to empirical curves 29, 46–47
comparing areas of 42–46
direct estimation of 32–33
latent 57–63
latent, comparing areas under 65–66
normality 30
regression models for 49–51, 59–60, 69
standardized uptake value (SUV) 26–27
transformations to binormality 30–32
binormal ROC curves, area under
See area under ROC curves (AUC)
%BOOT1AUC macro 34, 56
bootstrap samples for confidence intervals 96
for areas under curves (AUCs) 33–35
for difference in AUCs 48
bootstrap-validated estimates of ROC curves 96, 106–107
Box-Cox power transformations 30–31, 43
%BVAL macro 107
C
c-index 90–91, 93
cancer examples 38, 81–82
censored data, ROC curves with 81–93
area under curves 86–88
concordance probability 88–91
monotone functions and censored data 86
%CINDEX macro 90
CLASS statement, UNIVARIATE procedure 16
“close to the ideal” 26
clustered data 75–79
comparing ROC curves 38
for ordinal predictors 63–66
Lehmann family of ROC curves 79
paired vs. unpaired data 38–39
competing models, ROC curves for 113–116
concordance probability 22
See also area under ROC curves (AUC)
ROC curves with censored data 88–91
with Cox regression models 91–93
conditioning 10
confidence intervals for sensitivity and specificity 10–12
confidence intervals from bootstrap samples 96
for areas under curves (AUCs) 33–35
for difference in AUCs 48
continuous predictor 15–35, 97–98
dichotomizing 15–18
Lehmann family of ROC curves 67–80
operating point 25–26
optimal threshold 25–26
reasons for emphasizing 95
reporting on, with binary predictor 25–26
ROC curve for 18–20
threshold 25–26
CONTRAST statement, NLMIXED procedure 44–46
CONTRAST variable, %ROC macro 39–40
covariate adjustment for ROC curves 48–49
Lehmann family 73–75
COVSANDWICH option, PHREG procedure 76–77
Cox regression models 69, 91–93
%CPE macro 93
credit rating example 54
credit scoring 2
cross-validation estimates of ROC curves 96, 102–106
area under curves 103–104
k-fold cross-validation 96, 104–106
LOGISTIC procedure for 102–104
resubstitution estimates vs. 104
D
data clustering 75–79
decision trees 113
dichotomizing continuous predictors 15–18
discordant pairs
See concordance probability
double dipping 96
E
empirical ROC curves 20–26, 63–66
compared to binormal curves 29, 46–47
comparing areas of 39–42
for ordinal predictors 54–57, 64–65
empirical ROC curves, area under
See area under ROC curves (AUC)
ESTIMATE statement, NLMIXED procedure 28, 44–46, 62
estimating equations for clustered data 75–79
Euclidean method 25
exported data, GPLOT procedure with 116–118
F
false negatives (FN) 6, 7–8
calculating with FREQ procedure 9
false positives (FP) 6, 7–8
calculating with FREQ procedure 9
“far from random” 26
FREQ procedure 8–14
BINOMIAL keyword 10, 12
calculating false negatives and positives 9
calculating misclassification rate 9
calculating negative and positive predictive values 9
calculating sensitivity 9–12
calculating specificity 9–10, 13
computing Somers' D and AUC 23–24
ORDER= option 14
TABLE statement 23
FRONTREF option, HISTOGRAM statement (UNIVARIATE) 17
frost forecast (example) 5–6
G
generalized estimating equations 76
GPLOT procedure 116–118
H
hazard function 68
heteroscedastic models 51
HISTOGRAM statement, UNIVARIATE procedure 16
FRONTREF option 17
HREF= option 17
home equity loan example 110
homoscedastic models 51
HREF= option, HISTOGRAM statement 17
I
INMODEL= option, LOGISTIC procedure 100
intercept 27
invariance to monotone transformations 21, 22
K
k-fold cross-validation 96, 104–106
Kaplan-Meier estimation method 83
L
latent binormal ROC curves 57–63
comparing areas under 65–66
leave-one-out validation 96, 102–104
leaves (SAS Enterprise Miner) 114
Lehmann family of ROC curves 67–80
adjusting for covariates 73–75
area under curves 70
clustered data 75–79
comparing curves 79
LIFETEST procedure 83–85
TIME statement 83
liver surgery example 96–97
location-scale families 59
LOGISTIC procedure 8, 19
computing AUC 22–23
cross-validation estimates of ROC curves 102–104
INMODEL= option 100
OUTMODEL= option 100
resubstitution estimates of ROC curves 96
SCORE statement 100, 101
split-sample estimates of ROC curves 100–102
lung cancer example 81–82
M
magnetic resonance example 70–73
MEASURES option, TABLE statement (FREQ) 23
medical diagnostics, prediction in 2
METHOD= option, SURVEYSELECT procedure 34
misclassification rate (MR) 6–7
calculating with FREQ procedure 9
in ROC curve 19
MIXED procedure 44
regression model for binormal ROC curves 49–50
MODEL statement
OUTROC= option 19, 71, 101, 104
RL option 72
SLE option 97
SLS option 97
monotone functions 21
censored data and 86
monotone transformations, invariance to 21, 22
MR
See misclassification rate
multiple observations from individual unit 75–79
multivariable prediction models 95–107
bootstrap-validated estimates of ROC curves 96, 106–107
cross-validation estimates of ROC curves 96, 102–106
resubstitution estimates of ROC curves 96, 97–99, 102, 104
split-sample estimates of ROC curves 96, 99–102
N
negative predictive value (NPV) 8
calculating with FREQ procedure 9
neural networks 114–115
NLMIXED procedure 27–28
CONTRAST statement 44–46
ESTIMATE statement 28, 44–46, 62
estimating binormal ROC curves 32–33
ordinal-probit regression model 60–62
RANDOM statement 44
regression model for binormal ROC curves 49–50
nodes (SAS Enterprise Miner) 111, 114
NOPRINT option, TABLE statement (FREQ) 23
normality, in binormal models 30
NPV (negative predictive value) 8
calculating with FREQ procedure 9
O
observations, multiple from individual unit 75–79
observed operating points 19
OFFSET= option, AXIS statement 55
operating point
continuous predictor 25–26
observed 19
OPTIMAL option, %PLOTROC macro 26
optimal threshold, continuous predictor 25–26
optimism-adjusted estimates 106
ORDER= option, FREQ procedure 14
ordinal predictors 53–66
comparing ROC curves 63–66
empirical ROC curves for 54–57, 64–65
latent binormal ROC curves 57–63
ordinal-probit regression model 60–62
OUT= option, SURVEYSELECT procedure 34
OUTHITS= option, SURVEYSELECT procedure 34–35
OUTMODEL= option, LOGISTIC procedure 100
OUTPUT statement, PREDPROBS=X option 103, 104
OUTROC= option
MODEL statement 19, 71, 101, 104
SCORE statement (LOGISTIC) 101
P
paired data 38–39
comparing binormal ROC curves 42–46
comparing empirical ROC curves 39–41
PARAMETER= option, BOXCOX transformation 31
partitioning, recursive 113
percent correct statistic 6
PET scanning (example) 16–17
PHREG procedure 69–70, 72–73, 74
clustered data 75–79
COVSANDWICH option 76–77
%PLOTROC macro 41, 100–101, 105, 116
OPTIMAL option 26
positive predictive value (PPV) 8
calculating with FREQ procedure 9
prediction
business of 1
in medical diagnostics 2
prediction models, multivariable
See multivariable prediction models
predictors
See also binary predictor
See also continuous predictor
dichotomizing continuous 15–18
ordinal 53–66
PREDPROBS=X option, OUTPUT statement 103, 104
prevalence 7
product limit estimation method 83
proportional hazards regression 69, 79
concordance probability with 91–93
prostate cancer prognosis (example) 38
R
RANDOM statement, NLMIXED procedure 44
randomly breaking the ties 22
receiver operating characteristic curves
See ROC curves
recursive partitioning 113
REG procedure 32–33
regression
Cox regression models 69, 91–93
model for binormal ROC curves 49–51, 69
ordinal model for binormal ROC curves 59–60
ordinal-probit model 60–62
proportional hazards regression 69, 79, 91–93
resubstitution estimates of ROC curves 96, 97–99, 102
cross-validation estimates vs. 104
LOGISTIC procedure for 96
split-sample estimates vs. 102
RL option, MODEL statement 72
ROC curves 1–3
See also binormal ROC curves
See also empirical ROC curves
bootstrap-validated estimates of 96, 106–107
covariate adjustment for 48–49, 73–75
cross-validation estimates of 96, 102–106
for competing models 113–116
for single continuous predictor 18–20
for single model 111–113
in SAS Enterprise Miner 109–118
Lehmann family of 67–80
misclassification rate in 19
multivariable prediction models 95–107
nomenclature for 2
resubstitution estimates of 96, 97–99, 102, 104
semi-parametric 68
smoothing 26
split-sample estimates of 96, 99–102
with censored data 81–93
ROC curves, area under
See area under ROC curves (AUC)
ROC curves, comparing 38
for ordinal predictors 63–66
Lehmann family of ROC curves 79
paired vs. unpaired data 38–39
%ROC macro 24–25, 103
comparing empirical curves 39–42, 64–65
CONTRAST variable 39–40
empirical curves with ordinal markers 56–57
ROC points 54–55
See also ordinal predictors
%ROCPLOT macro 20
S
SAS Enterprise Miner 109–118
GPLOT procedure with exported data 116–118
leaves 114
nodes 111, 114
ROC curves for competing models 113–116
ROC curves for single model 111–113
terminal nodes 114
SCORE statement, LOGISTIC procedure 100
OUTROC= option 101
self-prediction 96
semi-parametric ROC curves 68
sensitivity 7–8
calculating with FREQ procedure 9–12
confidence intervals for 10–12
plotting against specificity 18–19
ROC curves with censored data 82–83
single binary predictor 5–14
for reporting on continuous predictor 25–26
frost forecast (example) 5–6
reasons for emphasizing 95
single continuous predictor 15–35, 97–98
dichotomizing 15–18
Lehmann family of ROC curves 67–80
reasons for emphasizing 95
reporting on, with binary predictor 25–26
ROC curve for 18–20
single model, ROC curves for 111–113
SLE option, MODEL statement 97
slope 27
SLS option, MODEL statement 97
smoothing ROC curves 26
Somers' D statistic 23–24
specificity 7–8
calculating with FREQ procedure 9–10, 13
confidence intervals for 10–12
plotting against sensitivity 18–19
ROC curves with censored data 82–83
split-sample estimates of ROC curves 96, 99–102
area under curves 100–102
LOGISTIC procedure for 100–102
resubstitution estimates vs. 102
stratified splits 99
stepwise selection 97–98
stratified splits 99
SURVEYSELECT procedure 34–35
METHOD= option 34
OUT= option 34
OUTHITS= option 34–35
survival models 82–83
SUV (standardized uptake value) 16–17
binormal curve 26–27
dichotomizing 17
histograms 16, 17
T
TABLE statement, FREQ procedure 23
MEASURES option 23
NOPRINT option 23
%TDROC macro 86–88
terminal nodes (SAS Enterprise Miner) 114
threshold, continuous predictor 25–26
TIME statement, LIFETEST procedure 83
TNR (true negative rate)
See specificity
TPR (true positive rate)
See sensitivity
TRANSREG procedure 30–32, 43
trapezoidal rule 21–22
true negative rate (TNR)
See specificity
true negatives (TN) 6, 9
true positive rate (TPR)
See sensitivity
true positives (TP) 6, 9
two-sample t tests 38
U
U-statistics 39
UNIVARIATE procedure for SUV histograms 16
unpaired data 38–39
comparing binormal ROC curves 46
comparing empirical ROC curves 41–42
V
validation 96
bootstrap-validated estimates of ROC curves 96, 106–107
cross-validation estimates of ROC curves 96, 102–106
leave-one-out validation 96, 102–104
resubstitution estimates of ROC curves 96, 97–99, 102, 104
split-sample estimates of ROC curves 96, 99–102
W
Wald test 12, 79
weather forecasting examples 1, 5–6
X
%XVAL macro 104–105