

absolute error measures 226, 227, 244

absolute penalty fit method 211

accuracy characteristic of prediction model 231

activation function, neural network 203–204

Adjusted R2 75, 76

affinity grouping 252

agglomerative algorithm, clustering 154–163

AI (artificial intelligence) 250

AIC/AICc (Akaike information criterion) approach 76, 184, 195–196

Analyze command

Fit Line 65

Fit Model 67–69, 95

Fit Stepwise 74

Fit Y by X 31–33, 65, 83–84

ANOVA (analysis of variance)

one-way/one-factor 83–96

process 82–83

two-way/two-factor 97–102

area under the curve (AUC) 235, 236, 237

artificial intelligence (AI) 250

association rules 252

association task, predictive analytics 254

AUC (area under the curve) 235, 236, 237

average linkage method, distance between clusters 154–155

Axis Titles, Histogram Data Analysis 23


BA (business analytics) 3–5, 250

bar chart 59–61

Bayesian information criterion (BIC) 76

bell-shaped distribution 18

BI (business intelligence) 3–5

bias term, neural network 203

BII (business information intelligence) 3–5

binary dependent variable 104, 221–222, 230–237

binary vs. multiway splits, decision tree 181

bivariate analysis 6, 31–36, 124–133

BMI (business modeling intelligence) 3–5

boosting option, neural network predictability 210, 216–217

BSI (business statistical intelligence) 3–5

bubble plot 53–55

business analytics (BA) 3–5, 250

business information intelligence (BII) 3–5

business intelligence (BI) 3–5

business modeling intelligence (BMI) 3–5

business statistical intelligence (BSI) 3–5


categorical variables

See also ANOVA

deciding on statistical technique 26, 28–29

decision tree 180, 181–192

graphs 45–46, 50–51, 52

neural network 208, 212–213

regression 76–82

tables 42

causality 39

central limit theorem (CLT) 18–24

centroid method, distance between clusters 154–155, 164–166

chaining in single linkage method, distance between clusters 154

chi-square test of independence 185

churn analysis 122–133

classification task, predictive analytics 254

classification tree 181–192, 237, 239–240

cleaning data for practical study 8

CLT (central limit theorem) 18–24

cluster analysis

credit card user example 152–153

definition 152

hierarchical clustering 154–163, 177

k-means clustering 154, 164–177

regression, using clusters in 164

Cluster command 156–157, 159

Clustering History, Cluster command 159

clustering task, predictive analytics 254

coefficient of determination (RSquare or R2) 66

Color Clusters, Cluster command 157

Column Contributions, decision tree 198

complete linkage method, distance between clusters 154–155

confusion matrix

binary dependent variable model comparison 230–231

bivariate analysis contingency table 35

logistic regression 120, 132

neural network 222–223

Connect Thru Missing, Overlay Plot command

constant variance assumption 105

contingency table 35

See also confusion matrix

continuous variables

See also ANOVA

deciding on statistical technique 26, 28

decision tree 180, 183, 192–199

logistic regression 129–130, 132

model comparison 226–230, 244

neural network 208

regression 76–77

contour graphs 55–56

cornerstone of statistics, CLT theorem 20

correlation coefficient 227

correlation matrix

logistic regression 125

multiple regression 67, 68

PCA 136–138, 142, 148–149

criterion function, decision tree 184–185

CRM (customer relation management) 250

cross-validation, neural network 207–208

customer relation management (CRM) 250


data, role of 2–3

data discovery 9, 11

See also graphs

See also tables

Data Filter, Graph Builder 56–61

data mining 4, 251, 254–255

See also predictive analytics

data warehouse 2

decision trees

classification tree 182–192, 237, 239–240

credit risk example 180–182

definition 180

pros and cons of using 124

regression tree 192–199

dendrogram 154, 157, 158, 159–160, 164

dependence, multivariate analysis framework 11

See also specific techniques

differences, testing for

one-way ANOVA 90–96

dimension reduction, PCA 136, 142–144

directed (supervised) predictive analytics techniques 252, 253, 254

“dirty data,” problem of 6

discovery, multivariate analysis framework 9, 11

See also graphs

See also tables

discovery task, predictive analytics 254

Distribution command 30, 85–87, 125, 175

drop zones, Graph Builder 46–48

dummy variables 76–77, 79–82, 212

Dunnett’s test 95

Durbin-Watson test 73

dynamic histogram, Excel 23

dynamic linking feature, JMP 58


Effect Likelihood Ratio Tests 129

eigenvalue analysis 141–144, 145, 147

eigenvalues-greater-than-1 method, PCA 144

elbow discovery method 144, 167

enterprise resource planning (ERP) 2

equal replication design, ANOVA with 97–102

error table 230

See also confusion matrix

estimation task, predictive analytics 254

Excel, Microsoft

measuring continuous variables 228–229

opening files in JMP 28

PivotTable 40–42

random sample generation 20–24

reasons for using 10–11

Exclude/Unexclude option, data table 147


factor analysis vs. PCA 140–141

See also Principal Component Analysis

factor loadings 145

false positive rate (FPR), prediction model 231–232

features, neural network 204–205

filtering data 56–61, 236–237

Fit Line, Analyze command 65

Fit Model, Analyze command 67–69, 95

Fit Stepwise, Analyze command 74

Fit Y by X, Analyze command 31–33, 65, 83–84

fitting to the model

ANOVA 83–84, 95

clusters 164

G2 (goodness-of-fit) statistic, decision tree 184, 185–190

neural networks 206, 211, 215, 220

regression 65, 67–69, 71, 74, 122, 128

statistics review 31–33

train-validate-test paradigm for 240–246

Formula command 77–79

FPR (false positive rate), prediction model 231–232

fraud detection 250

frequency distribution, Excel Data Analysis Tool

F-test 65, 71–72, 83


G2 (goodness-of-fit) statistic, decision tree 184, 185–190

Gaussian radial basis function 204

gradient boosting, neural network 210

Graph Builder 45–61

bar chart 59–61

bubble plot 53–55

contours 55–56

Graph Builder dialog box 45–48

line graphs 55–56

scatterplot matrix 48–51, 123, 126

trellis chart 51–53, 55–56, 58

Group X drop zone 46–47

Group Y drop zone 46–47


hidden layer, neural network 205, 208–210

hierarchical clustering 154–163, 177

high-variance procedure, decision tree as 198–199

Histogram, Excel Data Analysis Tool 21–22

holdback validation, neural network 206, 215

homoscedasticity assumption 105

Hsu’s MCB (multiple comparison with best) 94–95

hyperbolic tangent (tanh) 204

hypothesis testing 24–26


“include all variables” approach, logistic regression 123, 124

indicator variables 76–77, 79–82, 212

input layer, neural network 202–203

in-sample and out-of-sample data sets, measures to compare 82, 228–229, 244

interaction terms, introducing 130–132

interdependence, multivariate analysis framework 11

See also cluster analysis

See also Principal Component Analysis


JMP

See SAS JMP statistical software application

Johnson Sb transformation 211

Johnson Su transformation 211


k-fold cross-validation, neural network 207–208

k-means clustering 154, 164–177


Lack of Fit test, logistic regression 122, 128

Leaf Report, decision tree 198

learning rate for algorithm 210

least significant difference (LSD) 94

least squares criterion 206, 211

Levene test, ANOVA 89, 90, 102

lift chart 237–240

line graphs 55–56

Linear Probability Model (LPM) 105

linear regression

See also logistic regression

definition 65

LPM 105

multiple 67–76

simple 64–66

sum of squared residuals 128

linearity of logit, checking 132

loading plots 139, 145–146, 148–149

log odds of 0/1 convention 113

logistic function 106–112

logistic regression

bivariate method 124

decision tree method 124

lift curve 237–240

logistic function 106–112

LPM 105

odds ratios 109–111, 113–122

ROC curve 235–237

statistical study example 122–133

stepwise method 124

logit transformation 107

LogWorth statistic, decision tree 185, 186, 187–190

low- vs. high-variance procedures 198–199

LPM (Linear Probability Model) 105

LSD (least significant difference) 94

LSMeans Plot command 95, 97


Make into Data Table, ROC curve 236

Mark Clusters, Cluster command 157

market basket analysis 252

mean absolute error (MAE) measure 226, 244

mean square error (MSE) measure 226, 244

means comparison tests, ANOVA 90–95

Means/ANOVA command 88–89

model comparison

binary dependent variable 230–237

continuous dependent variable 226–230, 244

introduction 225

lift chart 237–240

training-validation-test paradigm 240–246

Model Launch command, neural network 216

Mosaic plot 34–35

Move Up, Value Ordering 116–117

MSE (mean square error) measure 226, 244

multicollinearity of independent variables 73–74

multiple regression 67–76

Multivariate command 67, 142

multivariate data analysis

and data sets 37–39

as prerequisite to predictive modeling 249–250

commonality for practical statistical study 7

framework 9, 11

multiway splits in decision tree 181, 185


neural networks

basic process 202–206

data preparation 212–213

fitting options for the model 206, 211, 215, 220

hidden layer structure 205, 208–210

prediction example 213–223

purpose and application 201

validation methods 206–208, 215–216

New Columns command 100

no penalty fit option 211

nominal data 26

nonlinear transformation 74, 204

normal (bell-shaped) distribution 18

Normal Quantile Plot, Distribution command 85–87, 125

Number of Models, neural network 216

Number of Tours, neural network model 216, 217


odds ratios, logistic regression 109–111, 113–122

Odds Ratios command 116, 118

one-sample hypothesis testing 24–25

one-way/one-factor ANOVA 83–96

online analytical processing (OLAP) 40–45

optimal classification, ROC curves 233–235, 236

ordinal data 26

outliers, scrubbing data of 212, 219

out-of-sample and in-sample data sets, measures to compare 82, 228–229, 244

output layer, neural network 202–203

overfitting the model/data

clusters 164

decision trees 191

neural network 206–211, 216, 218

train-validation-test paradigm to avoid 240–246

Overlap drop zone 46–47

Overlay Plot command 166–167


Pairwise Correlations, Multivariate command 142

parallel coordinate plots, k-means clustering 172–173

Parameter Estimates, Odds Ratios command 118

parsimony, principle of 74, 123

partition initial output, decision tree 183–184, 193

PCA

See Principal Component Analysis

penalty fit method 211, 215, 220

PivotTable, Excel 40–42

Plot Residual by Predicted 72–73, 218–219

PPAR (plan, perform, analyze, reflect) cycle 9–11

practical statistical study 7, 8–9

prediction task, predictive analytics 254

predictive analytics

availability of courses 7

definition 4, 252

framework 252–253

goal 253–254

model development and evaluation phase

multivariate data analysis role in 249–250

phases 254–256

specific applications 5

tasks of discovery 254

vs. statistics 254–255

predictive modeling

See predictive analytics

Principal Component Analysis (PCA)

dimension reduction 136, 142–144

eigenvalue analysis of weights 141–142

example 135–140

structure of data, insights into 145–149

vs. factor analysis 140–141

probability

estimating for logistic regression 119–120

relationship to odds 112

probability formula, saving 119

proportion of variation method, PCA 144, 148

pruning variables in decision tree 191, 195–196

p-values, hypothesis testing 25–26


random sample 14, 20–24

Range Odds Ratios, Odds Ratios command 116

Receiver Operating Characteristic (ROC) curve 191–192, 232–237

regression

See also logistic regression

categorical variables 76–82

clusters 164

continuous variables 76–77

fitting to the model 65, 67–69, 71, 74, 122, 128

linear 64–76, 105, 128

multiple 67–76

purposes 64

simple 64–66

stepwise 74–75, 124, 241–243

regression tree 192–199

relative absolute error 227

relative squared error 226

Remove Fit, neural network 215

repeated measures ANOVA 82

representative sample 14

residuals

ANOVA 85, 87

linear regression 128

multiple regression 72–73

neural network 218–219

return on investment (ROI) from data collection 2–3

robust fit method 211

ROC (Receiver Operating Characteristic) curve 191–192, 232–237

root mean square error (RMSE/se) measure 75, 76, 140, 192, 226

RSquare or R2 (coefficient of determination) 66


sampling

in-sample and out-of-sample data sets 82, 228–229, 244

one-sample hypothesis testing 24–25

principles 14–15, 18–20

random sample generation 20–24

SAS JMP statistical software application

See also specific screen options and commands

as used in book 10, 11

deciding on best statistical technique 28–36

features to support predictive analytics 58, 254

opening files in Excel 28

saturated model, logistic regression 122

scales for standardizing data, neural network 212

scatterplot matrix 48–51, 123, 126

score plot 139, 145

scree plot

hierarchical clustering 160

PCA 142–143, 145, 146, 147

se (RMSE) 75, 76, 140, 192, 226

Selection button, copying output 44

SEMMA approach 256

sensitivity component of prediction model 231

Show Split Count, Display Options 188

Show Split Prob, Display Options 185

simple regression 64–66

single linkage method, distance between clusters

sorting data

Graph Builder 59–60

PCA 142, 145

specificity component of prediction model 231

Split command, decision tree variables 185–186

squared penalty fit method 211, 220

squaring distances, k-means clustering 173–174

SSBG (sum of squares between groups) 82, 83

SSE (sum of squares within groups [or error]) 82, 83, 166–167, 175

standard error 19

standardized beta coefficient (Std Beta) 69, 71

statistical assumptions, testing for

one-way ANOVA 85–89

statistics coursework

central limit theorem 18–24

coverage and real-world limitations 5–7

effective vs. ineffective approaches 26–36

one-sample hypothesis testing and p-values

sampling principles 14–15, 18–20

statistics as inexact science 14, 15–16

Z score/value 17, 24–25

statistics vs. predictive analytics 254–255

Std Beta, Fit Model command 69, 71

stepwise regression 74–75, 124, 241–243

Subset option, Table in Graph Builder 58–59

sum of squares between groups (SSBG) 82, 83

sum of squares within groups (or error) (SSE) 82, 83, 166–167, 175

Summary Statistics, Distribution command 175

supervised (directed) predictive analytics techniques 252, 253, 254


tables 40–45

Tabulate command 42–45

testing for differences, one-way ANOVA 90–96

testing statistical assumptions, one-way ANOVA 85–89

Tests that the Variances are Equal report 85

time series, Durbin-Watson test 73

total sum of squares (TSS) 82, 83

train-validate-test paradigm for model evaluation 240–246

Transform Covariates, neural network 212

trellis chart 51–53, 55–58

true positive rate (TPR) component of prediction model 232

TSS (total sum of squares) 82, 83

t-test 65, 71–72, 93

Tukey HSD test 93, 95

Tukey-Kramer HSD test 93, 95

2R (representative and random) sample 14, 16

two-way/two-factor ANOVA 97–102


unequal replication design, ANOVA with 97

Unequal Variances test, ANOVA 85, 86, 89

Unit Odds Ratios, Odds Ratios command 116

univariate analysis 6

unsupervised (undirected) predictive analytics techniques 252, 253, 254


validation

logistic regression 132

neural network 206–208, 215–216

train-validate-test paradigm 240–246

Validation variable 208

Value Ordering, Column properties 116–117

variables

See also categorical variables

See also continuous variables

automatic assignments for neural network 214

binary dependent variable 104, 221–222, 230–237

decision tree 182–191, 194–196

dummy 76–77, 79–82, 212

model building 123–124

multicollinearity 73–74

neural network 208

reclassifying 113, 123

weighting 141–142, 211, 215

variance inflation factor (VIF) 73, 74


Ward’s method, distance between clusters 154

weak classifier, boosting option 210

weight decay penalty fit method 211

weighting of variables 141–142, 211, 215

Welch’s Test 85, 86, 89, 90

Whole Model Test 121–122, 127–128

within-sample variability 82

without replication design, ANOVA 97

Wrap drop zone 46–47


Z score/value 17, 24–25

