Index
Note: Page numbers followed by “f” and “t” refer to figures and tables, respectively.
0-9, and Symbols
missing values and numeric data, 94–96
11-point average recall, 191
A
Accuracy, of association rules, 79, 120
Accuracy, of classification rules, 102, 115
AdaBoost.M1 algorithm, 487
Additive logistic regression, 492–493
Agglomerative clustering, 142, 147
Akaike Information Criterion (AIC), 346
  illustrated examples, 350f
Alternating decision trees, 495
Analysis of variance (ANOVA), 393
Antecedent, of rule, 75
incorporating domain knowledge, 512–515
Area under the curve (AUC), 191–192
Area under the precision-recall curve (AUPRC), 192
attribute specifications in, 58
Assignment of key phrases, 516
Association rules
  accuracy (confidence), 79, 120
  computation requirement, 127
  converting item sets to, 122
  coverage (support), 79, 120
  predicting multiple consequences, 79
  relationships between, 80
Attribute evaluation methods, 562
  attribute subset evaluators, 564
  single-attribute evaluators, 564
Attribute selection
  instance-based learning methods, 291
  recursive feature elimination, 290–291
  searching the attribute space and, 292–293
  selective Naïve Bayes, 295
  Weka evaluation methods for, 562
Attribute subset evaluators, 564
Attribute-efficient learners, 135
semantic relation between, 513
Authorship ascription, 516
Autoencoders
  combining reconstructive and discriminative learning, 449
  denoising autoencoders, 448
  pretraining deep autoencoders with RBMs, 448
Automation applications, 28
Averaged one-dependence estimator (AODE), 348–349
B
Background knowledge, 508
Bagging
  bias-variance decomposition, 482–483
  idealized procedure versus, 483
  instability neutralization, 482–483
  for numeric prediction, 483
  positive probability, 476
Balanced iterative reducing and clustering using hierarchies (BIRCH), 160
Ball trees
  in finding nearest neighbors, 140
  two cluster centers, 145f
Bayes Information Criterion, 159–160
Bayesian estimation and prediction, 367–370
  probabilistic inference methods, 368–370
Bayesian Latent Dirichlet allocation (LDAb), 379–380
Bayesian networks
  data structures for fast learning, 349–352
  prior distribution over network structures, 346–347
  structure learning by conditional independence tests, 349
Bias
  multilayer perceptron, 263
  overfitting-avoidance, 35
Bias-variance decomposition, 482–483
Binary classification problems, 69
Block Gibbs sampling, 454
Boolean attributes, 55–56
Boosting
  in computational learning theory, 489
  forward stagewise additive modeling, 491
“Business understanding” phase, 28–29
C
MDL-based adjustment, 220
Calibration, class probability, 330
  discretization-based, 331
Categorical and continuous variables, 452–453
CfsSubsetEval method, 334
Chain-structured conditional random fields, 410
CitationKNN algorithm, 478
Class boundaries
Class labels
Class probability estimation, 321
  dataset with two classes, 329, 329f
  membership functions for, 129
Classical machine learning techniques, 418
Classification learning, 44
Classification rules
  criteria for choosing tests, 221–222
  disjunctive normal form, 78
  from partial decision trees, 227–231
  producing with covering algorithms, 223
  replicated subtree, 76, 77f
ClassifierPerformanceEvaluator, 565, 567
ClassifierSubsetEval method, 334
classification error visualization, 559
Cleansing
artificial data generation, 321–322
Closed-world assumptions, 47, 78
Clustering
  comparing parametric, semiparametric and nonparametric density models, 362–363
  with correlated attributes, 359–361
  expectation maximization algorithm, 353–356
  MDL principle application to, 200–201
  and probability density estimation, 352–363
  two-class mixture model, 354f
“Collapsed Gibbs sampling”, 380–381
Comma-separated value (CSV)
Complete-linkage method, 147
Computation graphs and complex network structures, 429–430
Computational learning theory, 489
Computational Network Toolkit (CNTK), 465
Computer-Assisted Passenger Prescreening System (CAPPS), 526
“Condensed” representation, 473
Conditional probability models, 392–403
  gradient descent and second-order methods, 400
  linear and polynomial regression, 392–393
  multiclass logistic regression, 396–400
  predictions for ordered classes, 402
  using priors on parameters, 393–395
  matrix vector formulations of linear and polynomial regression, 394–395
Conditional random fields (CRFs)
  chain-structured conditional random fields, 410
  linear chain conditional random fields, 408–409
  from Markov random fields to, 407–408
Confidence
  of association rules, 79, 120
Confidence limits
  for normal distribution, 166t
  for Student’s distribution, 174t
  on success probability, 246
ConsistencySubsetEval method, 334
Constrained quadratic optimization, 254
Contact lens problem, 12–14
  structural description, 14, 14f
Contrastive divergence, 452
Convolutional neural networks (CNNs), 419, 437–438
  convolutional layers and gradients, 443–444
  deep convolutional networks, 438–439
  from image filtering to learnable convolutional layers, 439–443
  pooling and subsampling layers and gradients, 444
Corrected resampled t-test, 175–176
cost-sensitive classification, 182–183
cost-sensitive learning, 183
problem misidentification, 180
recall-precision curves, 190
Cost–benefit analyzer, 186
Cost-sensitive learning, 183
Coverage, of association rules, 79, 120
Covering algorithms
  instance space during operation of, 115f
Cross-validation
  for ROC curve generation, 189
  stratified threefold, 168
CrossValidationFoldMaker, 565, 567
Customer support/service applications, 28
D
Data
  noise,
  structures for fast learning, 349–352
Data mining, 28–30
  as data analysis,
  machine learning and, 4–9
Data preparation
  See also Input
  inaccurate values in, 63–64
Data projections
  partial least-squares regression, 307–309
  principal components analysis, 305–307
Data stream learning
  algorithm adaptation for, 510
  tie-breaking strategy, 511
Data transformations, 285
  discretization of numeric attributes, 287, 296–303
“Data understanding” phase, 28–29
Data-dependent expectation, 451
Decision tree induction, 30, 316
  highly branching attributes, 110–113
  partial, obtaining rules from, 227–231
  with replicated subtree, 77f
DecisionStump algorithm, 490
DecisionTable algorithm, 334
Dedicated multi-instance methods, 475–476
Deep feedforward networks
  computation graphs and complex network structures, 429–430
  deep layered network architecture, 423–424
  feedforward neural network, 424f
Deep layered network architecture, 423–424
Deep learning
  software and network implementations, 464–466
  three-layer perceptron, 419
  training and evaluating deep networks, 431–437
    data augmentation and synthetic transformations, 437
    learning rates and schedules, 434–435
    mini-batch-based stochastic gradient descent, 433–434
    pseudocode for mini-batch based stochastic gradient descent, 434, 435f
    regularization with priors on parameters, 435
    unsupervised pretraining, 437
Denoising autoencoders, 448
Diagnosis applications, 25–26
Dimensionality reduction, PCA for, 377–378
Directed acyclic graphs, 340
Discrete attributes, 55–56
converting to numeric attributes, 303
decision tree learners, 296
Discretization-based calibration, 330
Discriminative learning, 449
Disjunctive normal form, 78
Document classification
  in assignment of key phrases, 516
  in authorship ascription, 516
  in language identification, 516
  as supervised learning, 516
Double-consequent rules, 126
Dynamic Bayesian network, 405
E
“Elastic net” approach, 394
“Empirical Bayesian” methods, 368
Entity extraction, in text mining, 517
Entropy-based discretization, 298–301
  error-based discretization versus, 301
  with MDL stopping criterion, 301
Enumerating concept space, 32–33
Equal-frequency binning, 297
Equal-interval binning, 297
Error-based discretization, 301
Errors
  inaccurate values and, 63–64
Ethics
  personal information and, 37–38
  reidentification and, 36–37
Evaluation
Exclusive-or problem, 77f
Exhaustive error-correcting codes, 326
ExhaustiveSearch method, 496
Expectation maximization (EM)
  to train Bayesian networks, 366–367
Explorer interface (Weka)
  automatic parameter tuning, 171–172
  metalearning algorithms, 558
  Select Attributes panel, 562, 564
eXtensible Markup Language (XML), 57, 568
F
Feedforward networks, 269, 270
  feedforward neural network, 424f
Fielded applications, 21–28
  customer service/support, 28
  decisions involving judgments, 22–23
  manufacturing processes, 27–28
  marketing and sales, 26–27
Files
FilteredClassifier algorithm, 563
FilteredClassifier metalearning scheme, 563
Filtering approaches, 319
Fisher’s linear discriminant analysis, 311–312
Forward stagewise additive modeling, 491
Forwards-backwards algorithms, 386
Frequent-pattern trees, 242
  data preparation example, 236t
  structure illustration, 239f
Functional dependencies, 513
G
Gaussian distributions, 373, 394
Gaussian process regression, 272
Generalization
  instance-based learning and, 251
Generalization as search, 31–35
  enumerating the concept space, 32–33
Generalized distance functions, 250
Generalized linear models
  link functions, mean functions, and distributions, 401t
Generalized Sequential Patterns (GSP), 241
Global optimization, classification rules for, 226–227
Gradient descent
  and second-order methods, 400
Graphical models
  computing using sum-product and max-product algorithms, 386–391
  PCA for dimensionality reduction, 377–378
Graphics processing units (GPUs), 392
Greedy method, for rule pruning, 219
GreedyStepwise method, 334
Group-average clustering, 148
H
to train Bayesian networks, 366–367
Hierarchical clustering
  example illustration, 153f
  single-linkage algorithm, 147, 150
HierarchicalClusterer algorithm, 160
Highly branching attributes, 110–113
Histogram equalization, 297
Hyperparameter
measuring distance to, 250
in multi-instance learning, 477
HyperText Markup Language (HTML)
I
ID3 decision tree learner, 113
Identification code attributes, 95
hazard detection system, 23
ImageNet Large Scale Visual Recognition Challenge (ILSVRC), 438–439
Incremental learning, 567
Incremental reduced-error pruning, 225, 226f
IncrementalClassifierEvaluator, 567
Independent and identically distributed (i.i.d.), 338
Independent component analysis, 309–310
Inductive logic programming, 84
Informational loss function, 178–179
Information-based heuristics, 223
data transformations and, 304
Input layer, multilayer perceptrons, 263
Instance space
  in covering algorithm operation, 115f
  partitioning methods, 130f
  rectangular generalizations in, 86–87
Instance-Based Learner version 3 (IB3), 246
Instance-based learning
  in attribute selection, 291
  explicit knowledge representation and, 251
  reducing number of exemplars, 245
Instance-based representation, 84–87
Iris dataset
  data as clustering problem, 46t
  decision boundary, 69, 70f
  hierarchical clusterings, 153f
Item sets
  checking, of two consecutive sizes, 126
  in efficient rule generation, 124–127
  large, finding with association rules, 240–241
Iterated conditional modes procedure, 369–370
Iterative distance-based clustering, 142–144
J
J48 decision tree learner
  cross-validation with, 565
Judgment decisions, 22–23
K
K2 learning algorithm, 347