Index
Note: Page numbers followed by “f” and “t” refer to figures and tables, respectively.
0-9, and Symbols
missing values and numeric data, 94–96
11-point average recall, 191
A
Accuracy, of association rules, 79, 120
Accuracy, of classification rules, 102, 115
AdaBoost.M1 algorithm, 487
Additive logistic regression, 492–493
Agglomerative clustering, 142, 147
Akaike Information Criterion (AIC), 346
  illustrated examples, 350f
Alternating decision trees, 495
Analysis of variance (ANOVA), 393
Antecedent, of rule, 75
incorporating domain knowledge, 512–515
Area under the curve (AUC), 191–192
Area under the precision-recall curve (AUPRC), 192
attribute specifications in, 58
Assignment of key phrases, 516
Association rules
  accuracy (confidence), 79, 120
  computation requirement, 127
  converting item sets to, 122
  coverage (support), 79, 120
  predicting multiple consequences, 79
  relationships between, 80
Attribute evaluation methods, 562
  attribute subset evaluators, 564
  single-attribute evaluators, 564
Attribute selection
  instance-based learning methods, 291
  recursive feature elimination, 290–291
  searching the attribute space and, 292–293
  selective Naïve Bayes, 295
  Weka evaluation methods for, 562
Attribute subset evaluators, 564
Attribute-efficient learners, 135
semantic relation between, 513
Authorship ascription, 516
Autoencoders
  combining reconstructive and discriminative learning, 449
  denoising autoencoders, 448
  pretraining deep autoencoders with RBMs, 448
Automation applications, 28
Averaged one-dependence estimator (AODE), 348–349
B
Background knowledge, 508
Bagging
  bias-variance decomposition, 482–483
  idealized procedure versus, 483
  instability neutralization, 482–483
  for numeric prediction, 483
  positive probability, 476
Balanced iterative reducing and clustering using hierarchies (BIRCH), 160
Ball trees
  in finding nearest neighbors, 140
  two cluster centers, 145f
Bayes Information Criterion, 159–160
Bayesian estimation and prediction, 367–370
  probabilistic inference methods, 368–370
Bayesian Latent Dirichlet allocation (LDAb), 379–380
Bayesian networks
  data structures for fast learning, 349–352
  prior distribution over network structures, 346–347
  structure learning by conditional independence tests, 349
Bias
  multilayer perceptron, 263
  overfitting-avoidance, 35
Bias-variance decomposition, 482–483
Binary classification problems, 69
Block Gibbs sampling, 454
Boolean attributes, 55–56
Boosting
  in computational learning theory, 489
  forward stagewise additive modeling, 491
“Business understanding” phase, 28–29
C
MDL-based adjustment, 220
Calibration, class probability, 330
  discretization-based, 331
Categorical and continuous variables, 452–453
CfsSubsetEval method, 334
Chain-structured conditional random fields, 410
CitationKNN algorithm, 478
Class boundaries
Class labels
Class probability estimation, 321
  dataset with two classes, 329, 329f
  membership functions for, 129
Classical machine learning techniques, 418
Classification learning, 44
Classification rules
  criteria for choosing tests, 221–222
  disjunctive normal form, 78
  from partial decision trees, 227–231
  producing with covering algorithms, 223
  replicated subtree, 76, 77f
ClassifierPerformanceEvaluator, 565, 567
ClassifierSubsetEval method, 334
classification error visualization, 559
Cleansing
artificial data generation, 321–322
Closed-world assumptions, 47, 78
Clustering
  comparing parametric, semiparametric and nonparametric density models, 362–363
  with correlated attributes, 359–361
  expectation maximization algorithm, 353–356
  MDL principle application to, 200–201
  and probability density estimation, 352–363
  two-class mixture model, 354f
“Collapsed Gibbs sampling”, 380–381
Comma-separated value (CSV)
Complete-linkage method, 147
Computation graphs and complex network structures, 429–430
Computational learning theory, 489
Computational Network Toolkit (CNTK), 465
Computer-Assisted Passenger Prescreening System (CAPPS), 526
“Condensed” representation, 473
Conditional probability models, 392–403
  gradient descent and second-order methods, 400
  linear and polynomial regression, 392–393
  multiclass logistic regression, 396–400
  predictions for ordered classes, 402
  using priors on parameters, 393–395
  matrix vector formulations of linear and polynomial regression, 394–395
Conditional random fields (CRFs)
  chain-structured conditional random fields, 410
  linear chain conditional random fields, 408–409
  from Markov random fields to, 407–408
Confidence
  of association rules, 79, 120
Confidence limits
  for normal distribution, 166t
  for Student’s distribution, 174t
  on success probability, 246
ConsistencySubsetEval method, 334
Constrained quadratic optimization, 254
Contact lens problem, 12–14
  structural description, 14, 14f
Contrastive divergence, 452
Convolutional neural networks (CNNs), 419, 437–438
  convolutional layers and gradients, 443–444
  deep convolutional networks, 438–439
  from image filtering to learnable convolutional layers, 439–443
  pooling and subsampling layers and gradients, 444
Corrected resampled t-test, 175–176
cost-sensitive classification, 182–183
cost-sensitive learning, 183
problem misidentification, 180
recall-precision curves, 190
Cost–benefit analyzer, 186
Cost-sensitive learning, 183
Coverage, of association rules, 79, 120
Covering algorithms
  instance space during operation of, 115f
Cross-validation
  for ROC curve generation, 189
  stratified threefold, 168
CrossValidationFoldMaker, 565, 567
Customer support/service applications, 28
D
Data
  noise,
  structures for fast learning, 349–352
Data mining, 28–30
  as data analysis,
  machine learning and, 4–9
Data preparation
  See also Input
  inaccurate values in, 63–64
Data projections
  partial least-squares regression, 307–309
  principal components analysis, 305–307
Data stream learning
  algorithm adaptation for, 510
  tie-breaking strategy, 511
Data transformations, 285
  discretization of numeric attributes, 287, 296–303
“Data understanding” phase, 28–29
Data-dependent expectation, 451
Decision tree induction, 30, 316
  highly branching attributes, 110–113
  partial, obtaining rules from, 227–231
  with replicated subtree, 77f
DecisionStump algorithm, 490
DecisionTable algorithm, 334
Dedicated multi-instance methods, 475–476
Deep feedforward networks
  computation graphs and complex network structures, 429–430
  deep layered network architecture, 423–424
  feedforward neural network, 424f
Deep layered network architecture, 423–424
Deep learning
  software and network implementations, 464–466
  three-layer perceptron, 419
  training and evaluating deep networks, 431–437
    data augmentation and synthetic transformations, 437
    learning rates and schedules, 434–435
    mini-batch-based stochastic gradient descent, 433–434
    pseudocode for mini-batch based stochastic gradient descent, 434, 435f
    regularization with priors on parameters, 435
    unsupervised pretraining, 437
Denoising autoencoders, 448
Diagnosis applications, 25–26
Dimensionality reduction, PCA for, 377–378
Directed acyclic graphs, 340
Discrete attributes, 55–56
converting to numeric attributes, 303
decision tree learners, 296
Discretization-based calibration, 330
Discriminative learning, 449
Disjunctive normal form, 78
Document classification
  in assignment of key phrases, 516
  in authorship ascription, 516
  in language identification, 516
  as supervised learning, 516
Double-consequent rules, 126
Dynamic Bayesian network, 405
E
“Elastic net” approach, 394
“Empirical Bayesian” methods, 368
Entity extraction, in text mining, 517
Entropy-based discretization, 298–301
  error-based discretization versus, 301
  with MDL stopping criterion, 301
Enumerating concept space, 32–33
Equal-frequency binning, 297
Equal-interval binning, 297
Error-based discretization, 301
Errors
  inaccurate values and, 63–64
Ethics
  personal information and, 37–38
  reidentification and, 36–37
Evaluation
Exclusive-or problem, 77f
Exhaustive error-correcting codes, 326
ExhaustiveSearch method, 496
Expectation maximization (EM)
  to train Bayesian networks, 366–367
Explorer interface (Weka)
  automatic parameter tuning, 171–172
  metalearning algorithms, 558
  Select Attributes panel, 562, 564
eXtensible Markup Language (XML), 57, 568
F
Feedforward networks, 269, 270
  feedforward neural network, 424f
Fielded applications, 21–28
  customer service/support, 28
  decisions involving judgments, 22–23
  manufacturing processes, 27–28
  marketing and sales, 26–27
Files
FilteredClassifier algorithm, 563
FilteredClassifier metalearning scheme, 563
Filtering approaches, 319
Fisher’s linear discriminant analysis, 311–312
Forward stagewise additive modeling, 491
Forwards-backwards algorithms, 386
Frequent-pattern trees, 242
  data preparation example, 236t
  structure illustration, 239f
Functional dependencies, 513
G
Gaussian distributions, 373, 394
Gaussian process regression, 272
Generalization
  instance-based learning and, 251
Generalization as search, 31–35
  enumerating the concept space, 32–33
Generalized distance functions, 250
Generalized linear models
  link functions, mean functions, and distributions, 401t
Generalized Sequential Patterns (GSP), 241
Global optimization, classification rules for, 226–227
Gradient descent
  and second-order methods, 400
Graphical models
  computing using sum-product and max-product algorithms, 386–391
  PCA for dimensionality reduction, 377–378
Graphics processing units (GPUs), 392
Greedy method, for rule pruning, 219
GreedyStepwise method, 334
Group-average clustering, 148
H
to train Bayesian networks, 366–367
Hierarchical clustering
  example illustration, 153f
  single-linkage algorithm, 147, 150
HierarchicalClusterer algorithm, 160
Highly branching attributes, 110–113
Histogram equalization, 297
Hyperparameter
measuring distance to, 250
in multi-instance learning, 477
HyperText Markup Language (HTML)
I
ID3 decision tree learner, 113
Identification code attributes, 95
hazard detection system, 23
ImageNet Large Scale Visual Recognition Challenge (ILSVRC), 438–439
Incremental learning, 567
Incremental reduced-error pruning, 225, 226f
IncrementalClassifierEvaluator, 567
Independent and identically distributed (i.i.d.), 338
Independent component analysis, 309–310
Inductive logic programming, 84
Informational loss function, 178–179
Information-based heuristics, 223
data transformations and, 304
Input layer, multilayer perceptrons, 263
Instance space
  in covering algorithm operation, 115f
  partitioning methods, 130f
  rectangular generalizations in, 86–87
Instance-Based Learner version 3 (IB3), 246
Instance-based learning
  in attribute selection, 291
  explicit knowledge representation and, 251
  reducing number of exemplars, 245
Instance-based representation, 84–87
Iris dataset
  data as clustering problem, 46t
  decision boundary, 69, 70f
  hierarchical clusterings, 153f
Item sets
  checking, of two consecutive sizes, 126
  in efficient rule generation, 124–127
  large, finding with association rules, 240–241
Iterated conditional modes procedure, 369–370
Iterative distance-based clustering, 142–144
J
J48 decision tree learner
  cross-validation with, 565
Judgment decisions, 22–23
K
K2 learning algorithm, 347