Subject Index

adaptive MCMC, 308

adding parameters to a model, 185–186

adequate summary, 217, 232

adolescent smoking survey, 148–150

AECM algorithm, 324, 348

airline fatalities, 59, 82

Akaike information criterion (AIC), 172, 177

discussion, 182

educational testing example, 179

Alcoholics Anonymous survey, 213–214

aliasing, 89, 521, 524, 533

all-at-once Gibbs sampler for hierarchical regression, 393

alternating conditional sampling, see Gibbs sampler

analysis of variance (Anova), 114, 395–398, 402, 403

finite-population and superpopulation models, 396–397

fixed and random effects, 396–397

for hierarchical logistic regression, 423

internet example, 398

notation, 395

ancillary test statistic, 151

approximate Bayesian computation (ABC), 344

approximate inference, 263

approximations based on posterior modes, 311–330

asymptotic theorems, 87–88, 206, 215–216, 232

counterexamples, 89–91

proofs, 585–588

Australia, adolescent smoking survey in, 148–150, 162, 211–212

auxiliary variables for computation, 297–299, 309

basis functions, 487–499

Gaussian, 487

multivariate, 495–498

selection and shrinkage, 490–494

splines, 488–490

Bayes factor, 182–184, 193

Bayes’ rule, 6, 20

discrete examples, 8–11, 245

original example, 30

Bayesian data analysis, three steps of, 3

Bayesian filtering and smoothing, 516

Behrens-Fisher problem, 80

belief functions, 98

Bernoulli distribution, 584

Bernoulli trials, 29, 147

beta distribution, 30, 34, 60, 578, 582

beta-binomial distribution, 578, 584

as overdispersed alternative to binomial, 438

beta-blockers, example of meta-analysis, 124–128

betting and probability, 13–16

bias, 94, 99

compared to miscalibration, 128

difficulties with the notion, 94

prediction vs. estimation, 94, 401

‘biased-coin’ design, 235

BIC (‘Bayesian’ information criterion), 175

bicycle traffic, 81, 136

binned residual plots, 157–158

binomial distribution, 578, 583

binomial model, 29, 37, 80

posterior predictive check, 147–148

bioassay experiment, 74–79, 82

normal approximation, 86

birthdays, example of Gaussian process modeling, 505–510

births, proportion girls, 37–39

blockwise Gibbs sampler for hierarchical regression, 392

bootstrap, 96

Box-Cox transformation, 188–191, 194–195

bridge sampling, 347, 348

Bugs, see software

burn-in for MCMC, why we prefer the term warm-up, 282

business school grades, hierarchical multivariate regression, 391–392

calibration, 128

calibration of probability estimates, 16–19

cancer maps, 47–51

capture-recapture sampling, 233

Cauchy model, 59, 437

causal inference, 4, 214, 223–224

instrumental variables, 224

observational studies, 220

incumbency example, 358–364

principal stratification, 223–224

randomized experiments, 214, 231

using regression models, 365

cavity distribution in expectation propagation, 339

censored data, 61, 224–228

Census, 422, 466

record linkage, 16–19

central composite design integration (CCD), 344

central posterior interval, 33, 60

chess competition, example of paired comparisons, 427

χ² distribution, 576, 581

Chinese restaurant process, 550

chloride example, 489, 492, 494

Cholesky factor (matrix square root), 356, 580

classical methods

confidence intervals, 92, 95

frequency evaluations, 91

hypothesis testing, 145, 150

maximum likelihood, 93

multiple comparisons, 96

nonparametric inference, 96

normal-theory inference, 83

point estimation, 85, 91

standard errors, 85

unbiased estimates, 94

Wilcoxon rank test, 97

cluster sampling, 210–212, 232

coarse data, 230

cockroach allergen data, 472

coefficient of variation (CV), 6

coherence, 13

coin tossing, 12, 26

collinearity, 365

colochos, xiv

complementary log-log link, 407

complete data, 199

complete-data likelihood, 200

completely randomized experiments, 214–216

computation, see posterior simulation

computer programs, see software

conditional maximization, see posterior modes

conditional posterior distribution, 122, 325

conditionally conjugate prior distributions, 129, 130, 280, 315, 332, 503, 553

confidence intervals, 3, 92, 95

conjugacy, see prior distribution

conjugate gradient optimization, 313

consistency, 88, 91

contingency tables, 428–431

with missing data, 462

continuous models for discrete data, 458

contour plots, 76, 111, 112

and normal approximation, 85

control variable, 353

convergence of iterative simulation, 281–286

covariance matrix, 20, 71

for a Gaussian process, 501

for a sum of Gaussian processes, 506

inverse-Wishart distribution, 72–74, 390, 576, 582

literature on models for, 401

LKJ distribution, 576, 582

scaled inverse-Wishart model, 74, 390

Wishart distribution, 576, 582

covariates, see regression models, explanatory variables

cow feed experiment, 217–218, 379

cross-validation, 175–177

discussion, 182

educational testing example, 179

crude estimation, 76, 263

bioassay example, 76

educational testing example, 114

rat tumor example, 103

schizophrenia example, 523, 526

Slovenia survey, 463

curse of dimensionality, 495

CV, coefficient of variation, 6

data augmentation, 293

data collection, 197–236

censored and truncated data, 224–228

experiments, 214–220

formal models, 199–202

ignorability, 202–205

observational studies, 220–224

randomization, 218–220

sample surveys, 205–214

data distribution, 6

data reduction, 85

de Finetti’s theorem, 105, 134

debugging, 270–271

comparing inferences from several models, 482

EM algorithm, 321

in Stan and R, 605–606

decision analysis, 12, 26, 99, 237–258

and Bayesian inference, 237–239

medical screening example, 245–246

personal and institutional perspectives, 256

radon example, 246–256

survey incentives example, 239–244

utility, 238, 245, 248, 256

decision trees, 238, 245, 252

degrees of freedom, 43, 437, 442

delta method, 99

density regression, 568–571

dependent Dirichlet process (DDP), 562–564, 572

derivatives, computation of, 313

design of surveys, experiments, and observational studies, 197–236

designs that ‘cheat’, 219

deviance, 192

deviance information criterion (DIC), 172–173, 177, 192

discussion, 182

educational testing example, 179

differences between data and population, 207, 221, 223, 237, 422

differential equation model in toxicology, 477–485

dilution assay, example of a nonlinear model, 471–476, 485

dimensionality, curse of, 495

Dirichlet distribution, 69, 578, 583

Dirichlet process, 545–574

Dirichlet process mixtures, 549–557

discrepancy measure, 145

discrete data

adapting continuous models, 458

latent-data formulation, 408

logistic regression, 406

multinomial models, 423–428

Poisson regression, 406

probit regression, 406

discrete probability updating, 9, 245

dispersion parameter for generalized linear models, 405

distinct parameters and ignorability, 202

distribution, 575–584

Bernoulli, 584

beta, 30, 34, 60, 578, 582

beta-binomial, 60, 578, 584

binomial, 578, 583

Cauchy, 98

χ², 576, 581

Dirichlet, 69, 578, 583

double exponential, 368, 493, 576

exponential, 576, 581

gamma, 45, 576, 581

Gaussian, see normal distribution

inverse-χ², 576, 581

inverse-gamma, 43, 576, 581

inverse-Wishart, 72, 576, 582

Laplace, 368, 493, 576

LKJ correlation, 576, 582

log-logistic, 578

logistic, 578

lognormal, 576, 580

long-tailed, 435

multinomial, 578, 584

multivariate normal, 79, 576, 580

marginals and conditionals, 580

multivariate t, 319, 578

negative binomial, 44, 132, 578, 584

normal, 575, 576

normal-inverse-χ², 67, 82

Pareto, 493

Poisson, 578, 583

scaled inverse-χ², 43, 65, 576, 581

t, 66, 578, 582

uniform, 575, 576

Weibull, 576, 581

Wishart, 576, 582

divorce rates, 105, 135

dog metabolism example, 380

dose-response relation, 74

double exponential distribution, 368, 493, 576

E_old, 320

ECM and ECME algorithms, 323, 348

educational testing experiments, see SAT coaching experiments

effective number of parameters, 169–182

educational testing example, 179

effective number of simulation draws, 286–288

efficiency, 91

eight schools, see SAT coaching experiments

elections

forecasting presidential elections, 165–166, 171–172, 383–388

incumbency in U.S. Congress, 358–364

polling in Slovenia, 463–466

polling in U.S., 422–423, 456–462

probability of a tie, 27

EM algorithm, 320–325

AECM algorithm, 324

as special case of variational inference, 337

debugging, 321

ECM and ECME algorithms, 323, 348

for missing-data models, 452

parameter expansion, 325, 348

SEM and SECM algorithms, 324–325

empirical Bayes, why we prefer to avoid the term, 104

environmental health

allergen measurements, 472

perchloroethylene, 477

radon, 246

EP, see expectation propagation

estimands, 4, 24, 267

exchangeable models, 5, 26, 104–108, 230

and explanatory variables, 5

and ignorability, 230

no conflict with robustness, 436

objections to, 107, 126

universal applicability of, 107

expectation propagation, 338–343

cavity distribution, 339

extensions, 343

logistic regression example, 340–343

moment matching, 339

picture of, 342

tilted distribution, 339

experiments, 214–220

completely randomized, 214–216

definition, 214

distinguished from observational studies, 220

Latin square, 216

randomization, 218–220

randomized block, 216

sequential, 217, 235

explanatory variables, see regression models

exponential distribution, 576, 581

exponential families, 36, 338

exponential model, 46, 61

external validation, 142, 167

record linkage example, 17

toxicology example, 484

factorial analysis, internet example, 397–398

Federalist papers, 447

finite-population inference, 200–203, 205–209, 212, 214–216, 232

in Anova, 396–397

Fisher information, 88

fixed effects, 383

and finite-population models in Anova, 397

football point spreads, 13–16, 26, 27

forecasting presidential elections, 142, 383–388

hierarchical model, 386

problems with ordinary linear regression, 385

frequency evaluations, 91–92, 98

frequentist perspective, 91

functional data analysis, 512–513

gamma distribution, 45, 576, 581

Gaussian distribution, see normal distribution

Gaussian processes, 501–517

birthdays example, 505–510

golf putting, 517

latent, 510–512

logistic, 513–515

gay marriage data, 499

generalized linear models, 405–434

computation, 409–412

hierarchical, 409

hierarchical logistic regression, 422–423

hierarchical Poisson regression, 420–422

overdispersion, 407, 431, 433

prior distribution, 409

simple logistic regression example, 74–78

genetics, 8, 183

simple example of Bayesian inference, 8–9, 27

geometric mean (GM), 6

geometric standard deviation (GSD), 6

Gibbs sampler, 276–278, 280–281, 291

all-at-once for hierarchical regression, 393

assessing convergence, 281–286

blockwise for hierarchical regression, 392

efficiency, 293–295

examples, 289, 440, 465, 528

hierarchical linear models, 288–290, 392–394, 396

parameter expansion for hierarchical regression, 393, 396

picture of, 277

programming in R, 596–606

special case of Metropolis-Hastings algorithm, 281

girl births, proportion of, 37–39

global mode, why it is not special, 311

GM (geometric mean), 6

golf putting

Gaussian process, 517

nonlinear model for, 486, 499

goodness-of-fit testing, see model checking

graphical models, 133

graphics

examples of use in model checking, 143, 144, 154–158

jittering, 14, 15, 27

posterior predictive checks, 153–159

grid approximation, 76–77, 263

GSD (geometric standard deviation), 6

Hamiltonian (hybrid) Monte Carlo, 300–307, 601–605

hierarchical model example, 305–307, 601–605

leapfrog algorithm, 301

mass matrix, 301

momentum distribution, 301

no U-turn sampler, 304

programming in R, 601–605

tuning, 303

heteroscedasticity in linear regression, 369–376

parametric model for, 372

hierarchical Dirichlet process (HDP), 564–566

hierarchical linear regression, 381–404

computation, 392–394, 396

interpretation as a single linear regression, 389

hierarchical logistic regression, 422–423

hierarchical models, 5, 101–137, 381–404

analysis of variance (Anova), 395

binomial, 109–113, 136

bivariate normal, 209–210

business school grades, 391–392

cluster sampling, 210–212

computation, 108–113

forecasting elections, 383–388

logistic regression, 422–423

many batches of random effects

election forecasting example, 386

polling example, 422–423

meta-analysis, 124–128, 423–425

multivariate, 390–392, 423–425, 456–462

no unique way to set up, 389

normal, 113–128, 288–290, 326–330

NYPD stops, 420–422

pharmacokinetics example, 480–481

Poisson, 137, 420–422

pre-election polling, 209–210

prediction, 108, 118

prior distribution, see hyperprior distribution

radon, 246–256

rat tumor example, 109–113

SAT coaching, 119–124

schizophrenia example, 524–533

stratified sampling, 209–210

survey incentives, 239–244

hierarchical Poisson regression, 420–422

hierarchical regression, 381–404

prediction, 387

highest posterior density interval, 33, 57, 60

HMC, see Hamiltonian Monte Carlo

horseshoe prior distribution for regression coefficients, 378

hybrid Monte Carlo, see Hamiltonian Monte Carlo

hyperparameter, 35, 101, 105

hyperprior distribution, 107–108

informative, 480–481

noninformative, 108, 110, 111, 115, 117, 135, 424, 526

hypothesis testing, 145, 150

identifiability, 365

ignorability, 202–205, 230, 450

and exchangeability, 230

incumbency example, 359

strong, 203

ignorable and known designs, 203

ignorable and known designs given covariates, 203

ignorable and unknown designs, 204

iid (independent and identically distributed), 5

ill-posed systems

differential equation model in toxicology, 477–485

mixture of exponentials, 486

importance ratio, 264

importance resampling (sampling-importance resampling, SIR), 266, 271, 273, 319

examples, 441, 442

why you should sample without replacement, 266

importance sampling, 265, 271

bridge sampling, 347, 348

for marginal posterior densities, 440

path sampling, 347–348

unreliability of, 265

improper posterior distribution, see posterior distribution

improper prior distribution, see prior distribution

imputation, see multiple imputation

inclusion indicator, 200, 449

incumbency advantage, 358–364

two variance parameters, 374

indicator variables, 366

for mixture models, 519

inference

discrete examples, 8–11

one of the three steps of Bayesian data analysis, 3

inference, finite-population and superpopulation, 201–202, 212, 214

completely randomized experiments, 215–216, 232

in Anova, 396–397

pre-election polling, 208–209

simple random sampling, 205–206

information criteria, 169–182

information matrix, 84, 88

informative prior distribution

alternative to selecting regression variables, 367–369

spell checking example, 10

toxicology example, 480

institutional decision analysis, 256

instrumental variables, 224

integrated nested Laplace approximation (INLA), 343

intention-to-treat effect, 224

interactions

in basis-function models, 497

in Gaussian processes, 504, 511

in loglinear models, 429

in regression models, 242, 367

internet connect times, 397–398

intraclass correlation, 382

inverse cdf for posterior simulation, 23

inverse probability, 56

inverse-χ² distribution, 576, 581

inverse-gamma distribution, 43, 576, 581

inverse-Wishart distribution, 72, 576, 582

iterative proportional fitting (IPF), 430–431

iterative simulation, 293–310, see also Markov chain Monte Carlo

iterative weighted least squares (EM for robust regression), 444

jackknife, 96

Jacobian, 22

Jeffreys’ rule for noninformative prior distributions, 52–53, 57, 59

jittering, 14, 15, 27

joint posterior distribution, 63

Kullback-Leibler divergence, 88, 331–336, 585–587

connection to deviance, 192

label switching in mixture models, 533

Laplace distribution, 368, 493, 576

Laplace’s method for numerical integration, 318, 348

large-sample inference, 83–92

lasso (regularized regression), 368–369, 379

latent continuous models for discrete data, 408

latent-variable regression, 515

Latin square experiment, 216–217

LD50, 77–78

leapfrog algorithm for Hamiltonian Monte Carlo, 301

leave-one-out cross-validation, 175–177

discussion, 182

educational testing example, 179

life expectancy, quality-adjusted, 245

likelihood, 7–10

complete-data, 200

observed-data, 201

likelihood principle, 8, 26

misplaced appeal to, 198

linear regression, 353–380, see also regression models

t errors, 444–445

analysis of residuals, 361

classical, 354

conjugate prior distribution, 376–378

as augmented data, 377

correlated errors, 369–376

errors in x and y, 379, 380

heteroscedasticity, 369–376

parametric model for, 372

hierarchical, 381–404

interpretation as a single linear regression, 389

incumbency example, 358–364

known covariance matrix, 370

model checking, 361

posterior simulation, 356

prediction, 357, 364

with correlations, 371

residuals, 358, 362

robust, 444–445

several variance parameters, 369–376

weighted, 372

link function, 405, 407

LKJ correlation distribution, 576, 582

location and scale parameters, 54

log densities, 261

log-logistic distribution, 578

logistic distribution, 578

logistic regression, 74–78, 406

for multinomial data, 423

hierarchical, 422–423

latent-data interpretation, 408

logit (logistic, log-odds) transformation, 22, 125

loglinear models, 428–431

prior distributions, 429

lognormal distribution, 576, 580

longitudinal data

survey of adolescent smoking, 211–212

maps

artifacts in, 47–51, 57

cancer rates, 47–51

for model checking, 143

MAR (missing at random), 202, 450

a more reasonable assumption than MCAR, 450

marginal and conditional means and variances, 21

marginal posterior distribution, 63, 110, 111, 122, 261

approximation, 325–326

computation for the educational testing example, 594–595

computation using importance sampling, 440

EM algorithm, 320–325

Markov chain, 275

Markov chain Monte Carlo (MCMC), 275–310

adaptive algorithms, 297

assessing convergence, 281–286

between/within variances, 283

simple example, 285

auxiliary variables, 297–299, 309

burn-in, why we prefer the term warm-up, 282

data augmentation, 293

effective number of simulation draws, 286–288

efficiency, 280, 293–296

Gibbs sampler, 276–278, 280–281, 291

assessing convergence, 281–286

efficiency, 293–295

examples, 277, 289, 392, 440, 465, 528

picture of, 277

programming in R, 596–606

Hamiltonian (hybrid) Monte Carlo, 300–307, 309

hierarchical model example, 305–307

leapfrog algorithm, 301

mass matrix, 301

momentum distribution, 301

no U-turn sampler, 304

tuning, 303

inference, 281–286

Metropolis algorithm, 278–280, 291

efficient jumping rules, 295–297

examples, 278, 290

generalizations, 293–300

picture of, 276

programming in R, 598–599

relation to optimization, 279

Metropolis-Hastings algorithm, 279, 291

generalizations, 293–300

multiple sequences, 282

output analysis, 281–288

overdispersed starting points, 283

parallel tempering, 299–300

perfect simulation, 309

regeneration, 309

reversible jump sampling, 297–299, 309

simulated tempering, 309

slice sampling, 297, 309

thinning, 282

trans-dimensional, 297–299, 309

warm-up, 282

matrix and vector notation, 4

maximum entropy, 57

maximum likelihood, 93

MCAR (missing completely at random), 450

measurement error models

hierarchical, 133

linear regression with errors in x and y, 380

nonlinear, 471–476

medical screening, example of decision analysis, 245–246

meta-analysis, 133, 137

beta-blockers study, 124–128, 423–425

bivariate model, 423–425

goals of, 125

survey incentives study, 239–242

Metropolis algorithm, 278–280, 291

efficient jumping rules, 295–297

examples, 278, 290

generalizations, 293–300

picture of, 276

programming in R, 598–599

relation to optimization, 279

Metropolis-Hastings algorithm, 279, 291

generalizations, 293–300

minimal analysis, 217

missing at random (MAR), 202, 450

a more reasonable assumption than MCAR, 450

a slightly misleading phrase, 202

missing completely at random (MCAR), 450

missing data, 449–467

and EM algorithm, 452, 454

intentional, 198

monotone pattern, 453, 455, 459–462

multinomial model, 462

multivariate normal model, 454–456

multivariate t model, 456

notation, 199, 449–452

paradigm for data collection, 199

Slovenia survey, 463–466

unintentional, 198, 204, 449

mixed-effects model, 382

mixture models, 17, 20, 105, 135, 519–543

computation, 523–524

continuous, 520

de Finetti’s theorem and, 105

Dirichlet process, 549–557

discrete, 519

exponential distributions, 486

hierarchical, 525

label switching, 533

model checking, 531, 532

prediction, 530

schizophrenia example, 524–533

mixture of exponentials, as example of an ill-posed system, 478, 486

model, see also hierarchical models, regression models, etc.

beta-binomial, 438

binomial, 29, 37, 80, 147

Cauchy, 59, 437

Dirichlet process, 545

exponential, 46, 61

lognormal, 188

multinomial, 69, 79, 423–428

multivariate normal, 70

negative binomial, 437, 446

nonlinear, 471–486

normal, 39, 41, 42, 60, 64–69

overdispersed, 437–439

Poisson, 43, 45, 59, 61

Polya urn, 549

robit, 438

robust or nonrobust, 438–439

t, 293, 437, 441–445

underidentified, 89

model averaging, 193, 297

model building, one of the three steps of Bayesian data analysis, 3

model checking, 141–164, 187–195

adolescent smoking example, 148–150

election forecasting example, 142, 386

incumbency example, 361

one of the three steps of Bayesian data analysis, 3

power transformation example, 189

pre-election polling, 210

psychology examples, 154–157

residual plots, 158, 476, 484

SAT coaching, 159–161

schizophrenia example, 531, 532

speed of light example, 143, 146

spelling correction example, 11

toxicology example, 483

model comparison, 178–184

model complexity, see effective number of parameters

model expansion, 184–192

continuous, 184, 372, 439

schizophrenia example, 531–532

model selection

bias induced by, 181

why we reluctantly do it, 178, 183–184, 367

moment matching in expectation propagation, 339

momentum distribution for Hamiltonian Monte Carlo, 301

monitoring convergence of iterative simulation, 281–286

monotone missing data pattern, 453, 455, 459–462

Monte Carlo error, 267, 268, 272

Monte Carlo simulation, 267–310

multilevel models, see hierarchical models

multimodal posterior distribution, 299, 319

multinomial distribution, 578, 584

multinomial logistic regression, 426

multinomial model, 69, 79

for missing data, 462

multinomial probit model, 432

multinomial regression, 408, 423–428

parameterization as a Poisson regression, 427

multiparameter models, 63–82

multiple comparisons, 96, 134, 150, 186

multiple imputation, 201, 451–454

combining inferences, 453

pre-election polling, 456–462

Slovenia survey, 463–466

multiple modes, 311, 321

multivariate models

for nonnormal data, 423–425

hierarchical, 390–392

prior distributions

noninformative, 458

multivariate normal distribution, 576, 580

multivariate t distribution, 319, 578

natural parameter for an exponential family, 36

negative binomial distribution, 44, 132, 578, 584

as overdispersed alternative to Poisson, 437, 446

nested Dirichlet process (NDP), 566–568

neural networks, 485

New York population, 188–191

Newcomb’s speed of light experiment, 66, 79

Newton’s method for optimization, 312

no interference between units, 200

no U-turn sampler for Hamiltonian Monte Carlo, 304

non-Bayesian methods, 92–97, 100

difficulties for SAT coaching experiments, 119

nonconjugate prior distributions, see prior distribution

nonidentified parameters, 89

nonignorable and known designs, 204

nonignorable and unknown designs, 204

noninformative prior distribution, 51–57

binomial model, 37, 53

difficulties, 54

for hyperparameters, 108, 110, 111, 115, 117, 526

in Stan, 594

Jeffreys’ rule, 52–53, 57, 59

multivariate normal model, 73

normal model, 64

pivotal quantity, 54, 57

nonlinear models, 471–486

Gaussian processes, 501–517

golf putting, 486, 499

mixture of exponentials, 486

serial dilution assay, 471–476

splines, 487–499

toxicology, 477–485

nonparametric methods, 96

nonparametric models, 501–517, 545–574

nonparametric regression, 487–499

nonrandomized studies, 220

normal approximation, 83–87, 318–319

bioassay experiment, 86

for generalized linear models, 409

lower-dimensional, 85

meta-analysis example, 125

multimodal, 319

normal distribution, 575, 576

normal model, 39, 41, 60, 64–69, see also linear regression and hierarchical models

multivariate, 70, 454–462

power-transformed, 188–191, 194–195

normalizing factors, 7, 345–349

notation for data collection, 199

notation for observed and missing data, 199, 449, 452

nuisance parameters, 63

numerical integration, 271, 318–319, 345–348

Laplace’s method, 318, 348

numerical posterior predictive checks, 143–152

NYPD stops example, 420–422

objective assignment of probability distributions

football example, 13–16

record linkage example, 16–19

objectivity of Bayesian inference, 13, 24

observational studies, 220–224

difficulties with, 222

distinguished from experiments, 220

incumbency example, 358–364

observed at random, 450

observed data, see missing data

observed information, 84

odds ratio, 8, 80, 125

offsets for generalized linear models, 407

chess example, 428

police example, 420

optimization and the Metropolis algorithm, 279

ordered logit and probit models, 408, 426

outcome variable, 353

outliers, models for, 435

output analysis for iterative simulation, 281–288

overdispersed models, 407, 431, 433, 437–439

overfitting, 101, 367, 409

p-values, see also model checking

Bayesian (posterior predictive), 146

classical, 98, 145

interpretation of, 150

packages, see software

paired comparisons with ties, 432

multinomial model for, 427

parallel tempering for MCMC, 299–300

parameter expansion

election forecasting example, 393

for Anova computation, 396

for EM algorithm, 325, 348

for hierarchical regression, 393, 396

programming in R, 600–601

parameters, 4

different from predictions in frequentist inference, 94, 401

Pareto distribution, 493

partial pooling, see shrinkage

partially conjugate prior distribution, 115, 322

path sampling, 347–348

perchloroethylene, 477

perfect simulation for MCMC, 309

permutation tests, 96

personal (subjective) probability, 13, 256

pharmacokinetics, 480–481

philosophy, references to discussions of, 26

pivotal quantity, 54, 57, 66, 151

point estimation, 85, 91, 99

Poisson distribution, 578, 583

Poisson model, 43, 59, 61

parameterized in terms of rate and exposure, 45

Poisson regression, 82, 406, 433

for multinomial data, 426

hierarchical, 420–422

police stops, example of hierarchical Poisson regression, 420–422

Polya urn model, 549

pooling, partial, 25, 115

population distribution, 101

posterior distribution, 3, 7, 10

as compromise, 32, 40, 58

improper, 54, 90, 135

joint, 63

marginal, 63

normal approximation, see normal approximation

predictive, 7

summaries of, 32

use as prior distribution when new data arrive, 9, 251

posterior intervals, 3, 33, 267

posterior modes, 311–330

approximate conditional posterior density using marginal modes, 325

conditional maximization (stepwise ascent), 312

EM algorithm for marginal posterior modes, 320–325, 348

ECM and ECME algorithms, 323, 456, 526

examples, 322, 329, 444, 465, 526

generalized EM algorithm, 321

marginal posterior density increases at each step, 329

missing data, 452, 454

SEM algorithm, 465

SEM and SECM algorithms, 324–325

joint mode, problems with, 350

Newton’s method, 312

posterior predictive checks, 143–161, see also model checking

graphical, 153–159

numerical, 143–152

posterior predictive distribution, 7

hierarchical models, 108, 118

linear regression, 357

missing data, 202

mixture model, 530

multivariate normal model, 72

normal model, 66

speed of light example, 144

posterior simulation, 22–24, 267–310, see also Markov chain Monte Carlo (MCMC)

computation in R and Stan, 589–606

direct, 263–264

grid approximation, 76–77, 263

hierarchical models, 112

how many draws are needed, 267, 268, 272

rejection sampling, 264

simple problems, 78

two-dimensional, 76, 82

using inverse cdf, 23

poststratification, 222, 422–423, 460

potential scale reduction factor, 285

power transformations, 188–191, 194–195

pre-election polling, 69, 79, 233–234, 422–423

in Slovenia, 463–466

missing data, 456–466

state-level opinions from national polls, 422–423

stratified sampling, 207–210

precision (inverse of variance), 40

prediction, see posterior predictive distribution

predictive simulation, 28

predictor variables, see regression models, explanatory variables

predictors

including even if not ‘statistically significant’, 241–244

selecting, 186

principal stratification, 223–224

prior distribution, 6, 10

boundary-avoiding, 313–318

conditionally conjugate, 129, 130, 280, 315, 332, 503, 553

conjugate, 35–37, 56

binomial model, 34–35, 38

exponential model, 46

generalized linear models, 409

linear regression, 376–378

multinomial model, 69, 429, 462

multivariate normal model, 71, 72

normal model, 40, 43, 67

Poisson model, 44

estimation from past data, 102

for covariance matrices

noninformative, 458

hierarchical, see hierarchical models and hyperprior distribution

improper, 52, 82

and Bayes factors, 194

informative, 34–46, 480–481

nonconjugate, 36, 38, 75

noninformative, 51–57, 93

t model, 443

binomial model, 37, 53

difficulties, 54

for hyperparameters, 108, 110, 111, 115, 117, 526

generalized linear models, 409

in Stan, 590, 594

Jeffreys’ rule, 52–53, 57, 59

linear regression, 355

multinomial model, 464

multivariate normal model, 73

normal model, 64

pivotal quantity, 54, 57

warnings, see posterior distribution, improper

partially conjugate, 115, 322

predictive, 7

proper, 52

weakly informative, 55–57, 128–132, 313–318

in Stan, 594

prior predictive checks, 162, 164

prior predictive distribution, 7

normal model, 41

probability, 19–22, 26

assignment, 13–19, 26, 27

foundations, 11–13, 25

notation, 6

probability model, 3

probit regression, 406

for multinomial data, 426, 432

Gibbs sampler, 408

latent-data interpretation, 408

probit transformation, 22

programming tips, 270–271, 605–606

propensity scores, 204, 221, 222, 230

proper prior distribution, see prior distribution

proportion of female births, 29, 37–39

psychological data, 154–157, 524–533

PX-EM algorithm, 325, 348, see also parameter expansion

QR decomposition, 356, 378

quality-adjusted life expectancy, 245

quasi-Newton optimization, 313

R, see software

R̂ for monitoring convergence of iterative simulation, 285

radial basis functions, 487

radon decision problem, 194, 246–256, 378

random probability measure (RPM), 545–574

random-effects model, 382–388

analysis of variance (Anova), 395

and superpopulation model in Anova, 397

election forecasting example, 386

non-nested example, 422–423

several batches, 383

randomization, 218–220

and ignorability, 220, 230

complete, 218

given covariates, 219

randomized blocks, 231

rank test, 97

rat tumors, 102–103, 109–113, 133

ratio estimation, 93, 98

record linkage, 16–19

record-breaking data, 230

reference prior distributions, see noninformative prior distribution

regeneration for MCMC, 309

regression models, 353–380, see also linear regression

Bayesian justification, 354

explanatory variables, 5, 200, 353, 365–367

exchangeability, 5

exclude when irrelevant, 367

ignorable models, 203

goals of, 364–365

hierarchical, 381–404

variable selection, 367

why we prefer to use informative prior distributions, 367–369

regression to the mean, 95

regression trees, 485

regularization, 52, 113–124, 368–369, 493

rejection sampling, 264, 273

picture of, 264

replications, 145

residual plots, 162, 358

binned, 157–158

dilution example, 476

incumbency example, 362

nonlinear models, 476, 484

pain relief example, 158

toxicology example, 484

residuals, 157

response surface, 126

response variable, 353

reversible jump sampling for MCMC, 297–299, 309

ridge regression, 401

robit regression (robust alternative to logit and probit), 438

robust inference, 162, 185, 192, 435–447

for regression, 444–445

SAT coaching, 441–444

various estimands, 191

rounded data, 80, 234

sampling, 205–214, see also surveys

capture-recapture, 233

cluster, 210–212, 232

poststratification, 222, 422–423

ratio estimation, 93, 98

stratified, 206–210

unequal selection probabilities, 212–214, 233–234

sampling distribution, 6, 35

SAT coaching experiments, 119–124

difficulties with natural non-Bayesian methods, 119

information criteria and effective number of parameters, 179

model checking for, 159–161

robust inference for, 441–444

scale parameter, 43

scaled inverse-χ² distribution, 43, 576, 581

scaled inverse-Wishart model, 74, 390

schizophrenia reaction times, example of mixture modeling, 524–533

selection of predictors, 186

SEM and SECM algorithms, 324–325, 348

sensitivity analysis, 160–161, 184, 185, 435–447

and data collection, 191

and realistic models, 191

balanced and unbalanced data, 221

cannot be avoided by setting up a super-model, 141

estimating a population total, 188–191

incumbency example, 363

SAT coaching, 441–444

using t models, 443–444

various estimands, 191

sequential designs, 217, 235

serial dilution assay, example of a nonlinear model, 471–476, 485

sex ratio, 29, 37–39

shrinkage, 32, 40, 45, 113–124, 132, 368–369, 490, 493

graphs of, 113, 122

simple random sampling, 205–206

difficulties of estimating a population total, 188

simulated tempering for MCMC, 309

simulation, see posterior simulation

single-parameter models, 29–62

SIR, see importance resampling

slice sampling for MCMC, 297, 309

Slovenia survey, 463–466

small-area estimation, 133

software, 589–606

Bugs, 27, 133, 269, 272

debugging, 270–271, 605–606

extended example using Stan and R, 589–605

programming tips, 270–271, 605–606

R, 22, 27, 589–606

R programming, 594–606

running Stan from R, 589

setting up, 589

Stan, 22, 269, 307–308, 589–594

speed of light example, 66, 143

posterior predictive checks, 146

spelling correction, simple example of Bayesian inference, 9–11

splines, 487–499

gay marriage, 499

golf putting, 499

multivariate, 495–498

sports

football, 13–16, 26

golf, 486, 499, 517

stability, 200

stable estimation, 91

stable unit treatment value assumption, 200, 231

Stan, 307–308, 589–594

standard errors, 85

state-level opinions from national polls, 422–423

statistical packages, see software

statistically significant but not practically significant, 151

regression example, 363

stepwise ascent, 312

stepwise regression, Bayesian interpretation of, 367

stratified sampling, 206–210

hierarchical model, 209–210, 292

pre-election polling, 207–210

strong ignorability, 203

Student-t model, see t model

subjectivity, 12, 13, 26, 28, 100, 248, 256

sufficient statistics, 36, 93, 338

summary statistics, 85

superpopulation inference, 200–203, 205–206, 208, 209, 212, 214–216, 232

in Anova, 396–397

supplemented EM (SEM) algorithm, 324–325

survey incentives, example of meta-analysis and decision analysis, 239–244

surveys, 205–214, 454–466, see also sampling

adolescent smoking, 148–150

Alcoholics Anonymous, 213–214

incentives to increase response rates, 239–244

pre-election polling, 207–210, 422–423, 456–466

telephone, unequal sampling probabilities, 233–234

t approximation, 319

t distribution, 66, 578, 582

t model, 437, 441–445

computation using data augmentation, 293–294

computation using parameter expansion, 295

interpretation as mixture, 437

tail-area probabilities, see p-values

target distribution, 261

test statistics and test quantities, 145

choosing, 147

examples, see model checking

graphical, 153–159

numerical, 143–152

thinning of MCMC sequences, 282

three steps of Bayesian data analysis, 3

tilted distribution in expectation propagation, 339

toxicology model, as example of an ill-posed system, 477–485

trans-dimensional MCMC, 297–299, 309

transformations, 21, 99

examples where not needed, 241, 360

logarithmic, 380

logistic (logit, log-odds), 22, 125

power, 188–191, 194–195

probit, 22

rat tumor example, 110

to improve MCMC efficiency, 293–295

to reduce correlations in hierarchical models, 480

useful in setting up a multivariate model, 424

treatment variable, 353

truncated data, 224–228

2 × 2 tables, 80, 125, 423–425

type I errors, why we do not care about, 150

U.S. House of Representatives, 358

unbiasedness, see bias

unbounded likelihoods, 90

underidentified models, 89

uniform distribution, 575, 576

units, 353

unnormalized densities, 7, 261

unseen species, estimating the number of, 349

utility in decision analysis, 238, 245, 248, 256

variable selection, why we prefer to use informative prior distributions, 367

variance matrix, see covariance matrix

variational inference, 331–338

EM as special case, 337

hierarchical model example, 332–335

model checking for, 336

picture of, 334, 335, 342

variational lower bound, 336

varying intercepts and slopes, 390–392

vector and matrix notation, 4

warm-up for MCMC sequences, 282

Watanabe-Akaike or widely applicable information criterion (WAIC), 173–174, 177

discussion, 182

educational testing example, 179

weakly informative prior distribution, 55–57, 128–132, 313–318

in Stan, 594

Weibull distribution, 576, 581

Wilcoxon rank test, 97

Wishart distribution, 576, 582

y^rep, 145

ỹ, 4, 7, 145
