Appendix A

Standard probability distributions

Tables A.1 and A.2 present notation, probability density functions, parameter descriptions, means, modes, and standard deviations for several standard probability distributions. We use the standard notation θ for the random variable (or random vector), except in the case of the Wishart and inverse-Wishart, for which we use W for the random matrix, and the LKJ correlation, for which we use Σ for a correlation matrix.

Realistic distributions for complicated multivariate models, including hierarchical and mixture models, can typically be constructed using, as building blocks, the simple distributions listed here. In our own work we use preprogrammed random number routines (many available in R, for example), but it can be valuable to understand where these numbers come from.

The starting point for any simulation is a set of functions that draw pseudorandom samples from the uniform distribution on the unit interval. Much research has been done to ensure that these pseudorandom numbers are appropriate for realistic applied tasks. For example, a sequence may appear uniform in one dimension while successive m-tuples are not uniformly scattered in m dimensions.

A.1   Continuous distributions

Uniform

The uniform distribution is used to represent a variable that is known to lie in an interval and equally likely to be found anywhere in the interval. A noninformative distribution is obtained in the limit as a → -∞, b → ∞. If u is drawn from a standard uniform distribution U(0, 1), then θ = a + (b - a)u is a draw from U(a, b).
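
To make the recipe concrete, here is a minimal Python/NumPy sketch (the function name runif and the generator seed are our own choices; NumPy can also draw from U(a, b) directly):

```python
import numpy as np

rng = np.random.default_rng(1)  # pseudorandom generator; seed chosen arbitrarily

def runif(a, b, size=None):
    """Draw from U(a, b) by rescaling standard uniform draws: theta = a + (b - a)u."""
    u = rng.uniform(size=size)   # u ~ U(0, 1)
    return a + (b - a) * u
```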

Univariate normal

The normal, or Gaussian, distribution is ubiquitous in statistics. Sample averages are approximately normally distributed by the central limit theorem. A noninformative or flat distribution is obtained in the limit as the variance σ² → ∞. The variance is usually restricted to be positive; σ² = 0 corresponds to a point mass at μ. The density function is always finite; the integral is finite as long as σ² is finite. If z is a random deviate from the standard normal distribution, then θ = μ + σz is a draw from N(μ, σ²).

Two properties of the normal distribution that play a large role in model building and Bayesian computation are the addition and mixture properties.

The sum of two independent normal random variables is normally distributed. If θ1 and θ2 are independent with N(μ1, σ1²) and N(μ2, σ2²) distributions, then θ1 + θ2 ~ N(μ1 + μ2, σ1² + σ2²). The mixture property states that if θ | μ ~ N(μ, σ²) and μ ~ N(μ0, τ²), then marginally θ ~ N(μ0, σ² + τ²). This is useful in the analysis of hierarchical normal models.

Table A.1 Continuous distributions

Table A.2 Discrete distributions

Lognormal

If θ is a random variable that is restricted to be positive, and log θ ~ N(μ, σ²), then θ is said to have a lognormal distribution. Using the Jacobian of the log transformation, one can directly determine that the density is p(θ) = (√(2π) σθ)⁻¹ exp(-(log θ - μ)²/(2σ²)), the mean is exp(μ + σ²/2), the variance is exp(2μ) exp(σ²)(exp(σ²) - 1), and the mode is exp(μ - σ²). The geometric mean and geometric standard deviation of a lognormally distributed random variable θ are simply e^μ and e^σ.

Multivariate normal

The multivariate normal density is always finite; the integral is finite as long as det(Σ⁻¹) > 0. A noninformative distribution is obtained in the limit as det(Σ⁻¹) → 0; this limit is not uniquely defined. A random draw from a multivariate normal distribution can be obtained using the Cholesky decomposition of Σ and a vector of univariate normal draws. The Cholesky decomposition of Σ produces a lower-triangular matrix A satisfying AAᵀ = Σ; if z is a vector of d independent standard normal random variables, then θ = μ + Az is a random draw from the multivariate normal distribution with covariance matrix Σ.
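
A minimal NumPy sketch of this Cholesky recipe (rmvnorm is our own name, not a library routine):

```python
import numpy as np

rng = np.random.default_rng(0)

def rmvnorm(mu, Sigma):
    """Draw theta = mu + A z, where A is the lower-triangular Cholesky factor of Sigma."""
    A = np.linalg.cholesky(Sigma)        # A @ A.T == Sigma
    z = rng.standard_normal(len(mu))     # vector of independent standard normals
    return np.asarray(mu) + A @ z
```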

The marginal distribution of any subset of components (for example, θi or (θi, θj)) is also normal. Any linear transformation of θ, such as the projection of θ onto a linear subspace, is also normal, with dimension equal to the rank of the transformation. The conditional distribution of θ, constrained to lie on any linear subspace, is also normal. The addition property holds: if θ1 and θ2 are independent with N(μ1, Σ1) and N(μ2, Σ2) distributions, then θ1 + θ2 ~ N(μ1 + μ2, Σ1 + Σ2), as long as θ1 and θ2 have the same dimension. We discuss the generalization of the mixture property shortly.

The conditional distribution of any subvector of θ given the remaining elements is once again multivariate normal. If we partition θ into subvectors θ = (U, V), then the conditional distribution of U given V is (multivariate) normal:

U | V ~ N(E(U) + cov(U, V) var(V)⁻¹ (V - E(V)), var(U) - cov(U, V) var(V)⁻¹ cov(V, U)),

where cov(V, U) is a rectangular matrix (submatrix of Σ) of the appropriate dimensions, and cov(U, V) = cov(V, U)ᵀ. In particular, if we define the matrix of conditional coefficients,

C = cov(U, V) var(V)⁻¹,

then

U | V ~ N(E(U) + C(V - E(V)), var(U) - C cov(V, U)).

Conversely, if we parameterize the distribution of U and V hierarchically:

U | V ~ N(α + CV, Σ_(U|V)),   V ~ N(μ_V, Σ_V),

then the joint distribution of θ = (U, V) is the multivariate normal, with mean vector (α + Cμ_V, μ_V) and covariance matrix whose blocks are var(U) = Σ_(U|V) + CΣ_VCᵀ, cov(U, V) = CΣ_V, and var(V) = Σ_V.

This generalizes the mixture property of univariate normals.
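
The conditional formulas above translate directly into code. A sketch, assuming mu is the mean vector and Sigma the covariance matrix of θ, with u_idx and v_idx the index sets of the subvectors U and V:

```python
import numpy as np

def conditional_normal(mu, Sigma, u_idx, v_idx, v):
    """Mean and variance of U | V = v when (U, V) ~ N(mu, Sigma)."""
    S_uv = Sigma[np.ix_(u_idx, v_idx)]                     # cov(U, V)
    C = S_uv @ np.linalg.inv(Sigma[np.ix_(v_idx, v_idx)])  # conditional coefficients
    cond_mean = mu[u_idx] + C @ (v - mu[v_idx])
    cond_var = Sigma[np.ix_(u_idx, u_idx)] - C @ S_uv.T    # var(U) - C cov(V, U)
    return cond_mean, cond_var
```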

The ‘weighted sum of squares,’ SS = (θ - μ)ᵀΣ⁻¹(θ - μ), has a χ²_d distribution, where d is the dimension of θ. For any matrix A for which AAᵀ = Σ, the conditional distribution of A⁻¹(θ - μ), given SS, is uniform on the surface of a (d-1)-dimensional sphere of radius √SS.

Gamma

The gamma distribution is the conjugate prior distribution for the inverse of the normal variance and for the mean parameter of the Poisson distribution. The gamma integral is finite if α > 0; the density function is finite if α ≥ 1. Many computer packages generate gamma random variables directly; otherwise, it is possible to obtain draws from a gamma random variable using draws from a uniform as input. The most effective method depends on the parameter α; see the references for details.

There is an addition property for independent gamma random variables with the same inverse scale parameter. If θ1 and θ2 are independent with Gamma(α1, β) and Gamma(α2, β) distributions, then θ1 + θ2 ~ Gamma(α1 + α2, β). The logarithm of a gamma random variable is approximately normal; raising a gamma random variable to the one-third power provides an even better normal approximation.

Inverse-gamma

If θ⁻¹ has a gamma distribution with parameters α, β, then θ has the inverse-gamma distribution. The density is always finite; its integral is finite if α > 0. The inverse-gamma is the conjugate prior distribution for the normal variance. A noninformative distribution is obtained in the limit as α, β → 0.

Chi-square

The χ²_ν distribution is a special case of the gamma distribution, with α = ν/2 and β = 1/2. The addition property holds since the inverse scale parameter is fixed: if θ1 and θ2 are independent with χ²_ν1 and χ²_ν2 distributions, then θ1 + θ2 ~ χ²_(ν1+ν2).

Inverse chi-square

The inverse-χ² is a special case of the inverse-gamma distribution, with α = ν/2 and β = 1/2. We also define the scaled inverse chi-square distribution, Inv-χ²(ν, s²), which is useful for variance parameters in normal models. To obtain a simulation draw θ from the Inv-χ²(ν, s²) distribution, first draw X from the χ²_ν distribution and then let θ = νs²/X.
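
In code, the two-step recipe is (a sketch; r_scaled_inv_chi2 is our own name):

```python
import numpy as np

rng = np.random.default_rng(0)

def r_scaled_inv_chi2(nu, s2, size=None):
    """Draw theta ~ Inv-chi^2(nu, s2): theta = nu * s2 / X with X ~ chi^2_nu."""
    x = rng.chisquare(nu, size=size)
    return nu * s2 / x
```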

Exponential

The exponential distribution is the distribution of waiting times for the next event in a Poisson process and is a special case of the gamma distribution with α = 1. Simulation of draws from the exponential distribution is straightforward. If U is a draw from the uniform distribution on [0, 1], then -log(U)/β is a draw from the exponential distribution with parameter β.
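
The inverse-CDF recipe in NumPy (a sketch; NumPy also provides exponential draws directly):

```python
import numpy as np

rng = np.random.default_rng(0)

def rexp(beta, size=None):
    """Inverse-CDF draw from the exponential with rate beta: theta = -log(U)/beta."""
    u = rng.uniform(size=size)
    return -np.log(u) / beta
```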

Weibull

If θ is a random variable that is restricted to be positive, and (θ/β)^α has an Expon(1) distribution, then θ is said to have a Weibull distribution with shape parameter α > 0 and scale parameter β > 0. The Weibull is often used to model failure times in reliability analysis. Using the Jacobian of the transformation, one can directly determine that the density is p(θ) = (α/β)(θ/β)^(α-1) exp(-(θ/β)^α), the mean is βΓ(1 + 1/α), and the mode is β(1 - 1/α)^(1/α).
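
The definition suggests a direct simulation route, sketched below: draw E ~ Expon(1) and invert the transformation (rweibull is our own name):

```python
import numpy as np

rng = np.random.default_rng(0)

def rweibull(alpha, beta, size=None):
    """theta = beta * E**(1/alpha), so that (theta/beta)**alpha ~ Expon(1)."""
    e = -np.log(rng.uniform(size=size))   # E ~ Expon(1) by the inverse CDF
    return beta * e ** (1.0 / alpha)
```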

Wishart

The Wishart is the conjugate prior distribution for the inverse covariance matrix in a multivariate normal distribution. It is a multivariate generalization of the gamma distribution. The integral is finite if the degrees of freedom parameter, ν, is greater than or equal to the dimension, k. The density is finite if ν ≥ k + 1. A noninformative distribution is obtained as ν → 0. The sample covariance matrix for independent and identically distributed multivariate normal data has a Wishart distribution. In fact, multivariate normal simulations can be used to simulate a draw from the Wishart distribution, as follows. Simulate α1, …, αν, ν independent samples from a k-dimensional multivariate N(0, S) distribution, then let W = Σi αiαiᵀ, summing over i = 1, …, ν. This only works when the distribution is proper; that is, ν ≥ k.
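
A sketch of this outer-product construction (rwishart is our own name; nu must be an integer of at least k):

```python
import numpy as np

rng = np.random.default_rng(0)

def rwishart(nu, S):
    """Draw W = sum_i alpha_i alpha_i^T with alpha_1, ..., alpha_nu ~ N(0, S)."""
    A = np.linalg.cholesky(S)
    alphas = rng.standard_normal((nu, S.shape[0])) @ A.T   # each row ~ N(0, S)
    return alphas.T @ alphas
```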

Inverse-Wishart

If W-1 ~ Wishartν(S) then W has the inverse-Wishart distribution. The inverse-Wishart is the conjugate prior distribution for the multivariate normal covariance matrix. The inverse-Wishart density is always finite, and the integral is always finite. A degenerate form occurs when ν < k.

LKJ correlation

The LKJ distribution (Lewandowski, Kurowicka, and Joe, 2009) is a distribution over positive-definite symmetric matrices with unit diagonals—that is, correlation matrices. If Σ is a correlation matrix, LkjCorr(Σ | η) ∝ det(Σ)^(η-1), with the parameter η required to be positive. The shape parameter η can be interpreted like the shape parameter of a symmetric beta distribution. If η = 1, then the density is uniform over all correlation matrices of a given order. If η > 1, the modal correlation matrix is the identity, with the distribution being more concentrated about this mode as η becomes large. For 0 < η < 1, the density has a trough at the identity matrix.

t

The t (or Student-t) is the marginal posterior distribution for the normal mean with unknown variance and conjugate prior and can be interpreted as a mixture of normals with common mean and variances that follow an inverse-gamma distribution. The t is also the ratio of a normal random variable and the square root of an independent gamma random variable. To simulate t, simulate z from a standard normal and x from a χ²_ν distribution, then let θ = μ + σz/√(x/ν). The density is always finite; the integral is finite if ν > 0 and σ is finite. In the limit ν → ∞, the t distribution approaches N(μ, σ²). The case of ν = 1 is called the Cauchy distribution. The t distribution can be used in place of a normal in a robust analysis.

To draw from the multivariate t distribution, generate a vector z ~ N(0, I) and a scalar x ~ χ²_ν, then compute θ = μ + Az√(ν/x), where A satisfies AAᵀ = Σ.
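
As a sketch (rmvt is our own name; the ingredients are exactly those in the recipe above):

```python
import numpy as np

rng = np.random.default_rng(0)

def rmvt(nu, mu, Sigma):
    """Multivariate t draw: theta = mu + A z sqrt(nu/x), with z ~ N(0, I), x ~ chi^2_nu."""
    A = np.linalg.cholesky(Sigma)          # A @ A.T == Sigma
    z = rng.standard_normal(len(mu))
    x = rng.chisquare(nu)
    return np.asarray(mu) + (A @ z) * np.sqrt(nu / x)
```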

Beta

The beta is the conjugate prior distribution for the binomial probability. The density is finite if α, β ≥ 1, and the integral is finite if α, β > 0. The choice α = β = 1 gives the standard uniform distribution; α = β = 0.5 and α = β = 0 are also sometimes used as noninformative densities. To simulate θ from the beta distribution, first simulate xα and xβ from χ² distributions with 2α and 2β degrees of freedom, respectively, then let θ = xα/(xα + xβ).
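
The chi-square recipe in code (a sketch; NumPy's rng.beta does this directly):

```python
import numpy as np

rng = np.random.default_rng(0)

def rbeta(alpha, beta, size=None):
    """theta = x_a / (x_a + x_b), with x_a ~ chi^2 on 2*alpha df, x_b ~ chi^2 on 2*beta df."""
    xa = rng.chisquare(2 * alpha, size=size)
    xb = rng.chisquare(2 * beta, size=size)
    return xa / (xa + xb)
```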

It is sometimes useful to estimate quickly the parameters of the beta distribution using the method of moments. Given a sample with mean m and variance v,

α = m(m(1 - m)/v - 1),   β = (1 - m)(m(1 - m)/v - 1).

The kth order statistic from a sample of n independent U(0, 1) variables has the Beta(k, n - k + 1) distribution.
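
A small helper for the method-of-moments estimates above (a sketch; beta_mom is our own name):

```python
def beta_mom(m, v):
    """Method-of-moments estimates of (alpha, beta) from sample mean m and variance v."""
    common = m * (1 - m) / v - 1       # requires v < m(1 - m)
    return m * common, (1 - m) * common
```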

Dirichlet

The Dirichlet is the conjugate prior distribution for the parameters of the multinomial distribution. The Dirichlet is a multivariate generalization of the beta distribution. As with the beta, the integral is finite if all of the α’s are positive, and the density is finite if all are greater than or equal to one.

The marginal distribution of a single θj is Beta(αj, α0 - αj), where α0 ≡ α1 + ⋯ + αk. The marginal distribution of a subvector of θ is Dirichlet; for example, (θi, θj, 1 - θi - θj) ~ Dirichlet(αi, αj, α0 - αi - αj). The conditional distribution of a subvector given the remaining elements is also Dirichlet, under the condition that the subvector is rescaled to sum to 1: conditional on (θm+1, …, θk), the vector (θ1, …, θm)/(1 - θm+1 - ⋯ - θk) ~ Dirichlet(α1, …, αm).

There are two standard approaches to sampling from a Dirichlet distribution. The fastest method generalizes the method used to sample from the beta distribution: draw x1, …, xk from independent gamma distributions with common scale and shape parameters α1, …, αk, and for each j, let θj = xj/(x1 + ⋯ + xk). A less efficient algorithm relies on the univariate marginal and conditional distributions being beta and proceeds as follows. Simulate θ1 from a Beta(α1, α2 + ⋯ + αk) distribution. Then simulate θ2, …, θk-1 in order, as follows. For j = 2, …, k - 1, simulate φj from a Beta(αj, αj+1 + ⋯ + αk) distribution, and let θj = (1 - θ1 - ⋯ - θj-1)φj. Finally, set θk = 1 - θ1 - ⋯ - θk-1.
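
The faster gamma-normalization method is a two-liner in NumPy (a sketch; rdirichlet is our own name, and NumPy also offers rng.dirichlet):

```python
import numpy as np

rng = np.random.default_rng(0)

def rdirichlet(alpha):
    """Normalize independent Gamma(alpha_j, 1) draws to get theta ~ Dirichlet(alpha)."""
    x = rng.gamma(shape=np.asarray(alpha), scale=1.0)
    return x / x.sum()
```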

Constrained distributions

We sometimes use notation such as N+ to convey the normal distribution constrained to be positive; that is, the truncated normal distribution. We also have occasion to use the half-t distribution, which is the right half of the t distribution.

A.2   Discrete distributions

Poisson

The Poisson distribution is commonly used to represent count data, such as the number of arrivals in a fixed time period. The Poisson distribution has an addition property: if θ1 and θ2 are independent with Poisson(λ1) and Poisson(λ2) distributions, then θ1 + θ2 ~ Poisson(λ1 + λ2). Simulation for the Poisson distribution (and most discrete distributions) can be cumbersome. Table lookup can be used to invert the cumulative distribution function. Simulation texts describe other approaches.
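
As an illustration of inverting the cumulative distribution function, here is a sequential-search sketch for the Poisson (rpoisson_invcdf is our own name; library routines are preferable in practice):

```python
import numpy as np

rng = np.random.default_rng(0)

def rpoisson_invcdf(lam):
    """Return the smallest k with F(k) >= U, accumulating the Poisson CDF term by term."""
    u = rng.uniform()
    k, p = 0, np.exp(-lam)    # p = P(theta = 0)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k          # recurrence: P(k) = P(k - 1) * lam / k
        cdf += p
    return k
```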

Binomial

The binomial distribution is commonly used to represent the number of ‘successes’ in a sequence of n independent and identically distributed Bernoulli trials, with probability of success p in each trial. A binomial random variable with large n is approximately normal. If θ1 and θ2 are independent with Bin(n1, p) and Bin(n2, p) distributions, then θ1 + θ2 ~ Bin(n1 + n2, p). For small n, a binomial random variable can be simulated by obtaining n independent standard uniforms and setting θ equal to the number of uniform deviates less than or equal to p. For larger n, more efficient algorithms are often available in computer packages. When n = 1, the binomial is called the Bernoulli distribution.
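
The small-n counting recipe is a one-liner (a sketch; for large n, a library routine such as rng.binomial is more efficient):

```python
import numpy as np

rng = np.random.default_rng(0)

def rbinom_small_n(n, p):
    """Count how many of n independent U(0, 1) draws fall at or below p."""
    return int((rng.uniform(size=n) <= p).sum())
```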

Multinomial

The multinomial distribution is a multivariate generalization of the binomial distribution. The marginal distribution of a single θi is binomial. The conditional distribution of a subvector of θ is multinomial with ‘sample size’ parameter reduced by the fixed components of θ and ‘probability’ parameters rescaled to have sum equal to one. We can simulate a multinomial draw using a sequence of binomial draws. Draw θ1 from a Bin(n, p1) distribution. Then draw θ2, …, θk-1 in order, as follows. For j = 2, …, k - 1, draw θj from a Bin(n - θ1 - ⋯ - θj-1, pj/(1 - p1 - ⋯ - pj-1)) distribution. Finally, set θk = n - θ1 - ⋯ - θk-1. If at any time in the simulation the binomial sample size parameter equals zero, use the convention that a Bin(0, p) variable is identically zero.
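
A sketch of the sequential binomial method (rmultinomial is our own name; NumPy also provides rng.multinomial directly):

```python
import numpy as np

rng = np.random.default_rng(0)

def rmultinomial(n, p):
    """Sequential binomial draws, rescaling the remaining probabilities at each step."""
    p = np.asarray(p, dtype=float)
    theta = np.zeros(len(p), dtype=int)
    remaining_n, remaining_p = n, 1.0
    for j in range(len(p) - 1):
        if remaining_n > 0:
            pj = min(p[j] / remaining_p, 1.0)   # guard against floating-point drift
            theta[j] = rng.binomial(remaining_n, pj)
        remaining_n -= theta[j]
        remaining_p -= p[j]
    theta[-1] = remaining_n                      # theta_k = n - theta_1 - ... - theta_{k-1}
    return theta
```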

Negative binomial

The negative binomial distribution is the marginal distribution for a Poisson random variable when the rate parameter has a Gamma(α, β) prior distribution. The negative binomial can also be used as a robust alternative to the Poisson distribution, because it has the same sample space but an additional parameter. To simulate a negative binomial random variable, draw λ ~ Gamma(α, β) and then draw θ ~ Poisson(λ). In the limit α → ∞ with α/β held constant, the distribution approaches a Poisson with parameter α/β. Under the alternative parameterization in terms of the probability of success p = β/(β + 1), the random variable θ can be interpreted as the number of Bernoulli failures obtained before the αth success.
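
The gamma-Poisson mixture gives a two-line simulator (a sketch; note that β enters as an inverse scale, while NumPy's gamma takes a scale):

```python
import numpy as np

rng = np.random.default_rng(0)

def rnegbin(alpha, beta, size=None):
    """Draw lambda ~ Gamma(alpha, beta), then theta ~ Poisson(lambda)."""
    lam = rng.gamma(shape=alpha, scale=1.0 / beta, size=size)
    return rng.poisson(lam)
```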

Beta-binomial

The beta-binomial arises as the marginal distribution of a binomial random variable when the probability of success has a Beta(α, β) prior distribution. It can also be used as a robust alternative to the binomial distribution. The mixture definition gives an algorithm for simulating from the beta-binomial: draw π ~ Beta(α, β) and then draw θ ~ Bin(n, π).
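
The same mixture recipe in code (rbetabin is our own name):

```python
import numpy as np

rng = np.random.default_rng(0)

def rbetabin(n, alpha, beta, size=None):
    """Draw pi ~ Beta(alpha, beta), then theta ~ Bin(n, pi)."""
    pi = rng.beta(alpha, beta, size=size)
    return rng.binomial(n, pi)
```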

A.3   Bibliographic note

Many software packages contain subroutines to simulate draws from these distributions. Texts on simulation typically include information about many of these distributions; for example, Gentle (2003) discusses simulation of all of these in detail, except for the LKJ distribution. Ripley (1987) is another helpful general book on simulation. Johnson and Kotz (1972) give more detail, such as the characteristic functions, for the distributions. Fortran and C programs for uniform, normal, gamma, Poisson, and binomial distributions are available in Press et al. (1986).
