Statistical Distributions



This chapter presents a systematic discussion of families of distribution functions, which are widely used in statistical modeling. We discuss univariate and multivariate distributions. A good part of the chapter is devoted to the distributions of sample statistics.


2.2.1 Binomial Distributions

Binomial distributions correspond to random variables that count the number of successes among N independent trials having the same probability of success. Such trials are called Bernoulli trials. The probabilistic model of Bernoulli trials is applicable in many situations, where it is reasonable to assume independence and constant success probability.

Binomial distributions have two parameters N (number of trials) and θ (success probability), where N is a positive integer and 0 < θ < 1. The probability distribution function is denoted by b(i; N, θ) and is

$$b(i; N, \theta) = \binom{N}{i}\theta^{i}(1-\theta)^{N-i}, \quad i = 0, 1, \ldots, N. \tag{2.2.1}$$

The c.d.f. is designated by B(i; N, θ), and is equal to $B(i; N, \theta) = \sum_{j=0}^{i} b(j; N, \theta)$.

The Binomial distribution formula can also be expressed in terms of the incomplete beta function by

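$$B(i; N, \theta) = 1 - I_{\theta}(i+1, N-i), \quad i = 0, 1, \ldots, N-1, \tag{2.2.2}$$

where

$$I_{\xi}(p, q) = \frac{1}{B(p, q)} \int_0^{\xi} u^{p-1}(1-u)^{q-1}\,du, \quad 0 \le \xi \le 1. \tag{2.2.3}$$

The parameters p and q are positive, i.e., 0 < p, q < ∞; $B(p, q) = \int_0^1 u^{p-1}(1-u)^{q-1}\,du$ is the (complete) beta function. Or

$$B(i; N, \theta) = I_{1-\theta}(N-i, i+1), \quad i = 0, 1, \ldots, N-1. \tag{2.2.4}$$

The quantiles $B^{-1}(p; N, \theta)$, 0 < p < 1, can be easily determined by finding the smallest value of i at which B(i; N, θ) ≥ p.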
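The quantile rule just stated can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the p.d.f. is assumed to be the usual binomial form, "N choose i" times θ^i (1 – θ)^(N – i).

```python
from math import comb


def binom_pmf(i, n_trials, theta):
    """b(i; N, theta): probability of exactly i successes in N Bernoulli trials."""
    return comb(n_trials, i) * theta**i * (1 - theta) ** (n_trials - i)


def binom_cdf(i, n_trials, theta):
    """B(i; N, theta): sum of b(j; N, theta) over j = 0, ..., i."""
    return sum(binom_pmf(j, n_trials, theta) for j in range(i + 1))


def binom_quantile(p, n_trials, theta):
    """Smallest i at which B(i; N, theta) >= p, for 0 < p < 1."""
    total = 0.0
    for i in range(n_trials + 1):
        total += binom_pmf(i, n_trials, theta)
        if total >= p:
            return i
    return n_trials  # guards against floating-point shortfall in the upper tail
```

For instance, with N = 10 and θ = .5 the c.d.f. first reaches .5 at i = 5, since B(4; 10, .5) = 386/1024.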

2.2.2 Hypergeometric Distributions

The hypergeometric distributions are applicable when we sample at random without replacement from a finite population (collection) of N units, so that every possible sample of size n has equal selection probability, $1/\binom{N}{n}$. If X denotes the number of units in the sample having a certain attribute, and if M is the number of units in the population (before sampling) having the same attribute, then the distribution of X is hypergeometric with the probability density function (p.d.f.)

$$h(i; N, M, n) = \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}, \quad i = 0, 1, \ldots, n. \tag{2.2.5}$$

The c.d.f. of the hypergeometric distribution will be denoted by H(i; N, M, n). When n/N is sufficiently small (smaller than 0.1 for most practical applications), we can approximate H(i; N, M, n) by B(i; n, M/N). Better approximations (Johnson and Kotz, 1969, p. 148) are available, as well as bounds on the error terms.
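The quality of the binomial approximation can be checked numerically. The sketch below assumes the standard hypergeometric p.d.f. (a ratio of binomial coefficients); the function names and the values N = 1000, M = 300, n = 20 are chosen only for illustration.

```python
from math import comb


def hyper_cdf(i, N, M, n):
    """H(i; N, M, n): P{X <= i} when sampling n of N units without replacement."""
    favorable = sum(comb(M, j) * comb(N - M, n - j) for j in range(i + 1))
    return favorable / comb(N, n)


def binom_cdf(i, n, theta):
    """B(i; n, theta): binomial c.d.f."""
    return sum(comb(n, j) * theta**j * (1 - theta) ** (n - j) for j in range(i + 1))


# n/N = 0.02, well below the 0.1 rule of thumb
N, M, n = 1000, 300, 20
max_gap = max(
    abs(hyper_cdf(i, N, M, n) - binom_cdf(i, n, M / N)) for i in range(n + 1)
)
```

Here `max_gap` is the largest absolute difference between the two c.d.f.s over i = 0, …, n; rerunning with a larger n/N shows how the approximation deteriorates.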

2.2.3 Poisson Distributions

Poisson distributions are applied when the random variables under consideration count the number of events occurring in a specified time period, or on a spatial area, and the observed processes satisfy the basic conditions of time (or space) homogeneity, independent increments, and no memory of the past (Feller, 1966, p. 566). The Poisson distribution is prevalent in numerous applications of statistics to engineering reliability, traffic flow, queuing and inventory theories, computer design, ecology, etc.

A random variable X is said to have a Poisson distribution with intensity λ, 0 < λ < ∞, if it assumes only the nonnegative integers according to a probability distribution function

$$p(i; \lambda) = e^{-\lambda}\frac{\lambda^{i}}{i!}, \quad i = 0, 1, \ldots \tag{2.2.6}$$

The c.d.f. of such a distribution is denoted by P(i; λ).

The Poisson distribution can be obtained from the Binomial distribution by letting N → ∞, θ → 0 so that Nθ → λ, where 0 < λ < ∞ (Feller, 1966, p. 153, or Problem 5 of Section 1.10). For this reason, the Poisson distribution can provide a good model in cases of counting events that occur very rarely (the number of cases of a rare disease per 100,000 in the population; the number of misprints per page in a book, etc.).
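The limiting argument can be illustrated numerically. The following sketch assumes the standard forms of the binomial and Poisson p.d.f.s and fixes Nθ = λ = 2; the helper names and the cut-off of 20 terms are illustrative choices.

```python
from math import comb, exp, factorial


def binom_pmf(i, n_trials, theta):
    """b(i; N, theta), returning 0 for i > N."""
    if i > n_trials:
        return 0.0
    return comb(n_trials, i) * theta**i * (1 - theta) ** (n_trials - i)


def poisson_pmf(i, lam):
    """p(i; lambda) = exp(-lambda) lambda^i / i!."""
    return exp(-lam) * lam**i / factorial(i)


def max_pmf_gap(n_trials, lam, upto=20):
    """Largest |b(i; N, lam/N) - p(i; lam)| over i = 0, ..., upto."""
    theta = lam / n_trials
    return max(
        abs(binom_pmf(i, n_trials, theta) - poisson_pmf(i, lam))
        for i in range(upto + 1)
    )


# The gap shrinks as N grows with N * theta held at lambda = 2
gaps = [max_pmf_gap(n, 2.0) for n in (10, 100, 1000, 10000)]
```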

The Poisson c.d.f. can be determined from the incomplete gamma function according to the following formula

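$$P(k; \lambda) = \frac{1}{\Gamma(k+1)} \int_{\lambda}^{\infty} x^{k} e^{-x}\,dx, \tag{2.2.7}$$

for all k = 0, 1, …, where

$$\Gamma(p) = \int_0^{\infty} x^{p-1} e^{-x}\,dx, \quad p > 0, \tag{2.2.8}$$

is the gamma function.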

2.2.4 Geometric, Pascal, and Negative Binomial Distributions

The geometric distribution is the distribution of the number of Bernoulli trials until the first success. It therefore has many applications (e.g., the number of shots at a target until the first hit). The probability distribution function of a geometric random variable is

$$g(i; \theta) = \theta(1-\theta)^{i-1}, \quad i = 1, 2, \ldots, \tag{2.2.9}$$

where θ, 0 < θ < 1, is the probability of success.

If the random variable counts the number of Bernoulli trials until the ν–th success, ν = 1, 2, …, we obtain the Pascal distribution with p.d.f.

$$g(i; \theta, \nu) = \binom{i-1}{\nu-1}\theta^{\nu}(1-\theta)^{i-\nu}, \quad i = \nu, \nu+1, \ldots \tag{2.2.10}$$

The geometric distributions constitute a subfamily with ν = 1. Another family of distributions of this type is that of the Negative–Binomial distributions. We designate by NB(ψ, ν), 0 < ψ < 1, 0 < ν < ∞, a random variable having a Negative–Binomial distribution if its p.d.f. is

$$nb(i; \psi, \nu) = \frac{\Gamma(\nu+i)}{\Gamma(i+1)\Gamma(\nu)}(1-\psi)^{\nu}\psi^{i}, \quad i = 0, 1, \ldots \tag{2.2.11}$$

Notice that if X has the Pascal distribution with parameters ν and θ, then X − ν is distributed like NB(1 – θ, ν). The probability distribution of Negative–Binomial random variables assigns positive probabilities to all the nonnegative integers. It can therefore be applied as a model in cases of counting random variables where the Poisson assumptions are invalid. Moreover, as we show later, Negative–Binomial distributions may be obtained as averages of Poisson distributions. The family of Negative–Binomial distributions depends on two parameters and can therefore be fitted to a variety of empirical distributions better than the Poisson distributions. Examples of this nature can be found in logistics research, in studies of population growth with immigration, etc.

The c.d.f. of the NB(ψ, ν), to be designated as NB(i; ψ, ν), can be determined by the incomplete beta function according to the formula

$$NB(i; \psi, \nu) = I_{1-\psi}(\nu, i+1), \quad i = 0, 1, \ldots \tag{2.2.12}$$

A proof of this useful relationship is given in Example 2.3.


2.3.1 Rectangular Distributions

A random variable X has a rectangular distribution over the interval (θ1, θ2), -∞ < θ1 < θ2 < ∞, if its p.d.f. is

$$f(x; \theta_1, \theta_2) = \begin{cases} \dfrac{1}{\theta_2 - \theta_1}, & \text{if } \theta_1 \le x \le \theta_2, \\ 0, & \text{otherwise.} \end{cases} \tag{2.3.1}$$

The family of all rectangular distributions is a two–parameter family. We denote r.v.s having these distributions by R(θ1, θ2); -∞ < θ1 < θ2 < ∞. We note that if X is distributed as R(θ1, θ2), then X is equivalent to θ1 + (θ2 − θ1)U, where U ∼ R(0, 1). This can be easily verified by considering the distribution functions of R(θ1, θ2) and of R(0, 1), respectively. Accordingly, the parameter α = θ1 can be considered a location parameter and β = θ2 − θ1 is a scale parameter. Let f_U(x) = I{0 ≤ x ≤ 1} be the p.d.f. of the standard rectangular r.v. U. Thus, we can express the p.d.f. of R(θ1, θ2) by the general presentation of p.d.f.s in the location and scale parameter models; namely

$$f(x; \alpha, \beta) = \frac{1}{\beta} f_U\!\left(\frac{x - \alpha}{\beta}\right), \quad -\infty < x < \infty. \tag{2.3.2}$$

The standard rectangular distribution function occupies an important place in the theory of statistics. One of the reasons is that if a random variable has an arbitrary continuous distribution function F(x), then the transformed random variable Y = F(X) is distributed as U. For each ξ, 0 < ξ < 1, let

(2.3.3) numbered Display Equation

Accordingly, since F(x) is nondecreasing and continuous,

(2.3.4) numbered Display Equation

The transformation X → F(X) is called the Cumulative Probability Integral Transformation.
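A small simulation can make the transformation concrete. The sketch below is illustrative only: it takes F to be the exponential c.d.f. F(x) = 1 – exp(–λx) with λ = 2, draws exponential variates with the standard library, and checks that F(X) behaves like an R(0, 1) variable. The seed and sample size are arbitrary.

```python
import math
import random


def exp_cdf(x, lam):
    """F(x) = 1 - exp(-lam * x) for x >= 0."""
    return 1.0 - math.exp(-lam * x)


random.seed(0)  # fixed seed so that the run is reproducible
lam = 2.0
u_values = [exp_cdf(random.expovariate(lam), lam) for _ in range(20000)]

# For U ~ R(0, 1): E{U} = 1/2 and P{U <= 1/4} = 1/4
mean_u = sum(u_values) / len(u_values)
frac_below_quarter = sum(u <= 0.25 for u in u_values) / len(u_values)
```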

Notice that the pth quantile of R(θ1, θ2) is

$$\theta_1 + p(\theta_2 - \theta_1). \tag{2.3.5}$$

The following has application in the theory of testing hypotheses.

If X has a discrete distribution F(x) and if we define the function

$$H(x, \gamma) = F(x - 0) + \gamma\,[F(x) - F(x - 0)], \tag{2.3.6}$$

where -∞ < x < ∞ and 0 ≤ γ ≤ 1, then H(X, U) has a rectangular distribution as R(0, 1), where U is also distributed like R(0, 1), independently of X. We notice that if x is a jump point of F(x), then H(x, γ) assumes a value in the interval [F(x – 0), F(x)]. On the other hand, if x is not a jump point, then H(x, γ) = F(x) for all γ. Thus, for every p, 0 ≤ p ≤ 1,

Unnumbered Display Equation


(2.3.7) numbered Display Equation

Accordingly, for every p, 0 ≤ p ≤ 1,

(2.3.8) numbered Display Equation

2.3.2 Beta Distributions

The family of Beta distributions is a two–parameter family of continuous distributions concentrated over the interval [0, 1]. We denote these distributions by β (p, q); 0 < p, q < ∞. The p.d.f. of a β (p, q) distribution is

$$f(x; p, q) = \frac{1}{B(p, q)}\, x^{p-1}(1-x)^{q-1}, \quad 0 \le x \le 1. \tag{2.3.9}$$

The R(0, 1) distribution is a special case. The distribution function (c.d.f.) of β (p, q) coincides over the interval (0, 1) with the incomplete Beta function (2.2.3). Notice that

$$\beta(p, q) \sim 1 - \beta(q, p). \tag{2.3.10}$$

Hence, the Beta distribution is symmetric about x = .5 if and only if p = q.

2.3.3 Gamma Distributions

The Gamma function Γ (p) was defined in (2.2.8). On the basis of this function we define a two–parameter family of distribution functions. We say that a random variable X has a Gamma distribution with positive parameters λ and p, to be denoted by G(λ, p), if its p.d.f. is

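$$f(x; \lambda, p) = \begin{cases} \dfrac{\lambda^{p}}{\Gamma(p)}\, x^{p-1} e^{-\lambda x}, & x \ge 0, \\ 0, & x < 0. \end{cases} \tag{2.3.11}$$

$\lambda^{-1}$ is a scale parameter, and p is called a shape parameter. A special important case is that of p = 1. In this case, the density reduces to

$$f(x; \lambda) = \lambda e^{-\lambda x}, \quad x \ge 0. \tag{2.3.12}$$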

This distribution is called the (negative) exponential distribution. Exponentially distributed r.v.s with parameter λ are denoted also as E(λ).

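The following relationship between Gamma distributions explains the role of the scale parameter $\lambda^{-1}$

$$G(\lambda, p) \sim \frac{1}{\lambda}\, G(1, p). \tag{2.3.13}$$

Indeed, from the definition of the gamma p.d.f. the following relationship holds for all ξ, 0 ≤ ξ < ∞,

$$P\{G(\lambda, p) \le \xi\} = \frac{\lambda^{p}}{\Gamma(p)} \int_0^{\xi} x^{p-1} e^{-\lambda x}\,dx = \frac{1}{\Gamma(p)} \int_0^{\lambda\xi} x^{p-1} e^{-x}\,dx = P\left\{\frac{1}{\lambda}\, G(1, p) \le \xi\right\}. \tag{2.3.14}$$

In the case of λ = 1/2 and p = ν/2, ν = 1, 2, … the Gamma distribution is also called chi–squared distribution with ν degrees of freedom. The chi–squared random variables are denoted by χ²[ν], i.e.,

$$\chi^2[\nu] \sim G\!\left(\tfrac{1}{2}, \tfrac{\nu}{2}\right), \quad \nu = 1, 2, \ldots \tag{2.3.15}$$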

The reason for designating a special name for this subfamily of Gamma distributions will be explained later.

2.3.4 Weibull and Extreme Value Distributions

The family of Weibull distributions has been extensively applied to the theory of systems reliability as a model for lifetime distributions (Zacks, 1992). It is also used in the theory of survival distributions with biological applications (Gross and Clark, 1975). We say that a random variable X has a Weibull distribution with parameters (λ, α, ξ); 0 < λ, 0 < α < ∞; -∞ < ξ < ∞, if (X − ξ)^α ∼ G(λ, 1). Accordingly, (X − ξ)^α has an exponential distribution with a scale parameter λ^{-1}, and ξ is a location parameter, i.e., the p.d.f. assumes positive values only for x ≥ ξ. We will assume here, without loss of generality, that ξ = 0. The parameter α is called the shape parameter. The p.d.f. of X, for ξ = 0 is

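$$f_W(x; \lambda, \alpha) = \lambda \alpha x^{\alpha - 1} \exp\{-\lambda x^{\alpha}\}, \quad 0 \le x < \infty, \tag{2.3.16}$$

and its c.d.f. is

$$F_W(x; \lambda, \alpha) = 1 - \exp\{-\lambda x^{\alpha}\}, \quad 0 \le x < \infty. \tag{2.3.17}$$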

The extreme value distribution (of Type I) is obtained from the Weibull distribution if we consider the distribution of Y = -log X, where X^α ∼ G(λ, 1). Accordingly, the c.d.f. of Y is

$$P\{Y \le \eta\} = \exp\{-\lambda e^{-\alpha \eta}\}, \tag{2.3.18}$$

-∞ < η < ∞, and its p.d.f. is

$$f_Y(x; \lambda, \alpha) = \lambda \alpha \exp\{-\alpha x - \lambda e^{-\alpha x}\}, \tag{2.3.19}$$

-∞ < x < ∞.

Extreme value distributions have been applied in problems of testing strength of materials, maximal water flow in rivers, biomedical problems, etc. (Gumbel, 1958).

2.3.5 Normal Distributions

The normal distribution occupies a central role in statistical theory. Many of the statistical tests and estimation procedures are based on statistics that have distributions approximately normal in a large sample.

The family of normal distributions, to be designated by N(ξ, σ²), depends on two parameters: a location parameter ξ, -∞ < ξ < ∞, and a scale parameter σ, 0 < σ < ∞. The p.d.f. of a normal distribution is

$$f(x; \xi, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{x - \xi}{\sigma}\right)^{2}\right\}, \tag{2.3.20}$$

-∞ < x < ∞.

The normal distribution with ξ = 0 and σ = 1 is called the standard normal distribution. The standard normal p.d.f. is denoted by φ(x). Notice that N(ξ, σ²) ∼ ξ + σ N(0, 1). Indeed, since σ > 0,

(2.3.21) numbered Display Equation

According to (2.3.21), the c.d.f. of N(ξ, σ2) can be computed on the basis of the standard c.d.f. The standard c.d.f. is denoted by Φ(x). It is also called the standard normal integral. Efficient numerical techniques are available for the computation of Φ (x). The function and its derivatives are tabulated. Efficient numerical approximations and asymptotic expansions are given in Abramowitz and Stegun (1968, p. 925). The normal p.d.f. is symmetric about the location parameter ξ. From this symmetry, we deduce that

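$$\Phi(-x) = 1 - \Phi(x), \quad -\infty < x < \infty. \tag{2.3.22}$$

By a series expansion of $e^{-t^2/2}$ and direct integration, one can immediately derive the formula

$$\Phi(x) = \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \sum_{j=0}^{\infty} \frac{(-1)^{j} x^{2j+1}}{2^{j}\, j!\, (2j+1)}, \quad -\infty < x < \infty. \tag{2.3.23}$$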

The computation according to this formula is often inefficient. An excellent computing formula was given by Zelen and Severo (1968), namely

$$\Phi(x) = 1 - \phi(x)\,(b_1 t + b_2 t^2 + b_3 t^3 + b_4 t^4 + b_5 t^5) + \varepsilon(x), \quad x \ge 0, \tag{2.3.24}$$

where $t = (1 + px)^{-1}$, p = .2316419; b1 = .3193815; b2 = -.3565638; b3 = 1.7814779; b4 = -1.8212560; b5 = 1.3302744. The magnitude of the error term is $|\varepsilon(x)| < 7.5 \cdot 10^{-8}$.
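As an illustration, the computing formula can be sketched as below. The polynomial form, 1 – φ(x)(b1 t + … + b5 t^5) for x ≥ 0, and the nine-digit constants are taken from Abramowitz and Stegun, formula 26.2.17, and should be read as assumptions of this sketch; the function names are hypothetical. The comparison value uses the error function from Python's standard library.

```python
from math import erf, exp, pi, sqrt

P = 0.2316419
B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)


def std_normal_pdf(x):
    """Standard normal p.d.f."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def std_normal_cdf_approx(x):
    """Polynomial approximation to Phi(x); negative x uses Phi(-x) = 1 - Phi(x)."""
    if x < 0:
        return 1.0 - std_normal_cdf_approx(-x)
    t = 1.0 / (1.0 + P * x)
    poly = 0.0
    for b in reversed(B):  # Horner evaluation of b1*t + b2*t^2 + ... + b5*t^5
        poly = (poly + b) * t
    return 1.0 - std_normal_pdf(x) * poly


def std_normal_cdf_exact(x):
    """Reference value of Phi(x) computed from math.erf."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))
```

Comparing `std_normal_cdf_approx` with `std_normal_cdf_exact` on a grid of x values gives a direct check of the stated error bound.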

2.3.6 Normal Approximations

The normal distribution can be used in certain cases to approximate well the cumulative probabilities of other distribution functions. Such approximations are very useful when it becomes too difficult to compute the exact cumulative probabilities of the distributions under consideration. For example, suppose X ∼ B(100, .35) and we have to compute the probability of the event {X ≤ 88}. This requires the computation of the sum of 89 terms in

$$B(88 \mid 100, .35) = \sum_{j=0}^{88} \binom{100}{j} (.35)^{j} (.65)^{100-j}.$$

Usually, such a numerical problem requires the use of some numerical approximation and/or the use of a computer. However, the cumulative probability B(88 | 100, .35) can be easily approximated by the normal c.d.f. This approximation is based on the celebrated Central Limit Theorem, which was discussed in Section 1.12. Accordingly, if X ∼ B(n, θ) and n is sufficiently large (relative to θ) then, for 0 ≤ k1 ≤ k2 ≤ n,

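$$P\{k_1 \le X \le k_2\} \approx \Phi\!\left(\frac{k_2 + \frac{1}{2} - n\theta}{\sqrt{n\theta(1-\theta)}}\right) - \Phi\!\left(\frac{k_1 - \frac{1}{2} - n\theta}{\sqrt{n\theta(1-\theta)}}\right). \tag{2.3.25}$$

The symbol ≈ designates a large sample approximation.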

The maximal possible error in using this approximation is less than .14[nθ(1 – θ)]^{-1/2} (Johnson and Kotz, 1969, p. 64). The approximation turns out to be quite good, even if n is not very large, if θ is close to θ0 = .5. In Table 2.1, we compare the numerically exact c.d.f. values of the Binomial distribution B(k; n, θ) with n = 25 (relatively small) and θ = .25, .40, .50 to the approximation obtained from (2.3.25) with k = k2 and k1 = 0.
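The comparison described above can be reproduced with a short script. The sketch assumes that the normal approximation is applied with the usual continuity correction of ±1/2 and with k1 = 0; this form, like the function names, is an assumption of the example. The standard normal c.d.f. is computed from the error function.

```python
from math import comb, erf, sqrt


def std_normal_cdf(x):
    """Phi(x) computed from math.erf."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def binom_cdf(k, n, theta):
    """Exact binomial c.d.f. B(k; n, theta)."""
    return sum(comb(n, j) * theta**j * (1 - theta) ** (n - j) for j in range(k + 1))


def normal_approx_cdf(k, n, theta):
    """Assumed continuity-corrected approximation to P{0 <= X <= k}."""
    mean = n * theta
    sd = sqrt(n * theta * (1 - theta))
    return std_normal_cdf((k + 0.5 - mean) / sd) - std_normal_cdf((-0.5 - mean) / sd)


def max_abs_error(n, theta):
    """Largest |exact - approximate| over k = 0, ..., n."""
    return max(
        abs(binom_cdf(k, n, theta) - normal_approx_cdf(k, n, theta))
        for k in range(n + 1)
    )


def error_bound(n, theta):
    """The bound .14 [n theta (1 - theta)]^(-1/2) quoted from Johnson and Kotz."""
    return 0.14 / sqrt(n * theta * (1 - theta))
```

Calling `max_abs_error(25, theta)` for θ = .25, .40, .50 and comparing with `error_bound(25, theta)` shows how conservative the quoted bound is for n = 25.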

Table 2.1 Normal Approximation to the Binomial c.d.f. n = 25


Considerable research has been done to improve the Normal approximation to the Binomial c.d.f. Some of the main results and references are provided in Johnson and Kotz (1969, p. 64).

In a similar manner, the normal approximation can be applied to approximate the Hypergeometric c.d.f. (Johnson and Kotz, 1969, p. 148); the Poisson c.d.f. (Johnson and Kotz, 1969, p. 99) and the Negative–Binomial c.d.f. (Johnson and Kotz, 1969, p. 127).

The normal distribution can also provide good approximations to the G(λ, ν) distributions, when ν is sufficiently large, and to other continuous distributions. For a summary of approximating formulae and references see Johnson and Kotz (1969) and Zelen and Severo (1968). In Table 2.2 we summarize important characteristics of the above distribution functions.

Table 2.2 Expectations, Variances and Moment Generating Functions of Selected Distributions




2.4.1 One–to–One Transformations of Several Variables

Let X1, …, Xk be random variables of the continuous type with a joint p.d.f. f(x1, …, xk). Let yi = gi(x1, …, xk), i = 1, …, k, be one–to–one transformations, and let xi = ψi(y1, …, yk), i = 1, …, k, be the inverse transformations. Assume that $\partial \psi_i / \partial y_j$ are continuous for all i, j = 1, …, k at all points (y1, …, yk). The Jacobian of the transformation is

$$J(y_1, \ldots, y_k) = \text{det.}\left(\frac{\partial \psi_i}{\partial y_j};\ i, j = 1, \ldots, k\right), \tag{2.4.1}$$

where det.(·) denotes the determinant of the matrix of partial derivatives. Then the joint p.d.f. of (Y1, …, Yk) is

$$h(y_1, \ldots, y_k) = f(\psi_1(\mathbf{y}), \ldots, \psi_k(\mathbf{y}))\, |J(y_1, \ldots, y_k)|, \quad \mathbf{y} = (y_1, \ldots, y_k). \tag{2.4.2}$$

2.4.2 Distribution of Sums

Let X1, X2 be absolutely continuous random variables with a joint p.d.f. f(x1, x2). Consider the one–to–one transformation Y1 = X1, Y2 = X1 + X2. It is easy to verify that J(y1, y2) = 1. Hence,

$$f_{Y_1, Y_2}(y_1, y_2) = f(y_1, y_2 - y_1).$$

Integrating over the range of Y1 we obtain the marginal p.d.f. of Y2, which is the required p.d.f. of the sum. Thus, if g(y) denotes the p.d.f. of Y2

$$g(y) = \int_{-\infty}^{\infty} f(x, y - x)\,dx. \tag{2.4.3}$$

If X1 and X2 are independent, having marginal p.d.f.s f1(x) and f2(x), the p.d.f. of the sum g(y) is the convolution of f1(x) and f2(x), i.e.,

$$g(y) = \int_{-\infty}^{\infty} f_1(x) f_2(y - x)\,dx. \tag{2.4.4}$$

If X1 is discrete, the integral in (2.4.4) is replaced by a sum over the jump points of F1 (x). If there are more than two variables, the distribution of the sum can be found by a similar method.
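The convolution formula can be checked numerically in a simple case. The sketch below assumes that the convolution has the usual form, the integral of f1(x) f2(y – x) over x, and takes both variables to be exponential with the same λ, so that the sum should have the G(λ, 2) density λ² y exp(–λy). The trapezoidal rule and all names are illustrative choices.

```python
import math


def exp_pdf(x, lam):
    """Exponential p.d.f.: lam * exp(-lam * x) for x >= 0, and 0 otherwise."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def convolve_at(y, f1, f2, lower, upper, steps=2000):
    """Trapezoidal approximation of the integral of f1(x) * f2(y - x) over [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.5 * (f1(lower) * f2(y - lower) + f1(upper) * f2(y - upper))
    for j in range(1, steps):
        x = lower + j * h
        total += f1(x) * f2(y - x)
    return total * h


lam, y = 1.5, 2.0
# Both densities vanish for negative arguments, so the integrand lives on [0, y]
numeric = convolve_at(y, lambda x: exp_pdf(x, lam), lambda x: exp_pdf(x, lam), 0.0, y)
closed_form = lam**2 * y * math.exp(-lam * y)  # p.d.f. of G(lam, 2) at y
```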

2.4.3 Distribution of Ratios

Let X1, X2 be absolutely continuous with a joint p.d.f., f(x1, x2). We wish to derive the p.d.f. of R = X1/X2. In the general case, X2 can be positive or negative and therefore we separate between the two cases. Over the set -∞ < x1 < ∞, 0 < x2 < ∞ the transformation R = X1 /X2 and Y = X2 is one–to–one. It is also the case over the set -∞ < x1 < ∞, -∞ < x2 < 0. The Jacobian of the inverse transformation is J(y, r) = –y. Hence, the p.d.f. of R is

$$h(r) = -\int_{-\infty}^{0} y\, f(yr, y)\,dy + \int_{0}^{\infty} y\, f(yr, y)\,dy. \tag{2.4.5}$$

The result of Example 2.2 has important applications.

Let X1, X2, …, Xk be independent random variables having gamma distributions with equal λ, i.e., Xi ∼ G(λ, νi), i = 1, …, k. Let $T = \sum_{i=1}^{k} X_i$ and for i = 1, …, k – 1

$$Y_i = X_i / T.$$

The marginal distribution of Yi is β inline.jpg. The joint distribution of Y = (Y1, …, Yk-1) is called the Dirichlet distribution, inline.jpg(ν1, ν2, …, νk), whose joint p.d.f. is

(2.4.6) numbered Display Equation

for inline.jpg.

The p.d.f. of inline.jpg(ν1, …, νk) is a multivariate generalization of the beta distribution.

Let inline.jpg. One can immediately prove that for all i, i′ = 1, …, k – 1

(2.4.7) numbered Display Equation

and thus

(2.4.8) numbered Display Equation

Additional properties of the Dirichlet distributions are specified in the exercises.


A random sample is a set of n (n ≥ 1) independent and identically distributed (i.i.d.) random variables, having a common distribution F(x). We assume that F has all moments required in the following development. The rth moment of F, r ≥ 1, is μr.

The rth sample moment is

$$M_r = \frac{1}{n} \sum_{i=1}^{n} X_i^{r}. \tag{2.5.1}$$

We immediately obtain that

$$E\{M_r\} = \mu_r, \tag{2.5.2}$$

since all Xi are identically distributed. Notice that due to independence, cov(Xi, Xj) = 0 for all i ≠ j. We present here a method for computing inline.jpg and inline.jpg for r ≠ r′. We consider expansions of the form inline.jpg, in terms of augmented symmetric functions and introduce the following notation

(2.5.3) numbered Display Equation

(2.5.4) numbered Display Equation

(2.5.5) numbered Display Equation

etc. The sum of powers in such an expression is called the weight of [ ]. Thus, the weight of [l1 l2 l3] is w = l1 + l2 + l3. In Table 2.3, we find expansions of (l1)^α1 (l2)^α2 … in terms of multi–sums [ ]. For additional values of coefficients for such expansions, see David and Kendall (1955). For example, to expand $\left(\sum_i X_i^3\right)\left(\sum_i X_i\right)^2$ the weight is w = 5, and according to Table 2.3, (3)(1)^2 = [5] + 2[41] + [32] + [31^2].

Table 2.3 Augmented Symmetric Functions in Terms of Power–Series

Weight  ( )          [ ]
2       (2)          [2]
        (1)^2        [2] + [1^2]
3       (3)          [3]
        (2)(1)       [3] + [21]
        (1)^3        [3] + 3[21] + [1^3]
4       (4)          [4]
        (3)(1)       [4] + [31]
        (2)^2        [4] + [2^2]
        (2)(1)^2     [4] + 2[31] + [2^2] + [21^2]
        (1)^4        [4] + 4[31] + 3[2^2] + 6[21^2] + [1^4]
5       (5)          [5]
        (4)(1)       [5] + [41]
        (3)(2)       [5] + [32]
        (3)(1)^2     [5] + 2[41] + [32] + [31^2]
        (2)^2(1)     [5] + [41] + 2[32] + [2^2 1]
        (2)(1)^3     [5] + 3[41] + 4[32] + 3[31^2] + 3[2^2 1] + [21^3]
        (1)^5        [5] + 5[41] + 10[32] + 10[31^2] + 15[2^2 1] + 10[21^3] + [1^5]
6       (6)          [6]
        (5)(1)       [6] + [51]
        (4)(2)       [6] + [42]
        (4)(1)^2     [6] + 2[51] + [42] + [41^2]
        (3)^2        [6] + [3^2]
        (3)(2)(1)    [6] + [51] + [42] + [3^2] + [321]
        (3)(1)^3     [6] + 3[51] + 3[42] + 3[41^2] + [3^2] + 3[321] + [31^3]
        (2)^3        [6] + 3[42] + [2^3]
        (2)^2(1)^2   [6] + 2[51] + 3[42] + [41^2] + 2[3^2] + 4[321] + [2^3] + [2^2 1^2]
        (2)(1)^4     [6] + 4[51] + 7[42] + 6[41^2] + 4[3^2] + 16[321] + 4[31^3] + 3[2^3] + 6[2^2 1^2] + [21^4]
        (1)^6        [6] + 6[51] + 15[42] + 15[41^2] + 10[3^2] + 60[321] + 20[31^3] + 15[2^3] + 45[2^2 1^2] + 15[21^4] + [1^6]

(*) [3^2] = [33], etc.

Source: Compiled from David and Kendall (1955).
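Entries of Table 2.3 can be verified by brute force on a small data set. The sketch below assumes the usual definitions: (l) is the power sum of the l-th powers of the observations, and [l1 l2 …] is the sum, over all ordered selections of distinct indices, of the product of the corresponding powers. The function names and the data are illustrative; integer data are used so that the identities hold exactly.

```python
from itertools import permutations


def power_sum(xs, l):
    """(l): sum over i of x_i ** l."""
    return sum(x**l for x in xs)


def augmented(xs, *powers):
    """[l1 l2 ...]: sum over ordered tuples of distinct indices of x_i1**l1 * x_i2**l2 * ..."""
    total = 0
    for idx in permutations(range(len(xs)), len(powers)):
        term = 1
        for i, l in zip(idx, powers):
            term *= xs[i] ** l
        total += term
    return total


xs = [1, 2, 3, 5]

# Weight 5 row: (3)(1)^2 = [5] + 2[41] + [32] + [31^2]
lhs = power_sum(xs, 3) * power_sum(xs, 1) ** 2
rhs = (
    augmented(xs, 5)
    + 2 * augmented(xs, 4, 1)
    + augmented(xs, 3, 2)
    + augmented(xs, 3, 1, 1)
)
```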


Unnumbered Display Equation

The expected values of such expansions are given in terms of product of the moments (independence) times the number of terms in the sum, e.g.,

Unnumbered Display Equation


2.6.1 The Multinomial Distribution

Consider an experiment in which the result of each trial belongs to one of k alternative categories. Let θ′ = (θ1, …, θk) be a probability vector, i.e., 0 < θi < 1 for all i = 1, …, k and $\sum_{i=1}^{k} \theta_i = 1$. θi designates the probability that the outcome of an individual trial belongs to the ith category. Consider n such independent trials, n ≥ 1, and let X = (X1, …, Xk) be a random vector. Xi is the number of trials in which the ith category is realized, $\sum_{i=1}^{k} X_i = n$. The distribution of X is given by the multinomial probability distribution

$$P\{X_1 = j_1, \ldots, X_k = j_k\} = \frac{n!}{j_1! \cdots j_k!} \prod_{i=1}^{k} \theta_i^{j_i}, \tag{2.6.1}$$

where ji = 0, 1, …, n and $\sum_{i=1}^{k} j_i = n$. These terms are obtained by the multinomial expansion of (θ1 + … + θk)^n. Hence, their sum equals 1. We will designate the multinomial distribution based on n trials and probability vector θ by M(n, θ). The binomial distribution is a special case, when k = 2. Moreover, the marginal distribution of Xi is the binomial B(n, θi). The joint marginal distribution of any pair (Xi, Xi′) where 1 ≤ i < i′ ≤ k is the corresponding trinomial, with probability distribution function

$$P\{X_i = j_i, X_{i'} = j_{i'}\} = \frac{n!}{j_i!\, j_{i'}!\, (n - j_i - j_{i'})!}\, \theta_i^{j_i}\, \theta_{i'}^{j_{i'}}\, (1 - \theta_i - \theta_{i'})^{n - j_i - j_{i'}}. \tag{2.6.2}$$

We consider now the moments of the multinomial distribution. From the marginal Binomial distribution of the Xs we have

$$E\{X_i\} = n\theta_i, \quad V\{X_i\} = n\theta_i(1 - \theta_i), \quad i = 1, \ldots, k. \tag{2.6.3}$$

To obtain the covariance of Xi, Xj, i ≠ j, we proceed in the following manner. If n = 1 then E{Xi Xj} = 0 for all i ≠ j, since only one of the components of X is one and all the others are zero. Hence, E{Xi Xj} – E{Xi} E{Xj} = –θi θj if i ≠ j. If n > 1, we obtain the result by considering the sum of n independent vectors. Thus,

$$\text{cov}(X_i, X_j) = -n\theta_i\theta_j, \quad \text{all } i \ne j. \tag{2.6.4}$$
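The covariance result can be confirmed by exact enumeration for a small case. The sketch assumes the standard multinomial probability function (the multinomial coefficient times the product of the θi raised to the counts) and the value cov(Xi, Xj) = –n θi θj derived in the argument above; the function names and the choice n = 6, θ = (.2, .3, .5) are illustrative.

```python
from itertools import product
from math import factorial


def multinomial_pmf(counts, theta):
    """n! / (j1! ... jk!) * theta_1^j1 * ... * theta_k^jk, with n = sum of counts."""
    coef = factorial(sum(counts))
    for j in counts:
        coef //= factorial(j)  # each partial quotient is an integer
    prob = float(coef)
    for j, t in zip(counts, theta):
        prob *= t**j
    return prob


def exact_cov(n, theta, a, b):
    """cov(X_a, X_b), obtained by summing over all count vectors with total n."""
    e_a = e_b = e_ab = 0.0
    for counts in product(range(n + 1), repeat=len(theta)):
        if sum(counts) != n:
            continue
        p = multinomial_pmf(counts, theta)
        e_a += counts[a] * p
        e_b += counts[b] * p
        e_ab += counts[a] * counts[b] * p
    return e_ab - e_a * e_b


n, theta = 6, (0.2, 0.3, 0.5)
cov_01 = exact_cov(n, theta, 0, 1)  # expected to equal -n * theta[0] * theta[1]
```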

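We conclude the section with a remark about the joint moment generating function (m.g.f.) of the multinomial random vector X. This function is defined in the following manner. Since $X_k = n - \sum_{i=1}^{k-1} X_i$, we define for every k ≥ 2

$$M(t_1, \ldots, t_{k-1}) = E\left\{\exp\left(\sum_{i=1}^{k-1} t_i X_i\right)\right\}. \tag{2.6.5}$$

One can prove by induction on k that

$$M(t_1, \ldots, t_{k-1}) = \left[\sum_{i=1}^{k-1} \theta_i e^{t_i} + \left(1 - \sum_{i=1}^{k-1} \theta_i\right)\right]^{n}. \tag{2.6.6}$$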

2.6.2 Multivariate Negative Binomial

Let X = (X1, …, Xk) be a k–dimensional random vector. Each random variable, Xi, i = 1, …, k, can assume only nonnegative integers. Their joint probability distribution function is given by

(2.6.7) numbered Display Equation

where j1, …, jk = 0, 1, …; 0 < ν < ∞, 0 < θi < 1 for each i = 1, …, k and $\sum_{i=1}^{k} \theta_i < 1$. We develop here the basic theory for the case of k = 2. (For k = 1 the distribution reduces to the univariate NB(θ, ν).) Summing first with respect to j2 we obtain

(2.6.8) numbered Display Equation

Hence, the marginal of X1 is

(2.6.9) numbered Display Equation

where nb(j; ψ, ν) is the p.d.f. of the negative binomial NB(ψ, ν). By dividing the joint probability distribution function g(j1, j2;θ1, θ2, ν) by inline.jpg, we obtain that the conditional distribution of X2 given X1 is the negative binomial NB(θ2, ν + X1). Accordingly, if NB(θ1, θ2, ν) designates a bivariate negative binomial with parameters (θ1, θ2, ν), then the expected value of Xi is given by

(2.6.10) numbered Display Equation

The variance of the marginal distribution is

(2.6.11) numbered Display Equation

Finally, to obtain the covariance between X1 and X2 we determine first

(2.6.12) numbered Display Equation


(2.6.13) numbered Display Equation

We notice that, contrary to the multinomial case, the covariances of any two components of the multivariate negative binomial vector are all positive.

2.6.3 Multivariate Hypergeometric Distributions

This family of k–variate distributions is derived by a straightforward generalization of the univariate model. Accordingly, suppose that a finite population of N elements contains M1 of type 1, M2 of type 2, …, Mk of type k and $N - \sum_{i=1}^{k} M_i$ of other types. A sample of n elements is drawn at random and without replacement from this population. Let Xi, i = 1, …, k denote the number of elements of type i observed in the sample. The p.d.f. of X = (X1, …, Xk) is

$$P\{X_1 = j_1, \ldots, X_k = j_k\} = \frac{\prod_{i=1}^{k} \binom{M_i}{j_i} \binom{N - \sum_{i=1}^{k} M_i}{n - \sum_{i=1}^{k} j_i}}{\binom{N}{n}}. \tag{2.6.14}$$

One immediately obtains that the marginal distributions of the components of X are hypergeometric distributions, with parameters (N, Mi, n), i = 1, …, k. If we designate by H(N, M1, …, Mk, n) the multivariate hypergeometric distribution, then the conditional distribution of (Xr + 1, …, Xk) given (X1 = j1, …, Xr = jr) is the hypergeometric inline.jpg. Using this result and the law of the iterated expectation we obtain the following result, for all ij,

(2.6.15) numbered Display Equation

This result is similar to that of the multinomial (2.6.4), which corresponds to sampling with replacement.


2.7.1 Basic Theory

A random vector (X1, …, Xk) of the continuous type has a k–variate multinormal distribution if its joint p.d.f. can be expressed in vector and matrix notation as

$$f(\mathbf{x}; \boldsymbol{\xi}, V) = \frac{1}{(2\pi)^{k/2} |V|^{1/2}} \exp\left\{-\frac{1}{2} (\mathbf{x} - \boldsymbol{\xi})' V^{-1} (\mathbf{x} - \boldsymbol{\xi})\right\}, \tag{2.7.1}$$

for -∞ < ξi < ∞, i = 1, …, k. Here, x = (x1, …, xk)′, ξ = (ξ1, …, ξk)′. V is a k × k symmetric positive definite matrix and |V| is the determinant of V. We introduce the notation X ∼ N(ξ, V). We notice that the k–variate multinormal p.d.f. (2.7.1) is symmetric about the point ξ. Hence, ξ is the expected value (mean) vector of X. Moreover, all the moments of X exist.

The m.g.f. of X is

$$M_X(\mathbf{t}) = \exp\left\{\frac{1}{2} \mathbf{t}' V \mathbf{t} + \mathbf{t}' \boldsymbol{\xi}\right\}. \tag{2.7.2}$$

To establish formula (2.7.2) we can assume, without loss of generality, that ξ = 0, since if M_X(t) is the m.g.f. of X and Y = X + b, then the m.g.f. of Y is M_Y(t) = exp(t′b)M_X(t). Thus, we have to determine

(2.7.3) numbered Display Equation

Since V is positive definite, there exists a nonsingular matrix D such that V = DD′. Consider the transformation Y = D^{-1}X; then x′V^{-1}x = y′y and t′x = t′Dy. Therefore,

(2.7.4) numbered Display Equation

Finally, the Jacobian of the transformation is |D| and

(2.7.5) numbered Display Equation

Since |D| = |V|^{1/2} and (2π)^{-k/2} times the multiple integral on the right–hand side is equal to one, we establish (2.7.2). In order to determine the variance–covariance matrix of X we can assume, without loss of generality, that its expected value is zero. Accordingly, for all i, j,

(2.7.6) numbered Display Equation

From (2.7.2) and (2.7.6), we obtain that cov(Xi, Xj) = σij. (i, j = 1, …, k), where σij is the (i, j)th element of V. Thus, V is the variance–covariance matrix of X.

A k–variate multinormal distribution is called standard if ξi = 0 and σii = 1 for all i = 1, …, k. In this case, the variance matrix will be denoted by R since its elements are the correlations between the components of X. A standard normal vector is often denoted by Z, its joint p.d.f. and c.d.f. by φ_k(z | R) and Φ_k(z | R), respectively.

2.7.2 Distribution of Subvectors and Distributions of Linear Forms

In this section we present several basic results without proofs. The proofs are straightforward and the reader is referred to Anderson (1958) and Graybill (1961).

Suppose that a k–dimensional vector X has a multinormal distribution N(ξ, V). We consider the two subvectors Y and Z, i.e., X′ = (Y′, Z′), where Y is r–dimensional, 1 ≤ r < k.

Partition correspondingly the expectation vector ξ to ξ′ = (η′, ζ′) and the covariance matrix to

$$V = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix}.$$

The following results are fundamental to the multinormal theory.

(i) Y ∼ N(η, V11)
(ii) Z ∼ N(ζ, V22)
(iii) Y | Z ∼ $N(\eta + V_{12} V_{22}^{-1}(Z - \zeta),\ V_{11} - V_{12} V_{22}^{-1} V_{21})$

and an analogous formula can be obtained for the conditional distribution of Z given Y.

The conditional expectation

$$E\{Y \mid Z\} = \eta + V_{12} V_{22}^{-1} (Z - \zeta) \tag{2.7.7}$$

is called the linear regression of Y on Z. The conditional covariance matrix

$$V[Y \mid Z] = V_{11} - V_{12} V_{22}^{-1} V_{21} \tag{2.7.8}$$

represents the variances and covariances of the components of Y around the linear regression hyperplane. The above results have the following converse counterpart. Suppose that Y and Z are two vectors such that

(i) Y | Z ∼ N(AZ, V)

(ii) Z ∼ N(ζ, D);

then the marginal distribution of Y is the multinormal

$$Y \sim N(A\zeta,\ V + ADA'),$$

and the joint distribution of Y and Z is the multinormal, with expectation vector (ζ′A′, ζ′)′ and a covariance matrix

$$\begin{pmatrix} V + ADA' & AD \\ DA' & D \end{pmatrix}.$$

Finally, if X ∼ N(ξ, V) and Y = b + AX, then Y ∼ N(b + Aξ, AVA′). That is, every linear combination of normally distributed random variables is normally distributed.

In the case of k = 2, the multinormal distribution is called a bivariate normal distribution. The joint p.d.f. of a bivariate normal distribution is

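$$f(x, y; \xi, \eta, \sigma_1, \sigma_2, \rho) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)} \left[\left(\frac{x - \xi}{\sigma_1}\right)^{2} - 2\rho\, \frac{x - \xi}{\sigma_1} \cdot \frac{y - \eta}{\sigma_2} + \left(\frac{y - \eta}{\sigma_2}\right)^{2}\right]\right\}, \tag{2.7.9}$$

-∞ < x, y < ∞.

The parameters ξ and η are the expectations, and $\sigma_1^2$ and $\sigma_2^2$ are the variances of X and Y, respectively. ρ is the coefficient of correlation.

The conditional distribution of Y given {X = x} is normal with conditional expectation

$$E\{Y \mid X = x\} = \eta + \beta (x - \xi), \tag{2.7.10}$$

where β = ρ σ2/σ1. The conditional variance is

$$\sigma^2_{y \mid x} = \sigma_2^2 (1 - \rho^2). \tag{2.7.11}$$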

These formulae are special cases of (2.7.7) and (2.7.8). Since the joint p.d.f. of (X, Y) can be written as the product of the conditional p.d.f. of Y given X, with the marginal p.d.f. of X, we obtain the expression,

(2.7.12) numbered Display Equation

This expression can serve also as a basis for an algorithm to compute the Bivariate–Normal c.d.f., i.e.,

(2.7.13) numbered Display Equation

Let Z1, Z2 and Z3 have a joint standard Trivariate–Normal distribution, with a correlation matrix

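$$R = \begin{pmatrix} 1 & \rho_{12} & \rho_{13} \\ \rho_{12} & 1 & \rho_{23} \\ \rho_{13} & \rho_{23} & 1 \end{pmatrix}.$$

The conditional Bivariate–Normal distribution of (Z1, Z2) given Z3 has a covariance matrix

$$\begin{pmatrix} 1 - \rho_{13}^2 & \rho_{12} - \rho_{13}\rho_{23} \\ \rho_{12} - \rho_{13}\rho_{23} & 1 - \rho_{23}^2 \end{pmatrix}. \tag{2.7.14}$$

The conditional correlation between Z1 and Z2, given Z3 can be determined from (2.7.14). It is called the partial correlation of Z1, Z2 under Z3 and is given by

$$\rho_{12 \cdot 3} = \frac{\rho_{12} - \rho_{13}\rho_{23}}{\sqrt{(1 - \rho_{13}^2)(1 - \rho_{23}^2)}}. \tag{2.7.15}$$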
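The partial correlation can be computed directly from the conditional covariance matrix. The sketch below assumes the general result that the conditional covariance of a subvector is V11 – V12 V22^{-1} V21, specialized to a standard trivariate normal vector, where V22 = 1 and V12 = (ρ13, ρ23)′; the function names are hypothetical.

```python
from math import sqrt


def conditional_cov(r12, r13, r23):
    """V11 - V12 V22^{-1} V21 for (Z1, Z2) given Z3, with V22 = 1."""
    v12 = (r13, r23)
    v11 = ((1.0, r12), (r12, 1.0))
    return [[v11[i][j] - v12[i] * v12[j] for j in range(2)] for i in range(2)]


def partial_correlation(r12, r13, r23):
    """Correlation of Z1 and Z2 in the conditional distribution given Z3."""
    c = conditional_cov(r12, r13, r23)
    return c[0][1] / sqrt(c[0][0] * c[1][1])


# With rho12 = .5, rho13 = .3, rho23 = .4 the result is .38 / sqrt(.91 * .84)
rho_12_3 = partial_correlation(0.5, 0.3, 0.4)
```

Note that the partial correlation vanishes when ρ12 = ρ13 ρ23, i.e., when the association between Z1 and Z2 is fully accounted for by Z3.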

2.7.3 Independence of Linear Forms

Let X = (X1, …, Xk)′ be a multinormal random vector. Without loss of generality, assume that E{X} = 0. Let V be the covariance matrix of X. We investigate first the conditions under which two linear functions Y1 = α′X and Y2 = β′X are independent.

Let Y = (Y1, Y2)′, $A = \begin{pmatrix} \alpha' \\ \beta' \end{pmatrix}$. That is, A is a 2 × k matrix and Y = AX. Y has a bivariate normal distribution with a covariance matrix AVA′. Y1 and Y2 are independent if and only if cov(Y1, Y2) = 0. Moreover, cov(Y1, Y2) = α′Vβ. Since V is positive definite there exists a nonsingular matrix C such that V = CC′. Accordingly, cov(Y1, Y2) = 0 if and only if (C′α)′(C′β) = 0. This means that the vectors C′α and C′β should be orthogonal. This condition is generalized in a similar fashion to cases where Y1 and Y2 are vectors. Accordingly, if Y1 = AX and Y2 = BX, then Y1 and Y2 are independent if and only if AVB′ = 0. In other words, the column vectors of C′A′ should be mutually orthogonal to the column vectors of C′B′.


In this section, we study the distributions of symmetric quadratic forms in normal random variables. We start from the simplest case.

Case A:

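$$X \sim N(0, \sigma^2), \quad Q = X^2.$$

Assume first that σ² = 1. The density of X is then $\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2} x^2\right\}$. Therefore, the p.d.f. of Q is

$$f_Q(y) = \frac{1}{\sqrt{2\pi}}\, y^{-1/2} e^{-y/2} = \frac{(1/2)^{1/2}}{\Gamma(1/2)}\, y^{\frac{1}{2} - 1} e^{-y/2}, \quad y \ge 0, \tag{2.8.1}$$

since $\Gamma(1/2) = \sqrt{\pi}$.

Comparing f_Q(y) with the p.d.f. of the gamma distributions, we conclude that if σ² = 1 then Q ∼ G(1/2, 1/2) ∼ χ²[1]. In the more general case of arbitrary σ², Q ∼ σ²χ²[1].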

Case B:

$$X \sim N(\xi, \sigma^2), \quad Q = X^2.$$

This is a more complicated situation. We shall prove that the p.d.f. of Q (and so its c.d.f. and m.g.f.) is, at each point, the expected value of the p.d.f. (or c.d.f. or m.g.f.) of σ²χ²[1 + 2J], where J is a Poisson random variable with mean

$$\lambda = \frac{\xi^2}{2\sigma^2}. \tag{2.8.2}$$

Such an expectation of distributions is called a mixture. The distribution of Q when σ2 = 1 is called a noncentral chi–squared with 1 degree of freedom and parameter of noncentrality λ. In symbols Q ∼ χ2[1;λ]. When λ = 0, the noncentral chi–squared coincides with the chi–squared, which is also called central chi–squared. The proof is obtained by determining first the m.g.f. of Q. As before, assume that σ2 = 1. Then,

(2.8.3) numbered Display Equation

Write, for all t < 1/2,

(2.8.4) numbered Display Equation


(2.8.5) numbered Display Equation

Furthermore, inline.jpg. Hence,

(2.8.6) numbered Display Equation

According to Table 2.2, (1 − 2t)^(−(1/2 + j)) is the m.g.f. of χ²[1 + 2j]. Thus, according to (2.8.6), the m.g.f. of χ²[1;λ] is the mixture of the m.g.f.s of χ²[1 + 2J], where J has a Poisson distribution with mean λ as in (2.8.2). This implies that the distribution of χ²[1;λ] is the marginal distribution of X in a model where (X, J) have a joint distribution, such that the conditional distribution of X given {J = j} is that of χ²[1 + 2j] and the marginal distribution of J is Poisson with expectation λ. From Table 2.2, we obtain that E{χ²[ν]} = ν and V{χ²[ν]} = 2ν. Hence, by the laws of the iterated expectation and total variance,

(2.8.7) \[ E\{\chi^2[1;\lambda]\} = E\{E\{\chi^2[1+2J] \mid J\}\} = E\{1 + 2J\} = 1 + 2\lambda, \]


(2.8.8) \[ V\{\chi^2[1;\lambda]\} = E\{2(1 + 2J)\} + V\{1 + 2J\} = 2(1 + 2\lambda) + 4\lambda = 2(1 + 4\lambda). \]
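The mixture representation can be checked numerically. The following is a minimal sketch, not part of the original text: it sums the moments of χ²[1 + 2j] against Poisson weights and compares the result with the values 1 + 2λ and 2(1 + 4λ) that the iterated expectation and total variance arguments give. The function names, the truncation at 200 terms, and the choice λ = 1.5 are illustrative assumptions; λ must be positive here.

```python
import math

def poisson_pmf(j, lam):
    """P{J = j} for J ~ Poisson(lam), lam > 0, computed on the log scale."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def mixture_moments(lam, df=1, terms=200):
    """Mean and variance of chi2[df + 2J], J ~ Poisson(lam), by conditioning on J."""
    mean = 0.0
    second = 0.0
    for j in range(terms):
        w = poisson_pmf(j, lam)
        nu = df + 2 * j
        mean += w * nu                    # E{chi2[nu]} = nu
        second += w * (2 * nu + nu * nu)  # E{chi2[nu]^2} = V + (mean)^2
    return mean, second - mean ** 2

lam = 1.5                                 # illustrative value
mean, var = mixture_moments(lam)
print(mean, 1 + 2 * lam)                  # both should agree
print(var, 2 * (1 + 4 * lam))             # both should agree
```

Truncating the Poisson sum is harmless for moderate λ because the weights decay faster than geometrically.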

Case C:

X1, …, Xn are independent; Xi ∼ N(ξi, σ²), i = 1, …, n,

\[ Q = \sum_{i=1}^{n} X_i^2. \]

It is required that all the variances σ² are the same. As proven in Case B,

(2.8.9) \[ X_i^2 \sim \sigma^2 \chi^2[1 + 2J_i], \quad i = 1, \ldots, n, \]

where Ji ∼ P(λi), with λi = ξi²/(2σ²).

Consider first the conditional distribution of Q given (J1, …, Jn). From the result on the sum of independent chi–squared random variables, we infer

(2.8.10) \[ Q \mid (J_1, \ldots, J_n) \sim \sigma^2 \chi^2\Bigl[n + 2\sum_{i=1}^{n} J_i\Bigr], \]

where Q | (J1, …, Jn) denotes the conditional equivalence of the random variables. Furthermore, since the original Xi s are independent, so are the Ji s and therefore

(2.8.11) \[ J_1 + \cdots + J_n \sim P(\lambda_1 + \cdots + \lambda_n). \]

Hence, the marginal distribution of Q is the mixture of σ²χ²[n + 2M] where M ∼ P(λ1 + … + λn). We have thus proven that

(2.8.12) \[ Q \sim \sigma^2 \chi^2\Bigl[n; \sum_{i=1}^{n} \lambda_i\Bigr]. \]

Case D:

\[ X \sim N(\xi, V), \quad Q = X'AX, \]

where A is a real symmetric matrix. The following is an important result.

(2.8.13) \[ Q \sim \chi^2[r; \lambda], \quad \lambda = \tfrac{1}{2}\, \xi' A \xi, \]

if and only if VA is an idempotent matrix of rank r (Graybill, 1961). The proof is based on the fact that every positive definite matrix V can be expressed as V = CC′, where C is nonsingular. If Y = C⁻¹X then Y ∼ N(C⁻¹ξ, I) and X′AX = Y′C′ACY. C′AC is idempotent if and only if VA is idempotent.

The following are important facts about real symmetric idempotent matrices.

(i) A is idempotent if A² = A.
(ii) All eigenvalues of A are either 1 or 0.
(iii) Rank (A) = tr.{A}, where tr.{A} = A11 + … + Akk is the sum of the diagonal elements of A.
(iv) The only nonsingular idempotent matrix is the identity matrix I.


Without loss of generality, we assume that X ∼ N(0, I). Indeed, if X ∼ N(0, V) and V = CC′, make the transformation X* = C⁻¹X; then X* ∼ N(0, I). Let Y = BX and Q = X′AX, where A is idempotent of rank r, 1 ≤ r ≤ k, and B is an n × k matrix of full rank, 1 ≤ n ≤ k.

Theorem 2.9.1 Y and Q are independent if and only if

(2.9.1) \[ BA = 0. \]

For proof, see Graybill (1961, Ch. 4).

Suppose now that we have m quadratic forms X′BiX in a multinormal vector X ∼ N(ξ, I).

Theorem 2.9.2 If X ∼ N(ξ, I), the positive semidefinite quadratic forms X′BiX (i = 1, …, m) are jointly independent and X′BiX ∼ χ²[ri; λi], where ri is the rank of Bi and λi = ½ ξ′Biξ, if any two of the following three conditions are satisfied.

1. Each Bi is idempotent (i = 1, …, m);
2. B1 + … + Bm is idempotent;
3. Bi Bj = 0 for all i ≠ j.

This theorem has many applications in the theory of regression analysis, as will be shown later.


Let X1, …, Xn be a set of random variables (having a joint distribution). The order statistic is

(2.10.1) \[ (X_{(1)}, X_{(2)}, \ldots, X_{(n)}), \]

where X(1) ≤ X(2) ≤ … ≤ X(n).

If X1, …, Xn are independent random variables having an identical absolutely continuous distribution function F(x) with p.d.f. f(x), then the p.d.f. of the order statistic is

(2.10.2) \[ f(x_1, \ldots, x_n) = n! \prod_{i=1}^{n} f(x_i), \quad -\infty < x_1 \le x_2 \le \cdots \le x_n < \infty. \]

To obtain the p.d.f. of the ith order statistic X(i), i = 1, …, n, we can integrate (2.10.2) over the set

(2.10.3) numbered Display Equation

This integration yields the p.d.f.

(2.10.4) \[ f_{(i)}(\xi) = \frac{n!}{(i-1)!\,(n-i)!}\, f(\xi)\, [F(\xi)]^{i-1}\, [1 - F(\xi)]^{n-i}, \]

-∞ < ξ < ∞. We can obtain this result also by a nice probabilistic argument. Indeed, for all dx sufficiently small, the trinomial model yields

(2.10.5) numbered Display Equation

where o(dx) is a function of dx that approaches zero at a faster rate than dx, i.e., o(dx)/dx → 0 as dx → 0.

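Dividing (2.10.5) by 2dx and taking the limit as dx → 0, we obtain (2.10.4). The joint p.d.f. of (X(i), X(j)) with 1 ≤ i < j ≤ n is obtained similarly as

(2.10.6) \[ f_{(i),(j)}(x, y) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\, f(x) f(y)\, [F(x)]^{i-1}\, [F(y) - F(x)]^{j-i-1}\, [1 - F(y)]^{n-j}, \quad x \le y. \]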

In a similar fashion we can write the joint p.d.f. of any set of order statistics. From the joint p.d.f.s of order statistics we can derive the distribution of various functions of the order statistics. In particular, consider the sample median and the sample range.

The sample median is defined as

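(2.10.7) \[ M_e = \begin{cases} X_{(m+1)}, & \text{if } n = 2m + 1, \\ \tfrac{1}{2}\,(X_{(m)} + X_{(m+1)}), & \text{if } n = 2m. \end{cases} \]

That is, half of the sample values are smaller than the median and half of them are greater. The sample range Rn is defined as

(2.10.8) \[ R_n = X_{(n)} - X_{(1)}. \]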

In the case of absolutely continuous independent r.v.s, having a common density f(x), the density g(x) of the sample median is

(2.10.9) numbered Display Equation

We derive now the distribution of the sample range Rn. Starting with the joint p.d.f. of (X(1), X(n))

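(2.10.10) \[ f(x, y) = n(n-1)\, f(x) f(y)\, [F(y) - F(x)]^{n-2}, \quad x \le y, \]

we make the transformation u = x, r = y − x.

The Jacobian of this transformation is J = 1 and the joint density of (u, r) is

(2.10.11) \[ g(u, r) = n(n-1)\, f(u) f(u + r)\, [F(u + r) - F(u)]^{n-2}, \quad -\infty < u < \infty, \; 0 \le r < \infty. \]

Accordingly, the density of Rn is

(2.10.12) \[ h(r) = n(n-1) \int_{-\infty}^{\infty} f(u) f(u + r)\, [F(u + r) - F(u)]^{n-2}\, du, \quad 0 \le r < \infty. \]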

For a comprehensive development of the theory of order statistics and interesting applications, see the books of David (1970) and Gumbel (1958).
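As a small numerical illustration (a sketch under stated assumptions, not the author's computation), the density of the ith order statistic, n!/((i − 1)!(n − i)!) f(x) F(x)^(i−1) (1 − F(x))^(n−i), can be evaluated for the rectangular R(0, 1) case and checked against two known facts: it integrates to one and its mean is i/(n + 1). The helper names, the midpoint integration rule, and the values n = 7, i = 3 are illustrative choices.

```python
import math

def order_stat_pdf(x, i, n, pdf, cdf):
    """Density of the i-th order statistic of n i.i.d. draws with the given pdf and cdf."""
    coef = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    return coef * pdf(x) * cdf(x) ** (i - 1) * (1.0 - cdf(x)) ** (n - i)

def integrate(g, a, b, steps=20000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

def unif_pdf(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def unif_cdf(x):
    return min(max(x, 0.0), 1.0)

n, i = 7, 3   # illustrative sample size and rank
total = integrate(lambda x: order_stat_pdf(x, i, n, unif_pdf, unif_cdf), 0.0, 1.0)
mean = integrate(lambda x: x * order_stat_pdf(x, i, n, unif_pdf, unif_cdf), 0.0, 1.0)
print(total)                 # close to 1
print(mean, i / (n + 1))     # both close to 0.375
```

Passing the p.d.f. and c.d.f. as functions keeps the sketch usable for other absolutely continuous distributions, provided the integration limits are adapted.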


In many problems of statistical inference, one considers the distribution of the ratio of a normally distributed statistic to its standard error (the square root of its variance). Such ratios have distributions called the t–distributions. More specifically, let U ∼ N(0, 1) and W ∼ (χ²[ν]/ν)^(1/2), where U and W are independent. The distribution of U/W is called "Student's t–distribution." We denote this statistic by t[ν] and say that U/W is distributed as a (central) t[ν] with ν degrees of freedom.

An example for the application of this distribution is the following. Let X1, …, Xn be i.i.d. from a N(ξ, σ²) distribution. We have proven that the sample mean X̄n is distributed as N(ξ, σ²/n) and is independent of the sample variance S², where S² ∼ σ²χ²[n – 1]/(n – 1). Hence,

(2.11.1) \[ \frac{\sqrt{n}\,(\bar{X}_n - \xi)}{S} \sim \frac{N(0, 1)}{(\chi^2[n-1]/(n-1))^{1/2}} \sim t[n-1]. \]

To find the moments of t[ν] we observe that, since the numerator and denominator are independent,

(2.11.2) numbered Display Equation

Thus, all the existing odd moments of t[ν] are equal to zero, since E{U^r} = 0 for all r = 2m + 1. The existence of E{(t[ν])^r} depends on the existence of E{(χ²[ν]/ν)^(−r/2)}. We have

(2.11.3) numbered Display Equation

Accordingly, a necessary and sufficient condition for the existence of E{(t[ν])r} is ν > r. Thus, if ν > 2 we obtain that

(2.11.4) \[ E\{(t[\nu])^2\} = \frac{\nu}{\nu - 2}. \]

This is also the variance of t[ν]. We notice that V{t[ν]} → 1 as ν → ∞. It is not difficult to derive the p.d.f. of t[ν], which is

(2.11.5) numbered Display Equation

The c.d.f. of t[ν] can be expressed in terms of the incomplete beta function. Due to the symmetry of the distribution around the origin

(2.11.6) numbered Display Equation

We consider now the distribution of (U + ξ)/W, where ξ is any real number. This ratio is called the noncentral t with ν degrees of freedom and parameter of noncentrality ξ. This variable is the ratio of two independent random variables, namely N(ξ, 1) to (χ²[ν]/ν)^(1/2). If we denote the noncentral t by t[ν; ξ], then

(2.11.7) numbered Display Equation

Since the random variables in the numerator and denominator of (2.11.7) are independent, one obtains

(2.11.8) numbered Display Equation

and that the central moments of orders 2 and 3 are

(2.11.9) numbered Display Equation


(2.11.10) numbered Display Equation

This shows that the t[ν ;ξ] is not symmetric. Furthermore, since U + ξ ∼ – U + ξ we obtain that, for all –∞ < ξ < ∞,

(2.11.11) numbered Display Equation

In particular, we have seen this in the central case (ξ = 0). The formulae of the p.d.f. and the c.d.f. of the noncentral t[ν ;ξ] are quite complicated. There exists a variety of formulae for numerical computations. We shall not present these formulae here; the interested reader is referred to Johnson and Kotz (1969, Ch. 31). In the following section, we provide a representation of these distributions in terms of mixtures of beta distributions.

The univariate t–distribution can be generalized to a multivariate–t in a variety of ways. Consider an m–dimensional random vector X having a multinormal distribution N(ξ, σ²R), where R is a correlation matrix. This is the case when all components of X have the same variance σ². Recall that the marginal distribution of

Unnumbered Display Equation

Thus, if S² ∼ σ²χ²[ν]/ν independently of Y1, …, Ym, then

Unnumbered Display Equation

have the marginal t–distributions t[ν]. The p.d.f. of the multivariate distribution of inline.jpg is given by

(2.11.12) numbered Display Equation

Generally, we say that X has a t[ν ; ξ, inline.jpg] distribution if its multivariate p.d.f. is

(2.11.13) numbered Display Equation

This distribution has applications in Bayesian analysis, as shown in Chapter 8.


The F–distributions are obtained by considering the distributions of ratios of two independent variance estimators based on normally distributed random variables. As such, these distributions have various important applications, especially in the analysis of variance and regression (Section 4.6). We introduce now the F–distributions formally. Let χ2[ν1] and χ2[ν2] be two independent chi–squared random variables with ν1 and ν2 degrees of freedom, respectively. The ratio

(2.12.1) \[ F[\nu_1, \nu_2] \sim \frac{\chi^2[\nu_1]/\nu_1}{\chi^2[\nu_2]/\nu_2} \]

is called an F–random variable with ν1 and ν2 degrees of freedom. It is a straightforward matter to derive the p.d.f. of F[ν1, ν2], which is given by

(2.12.2) numbered Display Equation

The cumulative distribution function can be computed by means of the incomplete beta function ratio according to the following formula

(2.12.3) numbered Display Equation


(2.12.4) numbered Display Equation

In order to derive this formula, we recall that if inline.jpg and inline.jpg are two independent gamma random variables, then (see Example 2.2)

(2.12.5) numbered Display Equation


(2.12.6) numbered Display Equation

We thus obtain

(2.12.7) numbered Display Equation

For testing statistical hypotheses, especially for the analysis of variance and regression, one needs quantiles of the F[ν1, ν2] distribution. These quantiles are denoted by Fp [ν1, ν2] and are tabulated in various statistical tables. It is easy to establish the following relationship between the quantiles of F[ν1, ν2] and those of F[ν2, ν1], namely,

(2.12.8) \[ F_p[\nu_1, \nu_2] = \frac{1}{F_{1-p}[\nu_2, \nu_1]}. \]

The quantiles of the F[ν1, ν2] distribution can also be determined by those of the beta distribution by employing formula (2.12.5). If we denote by βγ (p, q) the values of x for which Ix (p, q) = γ, we obtain from (2.12.4) that

(2.12.9) numbered Display Equation

The moments of F[ν1, ν2] are obtained in the following manner. For a positive integer r

(2.12.10) numbered Display Equation

We realize that the rth moment of F[ν1, ν2] exists if and only if ν2 > 2r. In particular,

(2.12.11) \[ E\{F[\nu_1, \nu_2]\} = \frac{\nu_2}{\nu_2 - 2}, \quad \nu_2 > 2. \]

Similarly, if ν2 > 4 then

(2.12.12) numbered Display Equation

In various occasions one may be interested in an F–like statistic, in which the ratio consists of a noncentral chi–squared in the numerator. In this case the statistic is called a noncentral F. More specifically, let χ2[ν1;λ] be a noncentral chi–squared with ν1 degrees of freedom and a parameter of noncentrality λ. Let χ2[ν2] be a central chi–squared with ν2 degrees of freedom, independent of the noncentral chi–squared. Then

(2.12.13) \[ F[\nu_1, \nu_2; \lambda] \sim \frac{\chi^2[\nu_1; \lambda]/\nu_1}{\chi^2[\nu_2]/\nu_2} \]

is called a noncentral F[ν1, ν2; λ] statistic. We have proven earlier that χ²[ν1; λ] ∼ χ²[ν1 + 2J], where J has a Poisson distribution with expected value λ. For this reason, we can represent the noncentral F[ν1, ν2; λ] as a mixture of central F statistics.

(2.12.14) \[ F[\nu_1, \nu_2; \lambda] \sim \frac{\nu_1 + 2J}{\nu_1}\, F[\nu_1 + 2J, \nu_2], \]

where J ∼ P(λ). Various results concerning the c.d.f. of F[ν1, ν2; λ], its moments, etc., can be obtained from relationship (2.12.14). The c.d.f. of the noncentral F statistic is

(2.12.15) numbered Display Equation

Furthermore, following (2.12.3) we obtain

Unnumbered Display Equation


Unnumbered Display Equation

As in the central case, the moments of the noncentral F are obtained by employing the law of the iterated expectation and (2.12.14). Thus,

(2.12.16) numbered Display Equation

However, for all j = 0, 1, …, E{F[ν1 + 2j, ν2]} = ν2/(ν2-2). Hence,

(2.12.17) numbered Display Equation

Hence, applying the law of the total variance

(2.12.18) numbered Display Equation

We conclude the section with the following observation on the relationship between t– and the F–distributions. According to the definition of t[ν] we immediately obtain that

(2.12.19) numbered Display Equation


(2.12.20) numbered Display Equation

Moreover, due to the symmetry of the t[ν] distribution, for t > 0 we have 2P{t[ν] ≤ t} = 1 + P{F[1, ν] ≤ t²}, or

(2.12.21) \[ P\{t[\nu] \le t\} = \tfrac{1}{2}\bigl(1 + P\{F[1, \nu] \le t^2\}\bigr). \]
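The relationship between t[ν] and F[1, ν] can be verified numerically. The sketch below assumes the standard closed forms of the central t and F densities (these are textbook formulas, written out here because the displays are not available in this copy) and compares P{|t[ν]| ≤ t} with P{F[1, ν] ≤ t²}. The substitution x = u² removes the integrable singularity of the F[1, ν] density at zero. Function names and the values ν = 5, t = 1.7 are illustrative.

```python
import math

def t_pdf(x, nu):
    """Density of the central t with nu degrees of freedom."""
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(nu * math.pi)
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def f_pdf(x, nu1, nu2):
    """Density of the central F[nu1, nu2] at x > 0."""
    log_beta = math.lgamma(nu1 / 2) + math.lgamma(nu2 / 2) - math.lgamma((nu1 + nu2) / 2)
    log_c = (nu1 / 2) * math.log(nu1) + (nu2 / 2) * math.log(nu2) - log_beta
    return math.exp(log_c + (nu1 / 2 - 1) * math.log(x)
                    - ((nu1 + nu2) / 2) * math.log(nu2 + nu1 * x))

def midpoint(g, a, b, steps=20000):
    """Composite midpoint rule on [a, b]; never evaluates g at the endpoints."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

nu, t = 5, 1.7   # illustrative values
p_abs_t = midpoint(lambda x: t_pdf(x, nu), -t, t)                 # P{|t[nu]| <= t}
p_f = midpoint(lambda u: 2 * u * f_pdf(u * u, 1, nu), 0.0, t)     # P{F[1, nu] <= t^2}
p_t = 0.5 * (1 + p_f)                                             # P{t[nu] <= t}
print(p_abs_t, p_f, p_t)
```

The two probabilities agree up to integration error, which is the content of (t[ν])² ∼ F[1, ν].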

In a similar manner we obtain a representation for P{|t[ν; ξ]| ≤ t}. Indeed, (N(0, 1) + ξ)² ∼ χ²[1; λ] where λ = ½ξ². Thus, according to (2.12.16)

(2.12.22) numbered Display Equation


Consider a sample of n i.i.d. vectors (X1, Y1), …, (Xn, Yn) that have a common bivariate normal distribution

Unnumbered Display Equation

In this section we develop the distributions of the following sample statistics.

(i) The sample correlation coefficient

(2.13.1) numbered Display Equation

(ii) The sample coefficient of regression

(2.13.2) numbered Display Equation


(2.13.3) numbered Display Equation

As mentioned earlier, the joint density of (X, Y) can be written as

(2.13.4) numbered Display Equation

where β = ρ σ2/σ1. Hence, if we make the transformation

(2.13.5) numbered Display Equation

then Ui and Vi are independent random variables, inline.jpg and inline.jpg. We consider now the distributions of the variables

(2.13.6) numbered Display Equation

where SSDU, SPDUV and SSDV are defined as in (2.13.3) in terms of (Ui, Vi), i = 1, …, n. Let U = (U1, …, Un)′ and V = (V1, …, Vn)′. We notice that the conditional distribution of SPDUV = inline.jpg given U is the normal inline.jpg. Hence, the conditional distribution of W1 given U is N(0, 1). This implies that W1 is N(0, 1), independently of U. Furthermore, W1 and W3 are independent, and W3 ∼ χ2[n – 1]. We consider now the variable W2. It is easy to check

(2.13.7) numbered Display Equation

where inline.jpg. A is idempotent and so is B = inline.jpg. Furthermore, the rank of B is n – 2. Hence, the conditional distribution of SSDVinline.jpg given U is like that of inline.jpg This implies that the distribution of W2 is like that of χ2[n – 2]. Obviously W2 and W3 are independent. We show now that W1 and W2 are independent. Since SPDUV = V′ AU and since BAU = inline.jpg we obtain that, for any given U, SPDUV and inline.jpg are conditionally independent. Moreover, since the conditional distributions of SPDUV/(SSDU)1/2 and of inline.jpg are independent of U, W1 and W2 are independent. The variables W1, W2, and W3 can be written in terms of SSDX, SPDXY, and SSDY in the following manner.

(2.13.8) numbered Display Equation

Or, equivalently,

(2.13.9) numbered Display Equation

From (2.13.9) one obtains that

(2.13.10) numbered Display Equation

An immediate conclusion is that, when ρ = 0,

(2.13.11) \[ \frac{r\,\sqrt{n-2}}{\sqrt{1 - r^2}} \sim t[n-2]. \]

This result has important applications in testing the significance of the correlation coefficient. Generally, one can prove that the p.d.f. of r is

(2.13.12) numbered Display Equation


A family of distribution inline.jpg, having density functions f(x;θ) with respect to some σ–finite measure μ, is called a k–parameter exponential type family if

(2.14.1) numbered Display Equation

-∞ < x < ∞, θ inline.jpg Θ. Here ψi(θ), i = 1, …, k are functions of the parameters and Ui (x), i = 1, …, k are functions of the observations.

In terms of the parameters ψ = (ψ1, …, ψk)′ and the statistics U = (U1 (x), …, Uk (x))′, the p.d.f of a k–parameter exponential type distribution can be written as

(2.14.2) numbered Display Equation

where K(ψ) = -log A*(ψ). Notice that h*(U(x)) > 0 for all x on the support set of inline.jpg, namely the closure of the smallest Borel set S, such that Pψ{S} = 1 for all ψ. If h*(U(x)) does not depend on ψ, we say that the exponential type family inline.jpg is regular. Define the domain of convergence to be

(2.14.3) numbered Display Equation

The family inline.jpg is called full if the parameter space Ω coincides with Ω*. Formula (2.14.2) is called the canonical form of the p.d.f.; ψ are called the canonical (or natural) parameters. The statistics Ui (x)(i = 1, …, k) are called canonical statistics. The family inline.jpg is said to be of order k if (1, ψ1, …, ψk) are linearly independent functions of θ. Indeed if, for example, ψk = inline.jpg, for some α0, …, αk – 1, which are not all zero, then by the reparametrization to

Unnumbered Display Equation

we reduce the number of canonical parameters to k – 1. If (1, ψ1, …, ψk) are linearly independent, the exponential type family is called minimal.

The following is an important theorem.

Theorem 2.14.1 If Equation (2.14.2) is a minimal representation then

(i) Ω* is a convex set, and K(ψ) is a strictly convex function on Ω*.
(ii) K(ψ) is a lower semicontinuous function on inline.jpg, and continuous in the interior of Ω*.

For proof, see Brown (1986, p. 19).


(2.14.4) numbered Display Equation

Accordingly, λ (ψ) = exp {K(ψ)} or K(ψ) = log λ (ψ). λ (ψ) is an analytic function on the interior of Ω* (see Brown, 1986, p. 32). Thus, λ (ψ) can be differentiated repeatedly under the integral sign and we have for nonnegative integers li, such that inline.jpg,

(2.14.5) numbered Display Equation

The m.g.f. of the canonical p.d.f. (2.14.2) is, for ψ in Ω*,

(2.14.6) numbered Display Equation

for t sufficiently close to 0. The logarithm of M(t;ψ), the cumulants generating function, is given here by

(2.14.7) numbered Display Equation


(2.14.8) numbered Display Equation

where inline.jpg denotes the gradient vector, i.e.,

Unnumbered Display Equation

Similarly, the covariance matrix of U is

(2.14.9) numbered Display Equation

Higher order cumulants can be obtained by additional differentiation of K(ψ). We conclude this section with several comments.

1. The marginal distributions of canonical statistics are canonical exponential type distributions.
2. The conditional distribution of a subvector of canonical exponential type statistics, given the other canonical statistics, is also a canonical exponential type distribution.
3. The dimension of Ω* in a minimal canonical exponential family of order k might be smaller than k. In this case we call inline.jpg a curved exponential family (Efron, 1975, 1978).
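The role of K(ψ) as a generator of cumulants can be illustrated on a one–parameter case. The sketch below assumes the Poisson family written in canonical form, with ψ = log λ, U(x) = x and K(ψ) = exp(ψ); this particular example and the finite–difference helpers are illustrative assumptions, not taken from the text. The first and second derivatives of K at ψ should both return λ, the mean and variance of the canonical statistic.

```python
import math

def K(psi):
    """Cumulant function of the Poisson family in canonical form, psi = log(lambda)."""
    return math.exp(psi)

def derivative(g, x, h=1e-4):
    """Central finite-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

def second_derivative(g, x, h=1e-4):
    """Central finite-difference approximation of g''(x)."""
    return (g(x + h) - 2 * g(x) + g(x - h)) / (h * h)

lam = 2.5                          # illustrative Poisson mean
psi = math.log(lam)
mean_U = derivative(K, psi)        # approximates E{U} = K'(psi)
var_U = second_derivative(K, psi)  # approximates V{U} = K''(psi)
print(mean_U, var_U)               # both close to lam
```

Numerical differentiation is used only to keep the sketch generic; for a family with a closed-form K(ψ) the derivatives would normally be taken analytically.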


Let X1, X2, …, Xn be i.i.d. random variables having a distribution, with all required moments existing.

2.15.1 Edgeworth Expansion

The Edgeworth expansion of the distribution of Wn = √n(X̄n − μ)/σ, which is developed below, may yield a more satisfactory approximation than the normal one. This expansion is based on the following development.

The p.d.f. of the standard normal distribution, φ(x), has continuous derivatives of all orders everywhere. By repeated differentiation we obtain

(2.15.1) \[ \phi^{(1)}(x) = -x\,\phi(x), \quad \phi^{(2)}(x) = (x^2 - 1)\,\phi(x), \]

and generally, for j ≥ 1,

(2.15.2) \[ \phi^{(j)}(x) = (-1)^j H_j(x)\,\phi(x), \]

where Hj(x) is a polynomial of order j, called the Chebychev–Hermite polynomial. These polynomials can be obtained recursively by the formula, j ≥ 2,

(2.15.3) \[ H_j(x) = x H_{j-1}(x) - (j-1) H_{j-2}(x), \]

where H0(x) ≡ 1 and H1(x) = x.
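The recursion is easy to carry out mechanically. The following sketch assumes the usual three–term recursion for the Chebychev–Hermite polynomials, H_j(x) = x H_{j−1}(x) − (j − 1) H_{j−2}(x), and stores each polynomial as a list of coefficients with the lowest power first. The function name and the representation are illustrative choices.

```python
def hermite_coeffs(j):
    """Coefficients (lowest power first) of the Chebychev-Hermite polynomial H_j."""
    prev, curr = [1], [0, 1]                  # H_0(x) = 1, H_1(x) = x
    if j == 0:
        return prev
    for k in range(2, j + 1):
        shifted = [0] + curr                  # coefficients of x * H_{k-1}(x)
        scaled = [(k - 1) * c for c in prev]  # coefficients of (k-1) * H_{k-2}(x)
        scaled += [0] * (len(shifted) - len(scaled))
        prev, curr = curr, [a - b for a, b in zip(shifted, scaled)]
    return curr

for j in range(6):
    print(j, hermite_coeffs(j))
# H_2 = x^2 - 1, H_3 = x^3 - 3x, H_4 = x^4 - 6x^2 + 3
```

The printed coefficient lists also show the parity property: even–order polynomials carry only even powers and odd–order polynomials only odd powers.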

From this recursive relation one can prove by induction that an even order polynomial H2m(x), m ≥ 1, contains only terms with even powers of x, and an odd order polynomial, H2m+1(x), m ≥ 0, contains only terms with odd powers of x. One can also show that

(2.15.4) numbered Display Equation

Furthermore, one can prove the orthogonality property

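(2.15.5) \[ \int_{-\infty}^{\infty} H_j(x) H_k(x)\, \phi(x)\, dx = \begin{cases} j!, & \text{if } j = k, \\ 0, & \text{if } j \ne k. \end{cases} \]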

Thus, the system {Hj(x), j = 0, 1, …} of Chebychev–Hermite polynomials constitutes an orthogonal base for representing every continuous, integrable function f(x) as

(2.15.6) numbered Display Equation

where, according to (2.15.5),

(2.15.7) numbered Display Equation

In particular, if f(x) is a p.d.f. of an absolutely continuous distribution, having all moments, then, for all -∞ < x < ∞,

(2.15.8) numbered Display Equation


Unnumbered Display Equation

etc. If X is a standardized random variable, i.e., μ1 = 0 and μ2 = inline.jpg = 1, then its p.d.f. f(x) can be approximated by the formula

(2.15.9) numbered Display Equation

which involves the first four terms of the expansion (2.15.8). For the standardized sample mean inline.jpg,

(2.15.10) numbered Display Equation


(2.15.11) numbered Display Equation

where β1 and β2 are the coefficients of skewness and kurtosis.

The same type of approximation with additional terms is known as the Edgeworth expansion. The Edgeworth approximation to the c.d.f. of Wn is

(2.15.12) \[ P\{W_n \le x\} \cong \Phi(x) - \frac{\beta_1}{6\sqrt{n}}\,(x^2 - 1)\,\phi(x) - \frac{x}{n}\Bigl[\frac{\beta_2 - 3}{24}\,(x^2 - 3) + \frac{\beta_1^2}{72}\,(x^4 - 10x^2 + 15)\Bigr]\phi(x). \]

The remainder term in this approximation is of a smaller order of magnitude than 1/n, i.e., o(1/n). One can obviously expand the distribution with additional terms to obtain a higher order of accuracy. Notice that the standard CLT can be proven by taking limits, as n → ∞, of the two sides of (2.15.12).
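To see the size of the improvement, the sketch below compares the normal and Edgeworth approximations with an exact c.d.f. It assumes the standard Edgeworth form with one skewness term of order n^(−1/2) and two terms of order 1/n; treating this as the content of (2.15.12) is an assumption. The test case, the standardized mean of n = 10 i.i.d. exponential variables with mean 1 (skewness 2, kurtosis 9), is also an illustrative choice, made because the sum then has an Erlang distribution with a closed-form c.d.f.

```python
import math

def phi(x):
    """Standard normal p.d.f."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard normal c.d.f."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def edgeworth_cdf(x, n, skew, kurt):
    """Edgeworth approximation to P{W_n <= x}; kurt is the kurtosis (3 for the normal)."""
    t1 = skew / (6 * math.sqrt(n)) * (x * x - 1)
    t2 = (x / n) * ((kurt - 3) / 24 * (x * x - 3)
                    + skew ** 2 / 72 * (x ** 4 - 10 * x * x + 15))
    return Phi(x) - (t1 + t2) * phi(x)

def exact_cdf_exp_mean(x, n):
    """Exact P{W_n <= x} for i.i.d. exponential(1) variables (sum is Erlang with shape n)."""
    s = n + x * math.sqrt(n)
    if s <= 0:
        return 0.0
    return 1.0 - math.exp(-s) * sum(s ** k / math.factorial(k) for k in range(n))

n, x = 10, 0.0
exact = exact_cdf_exp_mean(x, n)
approx = edgeworth_cdf(x, n, skew=2.0, kurt=9.0)
normal = Phi(x)
print(exact, approx, normal)
```

At x = 0 the normal approximation misses the skewness of the exponential distribution entirely, while the first Edgeworth correction recovers most of the gap.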

We conclude this section with the remark that Equation (2.15.9) could serve to approximate the p.d.f. of any standardized random variable, having a continuous, integrable p.d.f., provided the moments exist.

2.15.2 Saddlepoint Approximation

As before, let X1, …, Xn be i.i.d. random variables having a common density f(x). We wish to approximate the p.d.f. of the sample mean X̄n. Let M(t) be the m.g.f. of X, assumed to exist for all t in (-∞, t0), for some 0 < t0 < ∞. Let K(t) = log M(t) be the corresponding cumulants generating function.

We construct a family of distributions inline.jpg = {f(x, ψ): -∞ < ψ < t0} such that

(2.15.13) numbered Display Equation

The family inline.jpg is called an exponential conjugate to f(x). Notice that f(x; 0) = f(x), and that inline.jpg.

Using the inversion formula for Laplace transforms, one gets the relationship

(2.15.14) numbered Display Equation

where inline.jpg denotes the p.d.f. of the sample mean of n i.i.d. random variables from f(x; ψ). The p.d.f. inline.jpg is now approximated by the expansion (2.15.9) with additional terms, and its modification for the standardized mean Wn. Accordingly,

(2.15.15) numbered Display Equation

where inline.jpg(z) is the p.d.f. of inline.jpg, and ρ4(ψ) = K(4)(ψ)/(K(2)(ψ))2. Furthermore, μ (ψ) = K′(ψ) and σ2(ψ) = K(2)(ψ).

The objective is to approximate inline.jpg. According to (2.15.14) and (2.15.15), we approximate inline.jpg by

(2.15.16) numbered Display Equation

The approximation is called a saddlepoint approximation if we substitute in (2.15.16) ψ = inline.jpg, where ψ is a point in (-∞, t0) that maximizes f(x; ψ). Thus, inline.jpg is the root of the equation

Unnumbered Display Equation

As we have seen in Section 2.14, K(ψ) is strictly convex in the interior of (−∞, t0). Thus, K′(ψ) is strictly increasing in (-∞, t0). Thus, if inline.jpg exists then it is unique. Moreover, the value of z at ψ = inline.jpg is z = 0. It follows that the saddlepoint approximation is

(2.15.17) numbered Display Equation

The coefficient c is introduced on the right–hand side of (2.15.17) for normalization. A lower order approximation is given by the formula

(2.15.18) numbered Display Equation

The saddlepoint approximation to the tail of the c.d.f., i.e., inline.jpg is known to yield very accurate results. There is a famous Lugannani–Rice (1980) approximation to this tail probability. For additional reading, see Barndorff–Nielsen and Cox (1979), Jensen (1995), Field and Ronchetti (1990), Reid (1988), and Skovgaard (1990).
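A short numerical sketch may help fix ideas. It assumes the usual first–order saddlepoint density for a sample mean, (n/(2πK″(ψ̂)))^(1/2) exp{n[K(ψ̂) − ψ̂x]} with K′(ψ̂) = x, before normalization by the constant c; taking this as the lower order formula is an assumption. The test case is the mean of n i.i.d. exponential variables with mean 1, for which K(t) = −log(1 − t), t < 1, the saddlepoint is ψ̂ = 1 − 1/x, and the exact density of the mean is gamma. All names and numerical values are illustrative.

```python
import math

def saddlepoint_pdf_exp_mean(x, n):
    """Unnormalized first-order saddlepoint density of the mean of n exponential(1) variables."""
    psi_hat = 1.0 - 1.0 / x                 # root of K'(psi) = 1/(1 - psi) = x
    K = -math.log(1.0 - psi_hat)            # K(psi_hat) = log(x)
    K2 = 1.0 / (1.0 - psi_hat) ** 2         # K''(psi_hat) = x^2
    return math.sqrt(n / (2 * math.pi * K2)) * math.exp(n * (K - psi_hat * x))

def exact_pdf_exp_mean(x, n):
    """Exact density of the mean: gamma with shape n and rate n."""
    return math.exp(n * math.log(n) + (n - 1) * math.log(x) - n * x - math.lgamma(n))

n = 5
ratios = [saddlepoint_pdf_exp_mean(x, n) / exact_pdf_exp_mean(x, n) for x in (0.5, 1.0, 2.0)]
print(ratios)
```

In this example the ratio of the approximation to the exact density does not depend on x, so the normalizing constant c makes the approximation exact; the ratio itself is close to 1 + 1/(12n), as Stirling's formula suggests.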


Example 2.1. In this example we provide a few important results on the distributions of sums of independent random variables.

A. Binomial

If X1 and X2 are independent, X1 ∼ B(N1, θ), X2 ∼ B(N2, θ), then X1 + X2 ∼ B(N1 + N2, θ). It is essential that the binomial distributions of X1 and X2 have the same value of θ. The proof is obtained by multiplying the corresponding m.g.f.s.

B. Poisson

If X1 ∼ P(λ1) and X2 ∼ P(λ2) then, under independence, X1 + X2 ∼ P(λ1 + λ2).

C. Negative–Binomial

If X1 ∼ NB(ψ, ν1) and X2 ∼ NB(ψ, ν2) then, under independence, X1 + X2 ∼ NB(ψ, ν1 + ν2). It is essential that the two distributions depend on the same ψ.

D. Gamma

If X1 ∼ G(λ, ν1) and X2 ∼ G(λ, ν2) then, under independence, X1 + X2 ∼ G(λ, ν1 + ν2). It is essential that the two values of the parameter λ are the same. In particular,

\[ \chi_1^2[\nu_1] + \chi_2^2[\nu_2] \sim \chi^2[\nu_1 + \nu_2] \]

for all ν1, ν2 = 1, 2, …; where χi²[νi], i = 1, 2, denote two independent χ²–random variables with ν1 and ν2 degrees of freedom, respectively. This result has important applications in the theory of normal regression analysis.

E. Normal

If X1 ∼ N(μ1, σ1²) and X2 ∼ N(μ2, σ2²) and if X1 and X2 are independent, then X1 + X2 ∼ N(μ1 + μ2, σ1² + σ2²). A generalization of this result to the case of possible dependence is given later.
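The binomial case can be checked directly. The text proves it by multiplying m.g.f.s; the sketch below uses a different but equivalent route, the discrete convolution of the two probability functions, and compares the result with b(i; N1 + N2, θ). Function names and the values N1 = 4, N2 = 6, θ = 0.3 are illustrative.

```python
import math

def binom_pmf(i, N, theta):
    """b(i; N, theta): probability of i successes in N Bernoulli trials."""
    return math.comb(N, i) * theta ** i * (1 - theta) ** (N - i)

def convolve(p, q):
    """Probability function of the sum of two independent nonnegative integer variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            out[a + b] += pa * qb
    return out

N1, N2, theta = 4, 6, 0.3
pmf_sum = convolve([binom_pmf(i, N1, theta) for i in range(N1 + 1)],
                   [binom_pmf(i, N2, theta) for i in range(N2 + 1)])
pmf_direct = [binom_pmf(i, N1 + N2, theta) for i in range(N1 + N2 + 1)]
max_gap = max(abs(a - b) for a, b in zip(pmf_sum, pmf_direct))
print(max_gap)   # of the order of rounding error
```

Replacing one of the two θ values by a different number makes the gap clearly nonzero, which illustrates why a common θ is essential.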

Example 2.2 Using the theory of transformations, the following important result is derived. Let X1 and X2 be independent,

Unnumbered Display Equation

then the ratio R = X1/(X1 + X2) has a beta distribution, β (ν1, ν2), independent of λ. Furthermore, R and T = X1 + X2 are independent. Indeed, the joint p.d.f. of X1 and X2 is

Unnumbered Display Equation

Consider the transformation

Unnumbered Display Equation

The Jacobian of this transformation is J(x1, t) = 1. The joint p.d.f. of X1 and T is then

Unnumbered Display Equation

We have seen in the previous example that T = X1 + X2 ∼ G(λ, ν1 + ν2). Thus, the marginal p.d.f. of T is

Unnumbered Display Equation

Making now the transformation

Unnumbered Display Equation

we see that the Jacobian is J(r, t) = t. Hence, from (2.4.8) and (2.4.9) the joint p.d.f. of r and t is, for 0 ≤ r ≤ 1 and 0 ≤ t < ∞,

Unnumbered Display Equation

This proves that R ∼ β(ν1, ν2) and that R and T are independent.

Example 2.3. Let (X, λ) be random variables, such that the conditional distribution of X given λ is Poisson with p.d.f.

Unnumbered Display Equation

and λ ∼ G(ν, Λ). Hence, the marginal p.d.f. of X is

Unnumbered Display Equation

Let inline.jpg. Then inline.jpg. Thus, X ∼ NB(ψ, ν), and we get

Unnumbered Display Equation


Unnumbered Display Equation


Unnumbered Display Equation

where G(1, k + 1) and G(1, ν) are independent.

Let R = inline.jpg. According to Example 2.2,

Unnumbered Display Equation

But Uinline.jpg; hence,

Unnumbered Display Equation          inline.jpg

Example 2.4. Let X1, …, Xn be i.i.d. random variables. Consider the linear and the quadratic functions

Unnumbered Display Equation

We compute first the variance of S². Notice first that S² does not change its value if we substitute inline.jpg = Xi − μ1 for Xi (i = 1, …, n). Thus, we can assume that μ1 = 0 and all moments are central moments.

Unnumbered Display Equation

Now, since X1, …, Xn are i.i.d.,

Unnumbered Display Equation


Unnumbered Display Equation

According to Table 2.3,

Unnumbered Display Equation


Unnumbered Display Equation

Therefore, since μ1 = 0, the independence implies that

Unnumbered Display Equation


Unnumbered Display Equation


Unnumbered Display Equation

At this stage we have to compute

Unnumbered Display Equation

From Table 2.3, (2)(1)² = [4] + 2[31] + [22] + [21²]. Hence,

Unnumbered Display Equation


Unnumbered Display Equation


Unnumbered Display Equation

Finally, substituting these terms we obtain

Unnumbered Display Equation          inline.jpg

Example 2.5. We develop now the formula for the covariance of inline.jpg and S2.

Unnumbered Display Equation


Unnumbered Display Equation

since the independence of Xi and Xj for all i ≠ j implies that cov(inline.jpg, Xj) = 0. Similarly,

Unnumbered Display Equation

Thus, we obtain

Unnumbered Display Equation

Finally, if the distribution function F(x) is symmetric about zero, μ3 = 0, and cov(inline.jpg, S2) = 0.          inline.jpg

Example 2.6. The number of items, N, demanded in a given store during one week is a random variable having a Negative–Binomial distribution NB(ψ, ν); 0 < ψ < 1 and 0 < ν < ∞. These items belong to k different classes. Let X = (X1, …, Xk)′ denote a vector consisting of the number of items of each class demanded during the week. These are random variables such that inline.jpg and the conditional distribution of (X1, …, Xk) given N is the multinomial M(N, θ), where θ = (θ1, …, θk) is the vector of probabilities; 0 < θi < 1, inline.jpg. If we observe the X vectors over many weeks and construct the proportional frequencies of the X values in the various classes, we obtain an empirical distribution of these vectors. Under the assumption that the model and its parameters remain the same over the weeks we can fit to that empirical distribution the theoretical marginal distribution of X. This marginal distribution is obtained in the following manner.

The m.g.f. of the conditional multinomial distribution of X* = (X1, …, Xk – 1)′ given N is

Unnumbered Display Equation

Hence, the m.g.f. of the marginal distribution of X* is

Unnumbered Display Equation


Unnumbered Display Equation


Unnumbered Display Equation

This proves that X* has the multivariate Negative–Binomial distribution.          inline.jpg

Example 2.7. Consider a random variable X having a normal distribution, N(ξ, σ2). Let Φ(u) be the standard normal c.d.f. The transformed variable Y = Φ(X) is of interest in various problems of statistical inference in the fields of reliability, quality control, biostatistics, and others. In this example we study the first two moments of Y.

In the special case of ξ = 0 and σ² = 1, since Φ(u) is the c.d.f. of X, the above transformation yields a rectangular random variable, i.e., Y ∼ R(0, 1). In this case, obviously E{Y} = 1/2 and V{Y} = 1/12. In the general case, we have according to the law of the iterated expectation

\[ E\{Y\} = E\{\Phi(X)\} = E\{P\{U \le X \mid X\}\} = P\{U - X \le 0\}, \]

where U ∼ N(0, 1), U and X are independent. Moreover, according to (2.7.7), U – X ∼ N(–ξ, 1 + σ²). Therefore,

\[ E\{Y\} = \Phi\bigl(\xi / \sqrt{1 + \sigma^2}\bigr). \]
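The closed form for E{Y} that this argument yields, Φ(ξ/(1 + σ²)^(1/2)), can be confirmed by integrating Φ(x) against the N(ξ, σ²) density. The sketch below is illustrative only: the helper names, the midpoint rule, the integration range of ten standard deviations on each side of ξ, and the values ξ = 0.8, σ = 1.5 are all assumptions made for the example.

```python
import math

def Phi(u):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def normal_pdf(x, xi, sigma):
    """Density of N(xi, sigma^2)."""
    z = (x - xi) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def expected_Phi(xi, sigma, steps=40000, width=10.0):
    """E{Phi(X)} for X ~ N(xi, sigma^2), by the composite midpoint rule."""
    a, b = xi - width * sigma, xi + width * sigma
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        total += Phi(x) * normal_pdf(x, xi, sigma)
    return h * total

xi, sigma = 0.8, 1.5
numeric = expected_Phi(xi, sigma)
closed_form = Phi(xi / math.sqrt(1.0 + sigma ** 2))
print(numeric, closed_form)   # the two values should agree closely
```

Setting ξ = 0 returns 1/2 for every σ, in agreement with the symmetry of the problem.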

In order to determine the variance of Y we observe first that, if U1, U2 are independent random variables identically distributed like N(0, 1), then P{U1 ≤ x, U2 ≤ x} = Φ²(x) for all -∞ < x < ∞. Thus,

Unnumbered Display Equation

where U1, U2 and X are independent and Ui ∼ N(0, 1), i = 1, 2. U1 − X and U2 − X have a joint bivariate normal distribution with mean vector (–ξ, –ξ) and covariance matrix

Unnumbered Display Equation


Unnumbered Display Equation


Unnumbered Display Equation

Generally, the nth moment of Y can be determined by the n–variate multinormal c.d.f. inline.jpg, where the correlation matrix R has off–diagonal elements Rij = σ²/(1 + σ²), for all i ≠ j. We do not treat here the problem of computing the standard k–variate multinormal c.d.f. Computer routines are available for small values of k. The problem of the numerical evaluation is generally difficult. Tables are available for the bivariate and the trivariate cases. For further comments on this issue see Johnson and Kotz (1972, pp. 83–132).

Example 2.8. Let X1, X2, …, Xn be i.i.d. N(0, 1) r.v.s. The sample variance is defined as

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X)^2 .$$

Let Q = Σ (Xi − X̄)², where the sum is over i = 1, …, n. Define the matrix J = 11′, where 1′ = (1, …, 1) is a vector of ones. Let A = I − (1/n)J, so that Q = X′AX. It is easy to verify that A is an idempotent matrix. Indeed, since J² = nJ,

$$A^2 = \left(I - \frac{1}{n}J\right)^2 = I - \frac{2}{n}J + \frac{1}{n^2}J^2 = I - \frac{1}{n}J = A .$$

The rank of A is r = n − 1. Thus, we obtain that S² ∼ (1/(n − 1)) χ²[n − 1].
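The algebra of Example 2.8 can be illustrated for a small n. The sketch below is an illustration only (the helper names and the choice n = 5 are assumptions): it builds A = I − (1/n)J with plain Python lists, checks that A² = A, and uses the fact that the trace of an idempotent matrix equals its rank.

```python
def centering_matrix(n):
    """Return A = I - (1/n) J as a list of rows, J being the n x n matrix of ones."""
    return [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Matrix product of two conformable matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def quadratic_form(x, a):
    """Return x'Ax for a vector x (list) and a square matrix a (list of rows)."""
    return sum(x[i] * a[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))


n = 5
A = centering_matrix(n)
A_squared = matmul(A, A)

# Largest entrywise difference between A^2 and A (zero up to rounding).
idempotency_gap = max(abs(A_squared[i][j] - A[i][j]) for i in range(n) for j in range(n))

# For an idempotent matrix the trace equals the rank, here n - 1.
trace_A = sum(A[i][i] for i in range(n))

# Q = X'AX coincides with the sum of squared deviations from the mean.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
x_bar = sum(x) / n
sum_sq_dev = sum((xi - x_bar) ** 2 for xi in x)
q_value = quadratic_form(x, A)
```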

Example 2.9. Let X1, …, Xn be i.i.d. random variables having a N(ξ, σ²) distribution. The sample mean is X̄ = (1/n) Σ Xi and the sample variance is S² = (1/(n − 1)) Σ (Xi − X̄)², where both sums are over i = 1, …, n. In Section 2.5 we showed that if the distribution of the Xs is symmetric, then X̄ and S² are uncorrelated. We prove here the stronger result that, in the normal case, X̄ and S² are independent. Indeed,

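$$\bar X \sim \xi + \frac{\sigma}{n}\mathbf{1}'U, \quad \text{where } U = (U_1, \ldots, U_n)'$$

is distributed like N(0, I). Moreover, S² ∼ (σ²/(n − 1)) U′(I − (1/n)J)U. But,

$$\mathbf{1}'\left(I - \frac{1}{n}J\right) = \mathbf{0}' .$$

This implies the independence of X̄ and S².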

Example 2.10. Let X be a k–dimensional random vector having a multinormal distribution N(Aβ, σ²I), where A is a k × r matrix of constants, β is an r × 1 vector; 1 ≤ r ≤ k, 0 < σ² < ∞. We further assume that rank (A) = r, and the parameter vector β is unknown. Consider the vector β̂ that minimizes the squared–norm ||X − Aβ||², where ||X||² = Σ Xi², the sum being over i = 1, …, k. Such a vector β̂ is called the least–squares estimate of β. The vector β̂ is determined so that

$$\|X\|^2 = \|X - A\hat\beta\|^2 + \|A\hat\beta\|^2 \equiv Q_1 + Q_2 .$$

That is, Aβ̂ is the orthogonal projection of X on the subspace generated by the column vectors of A. Thus, the inner product of (X − Aβ̂) and Aβ̂ should be zero. This implies that

$$\hat\beta = (A'A)^{-1}A'X .$$

The matrix A′A is nonsingular, since A is of full rank. Substituting β̂ in the expressions for Q1 and Q2, we obtain

$$Q_1 = \|X - A\hat\beta\|^2 = X'\left(I - A(A'A)^{-1}A'\right)X$$

and

$$Q_2 = \|A\hat\beta\|^2 = X'A(A'A)^{-1}A'X .$$

We prove now that these quadratic forms are independent. Both

$$B_1 = I - A(A'A)^{-1}A' \quad \text{and} \quad B_2 = A(A'A)^{-1}A'$$

are idempotent. The rank of B1 is k − r and that of B2 is r. Moreover,

$$B_1 B_2 = \left(I - A(A'A)^{-1}A'\right)A(A'A)^{-1}A' = A(A'A)^{-1}A' - A(A'A)^{-1}A' = 0 .$$

Thus, the conditions of Theorem 2.9.2 are satisfied and Q1 is independent of Q2. Moreover, Q1 ∼ σ²χ²[k − r; λ1] and Q2 ∼ σ²χ²[r; λ2] where

Unnumbered Display Equation


Unnumbered Display Equation
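The projection argument of Example 2.10 can be illustrated with a small design matrix. The sketch below is an illustration under stated assumptions, not the book's procedure: it takes k = 4 and r = 2 (so that A′A can be inverted with the explicit 2 × 2 formula), and the matrix A, the vector X, and all helper names are hypothetical.

```python
def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Matrix product of two conformable matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse_2x2(m):
    """Inverse of a nonsingular 2 x 2 matrix (this sketch is restricted to r = 2)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def projection_matrices(A):
    """Return (B1, B2) with B2 = A (A'A)^{-1} A' and B1 = I - B2, for a k x 2 matrix A."""
    At = transpose(A)
    B2 = matmul(matmul(A, inverse_2x2(matmul(At, A))), At)
    k = len(A)
    B1 = [[(1.0 if i == j else 0.0) - B2[i][j] for j in range(k)] for i in range(k)]
    return B1, B2


def max_abs_diff(a, b):
    """Largest entrywise absolute difference between two matrices of equal shape."""
    return max(abs(a[i][j] - b[i][j]) for i in range(len(a)) for j in range(len(a[0])))


# Hypothetical data: a straight-line design with k = 4 observations.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
X = [1.0, 3.0, 2.0, 5.0]

B1, B2 = projection_matrices(A)
column_X = [[x] for x in X]
Q1 = matmul(matmul([X], B1), column_X)[0][0]   # X' B1 X
Q2 = matmul(matmul([X], B2), column_X)[0][0]   # X' B2 X
zero = [[0.0] * len(A) for _ in range(len(A))]
```

With these definitions, `matmul(B1, B2)` should vanish up to rounding, the traces of B1 and B2 should equal k − r and r, and Q1 + Q2 should equal ||X||².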

Example 2.11. Let X1, …, Xn be i.i.d. random variables from a rectangular R(0, 1) distribution. The density of the ith order statistic is then

$$f_{(i)}(x) = \frac{n!}{(i-1)!\,(n-i)!}\,x^{i-1}(1-x)^{n-i},$$

0 ≤ x ≤ 1. The p.d.f. of the sample median, for n = 2m + 1, is in this case

$$f_{(m+1)}(x) = \frac{(2m+1)!}{(m!)^2}\,x^{m}(1-x)^{m}, \quad 0 \le x \le 1 .$$

The p.d.f. of the sample range is the β(n − 1, 2) density

$$h_n(r) = \frac{1}{B(n-1, 2)}\,r^{n-2}(1-r) = n(n-1)\,r^{n-2}(1-r), \quad 0 \le r \le 1 .$$

These results can be applied to test whether a sample of n observations is a realization of i.i.d. random variables having a specified continuous distribution, F(x), since Y = F(X) ∼ R(0, 1).
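The densities of Example 2.11 can be checked by numerical integration. The following sketch is an illustration only (the function names and the Simpson step count are assumptions): it verifies that each density integrates to one and that the ith order statistic of a rectangular R(0, 1) sample has mean i/(n + 1), the mean of a β(i, n − i + 1) distribution.

```python
import math


def order_stat_pdf(x, i, n):
    """Density of the ith order statistic of n i.i.d. R(0, 1) variables."""
    coeff = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    return coeff * x ** (i - 1) * (1.0 - x) ** (n - i)


def range_pdf(r, n):
    """Density of the sample range of n i.i.d. R(0, 1) variables, i.e. beta(n - 1, 2)."""
    return n * (n - 1) * r ** (n - 2) * (1.0 - r)


def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; `steps` must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3.0


n, m = 7, 3  # n = 2m + 1, so the sample median is the (m + 1)st order statistic
median_mass = simpson(lambda x: order_stat_pdf(x, m + 1, n), 0.0, 1.0)
median_mean = simpson(lambda x: x * order_stat_pdf(x, m + 1, n), 0.0, 1.0)
range_mass = simpson(lambda r: range_pdf(r, n), 0.0, 1.0)
range_mean = simpson(lambda r: r * range_pdf(r, n), 0.0, 1.0)
```

Here `median_mean` should be close to 1/2 and `range_mean` close to (n − 1)/(n + 1), the mean of the β(n − 1, 2) distribution.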

Example 2.12. Let X1, X2, …, Xn be i.i.d. random variables having a common exponential distribution E(λ), 0 < λ < ∞. Let X(1) < X(2) < … < X(n) be the corresponding order statistics. The density of X(1) is

$$f_{X_{(1)}}(x) = n\lambda e^{-n\lambda x}, \quad x \ge 0 .$$

The joint density of X(1) and X(2) is

$$f_{X_{(1)},X_{(2)}}(x, y) = n(n-1)\lambda^2 e^{-\lambda(x+y)} e^{-\lambda(n-2)y}, \quad 0 < x < y .$$

Let U = X(2) − X(1). The joint density of X(1) and U is

$$f_{X_{(1)},U}(x, u) = n\lambda e^{-n\lambda x} \cdot (n-1)\lambda e^{-\lambda(n-1)u}, \quad 0 < x < \infty,$$

and 0 < u < ∞. Notice that fX(1),U(x, u) = fX(1)(x) · fU(u). Thus X(1) and U are independent, and U is distributed like the minimum of (n − 1) i.i.d. E(λ) random variables. Similarly, by induction on k = 2, 3, …, n, if Uk = X(k) − X(k−1) then X(k−1) and Uk are independent and Uk ∼ E(λ(n − k + 1)). Thus, since X(k) = X(1) + U2 + … + Uk, E{X(k)} = (1/λ) Σ 1/(n − j + 1) and V{X(k)} = (1/λ²) Σ 1/(n − j + 1)², where the sums are over j = 1, …, k, for all k ≥ 1.
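The moment formulas at the end of Example 2.12 lend themselves to a simulation check. The sketch below is an illustration only: the function names, the number of replications, and the seed are assumptions, and E(λ) is taken to have rate λ (mean 1/λ), which is the parametrization implied by Uk ∼ E(λ(n − k + 1)).

```python
import random


def order_stat_mean(k, n, lam):
    """E{X(k)} = (1/lam) * sum_{j=1}^{k} 1/(n - j + 1)."""
    return sum(1.0 / (n - j + 1) for j in range(1, k + 1)) / lam


def order_stat_variance(k, n, lam):
    """V{X(k)} = (1/lam^2) * sum_{j=1}^{k} 1/(n - j + 1)^2."""
    return sum(1.0 / (n - j + 1) ** 2 for j in range(1, k + 1)) / lam ** 2


def simulate_order_stat(k, n, lam, reps=20000, seed=12345):
    """Monte Carlo estimates of the mean and variance of X(k)."""
    rng = random.Random(seed)
    draws = [sorted(rng.expovariate(lam) for _ in range(n))[k - 1] for _ in range(reps)]
    mean = sum(draws) / reps
    var = sum((d - mean) ** 2 for d in draws) / (reps - 1)
    return mean, var
```

For example, `simulate_order_stat(3, 5, 2.0)` can be compared with `order_stat_mean(3, 5, 2.0)` and `order_stat_variance(3, 5, 2.0)`; the agreement is only up to Monte Carlo error.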

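Example 2.13. Let X ∼ N(μ, σ²), −∞ < μ < ∞, 0 < σ² < ∞. The p.d.f. of X is

$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\} = \frac{1}{\sqrt{2\pi}}\exp\left\{\frac{\mu}{\sigma^2}x - \frac{1}{2\sigma^2}x^2 - \frac{\mu^2}{2\sigma^2} - \frac{1}{2}\log\sigma^2\right\}.$$

Let h(x) = 1/√(2π), U1(x) = x, and U2(x) = x². We can write f(x; μ, σ²) as a two–parameter exponential type family. By making the reparametrization (μ, σ²) → (ψ1, ψ2), with ψ1 = μ/σ² and ψ2 = −1/(2σ²), the parameter space Θ = {(μ, σ²): −∞ < μ < ∞, 0 < σ² < ∞} is transformed to the parameter space

$$\Omega = \{(\psi_1, \psi_2): -\infty < \psi_1 < \infty,\ -\infty < \psi_2 < 0\}.$$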

In terms of (ψ1, ψ2) the density of X can be written as

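$$f(x; \psi_1, \psi_2) = h(x)\exp\{\psi_1 U_1(x) + \psi_2 U_2(x) - K(\psi_1, \psi_2)\},$$

where h(x) = 1/√(2π) and

$$K(\psi_1, \psi_2) = -\frac{\psi_1^2}{4\psi_2} + \frac{1}{2}\log\left(-\frac{1}{2\psi_2}\right).$$

The p.d.f. of the standard normal distribution is obtained by substituting ψ1 = 0, ψ2 = −1/2.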
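The reparametrization of Example 2.13 can be checked numerically against the usual normal density. The sketch below is an illustration under stated assumptions: it takes ψ1 = μ/σ² and ψ2 = −1/(2σ²) (consistent with the standard normal case ψ1 = 0, ψ2 = −1/2) and adopts the sign convention f = h · exp{ψ1U1 + ψ2U2 − K}; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def natural_parameters(mu, sigma2):
    """Map (mu, sigma^2) to (psi1, psi2) = (mu / sigma^2, -1 / (2 sigma^2))."""
    return mu / sigma2, -1.0 / (2.0 * sigma2)


def log_normalizer(psi1, psi2):
    """K(psi1, psi2) under the assumed convention f = h * exp{psi'U - K}."""
    return -psi1 ** 2 / (4.0 * psi2) + 0.5 * math.log(-1.0 / (2.0 * psi2))


def exp_family_pdf(x, psi1, psi2):
    """Normal density written in exponential-family form, U1(x) = x, U2(x) = x^2."""
    h = 1.0 / math.sqrt(2.0 * math.pi)
    return h * math.exp(psi1 * x + psi2 * x ** 2 - log_normalizer(psi1, psi2))
```

For any μ and σ² > 0, `exp_family_pdf(x, *natural_parameters(mu, sigma2))` should agree with `NormalDist(mu, math.sqrt(sigma2)).pdf(x)` up to rounding.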

Example 2.14. A simple example of a curved exponential family is

Unnumbered Display Equation

In this case,

Unnumbered Display Equation

with ψ2 = inline.jpg. ψ1 and ψ2 are linearly independent. The rank is k = 2 but

Unnumbered Display Equation

The dimension of Ω* is 1.

The following example shows a more interesting case of a regular exponential family of order k = 3.

Example 2.15. We consider here a model that is well known as the Model II of Analysis of Variance. This model will be discussed later in relation to the problem of estimation and testing variance components.

We are given n · k observations on random variables Xij (i = 1, …, k; j = 1, …, n). These random variables represent the results of an experiment performed in k blocks, each block containing n trials. In addition to the random component representing the experimental error, which affects the observations independently, there is also a random effect of the blocks. This block effect is the same on all the observations within a block, but is independent from one block to another. Accordingly, our model is

$$X_{ij} = \mu + a_i + e_{ij}, \quad i = 1, \ldots, k;\ j = 1, \ldots, n,$$

where eij are i.i.d. like N(0, σ²) and ai are i.i.d. like N(0, τ²).

We determine now the joint p.d.f. of the vector X = (X11, …, X1n, X21, …, X2n, …, Xk1, …, Xkn)′. The conditional distribution of X given a = (a1, …, ak)′ is the multinormal N(μ1nk + ξ(a), σ²Ink), where ξ′(a) = (a1 1′n, a2 1′n, …, ak 1′n). Hence, the marginal distribution of X is the multinormal N(μ1nk, V), where the covariance matrix V is given by a matrix composed of k equal submatrices along the main diagonal and zeros elsewhere. That is, if Jn = 1n 1′n is an n × n matrix of 1s,

$$V = \begin{pmatrix} \sigma^2 I_n + \tau^2 J_n & & 0 \\ & \ddots & \\ 0 & & \sigma^2 I_n + \tau^2 J_n \end{pmatrix}.$$

The determinant of V is $(\sigma^2)^{kn}|I_n + \rho J_n|^k$, where ρ = τ²/σ². Moreover, let H be an orthogonal matrix whose first row vector is (1/√n)1′n. Then HJnH′ has n in its (1, 1) position and zeros elsewhere, so that

$$|I_n + \rho J_n| = |H(I_n + \rho J_n)H'| = |I_n + \rho\, H J_n H'| = 1 + n\rho .$$

Hence, $|V| = \sigma^{2nk}(1 + n\rho)^k$. The inverse of V is

$$V^{-1} = \frac{1}{\sigma^2}\begin{pmatrix} (I_n + \rho J_n)^{-1} & & 0 \\ & \ddots & \\ 0 & & (I_n + \rho J_n)^{-1} \end{pmatrix},$$

where (In + ρJn)⁻¹ = In − (ρ/(1 + nρ))Jn.
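The determinant and inverse of In + ρJn used above can be verified for a small block. The sketch below is an illustration only; the helper names and the values n = 4, ρ = 0.75 are assumptions. The determinant is computed by Gaussian elimination, independently of the closed form 1 + nρ.

```python
def identity_plus_rho_j(n, rho):
    """Return I_n + rho * J_n as a list of rows."""
    return [[(1.0 if i == j else 0.0) + rho for j in range(n)] for i in range(n)]


def closed_form_inverse(n, rho):
    """Return I_n - (rho / (1 + n * rho)) * J_n, the claimed inverse."""
    c = rho / (1.0 + n * rho)
    return [[(1.0 if i == j else 0.0) - c for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Matrix product of two conformable matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    size = len(a)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap changes the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det


n, rho = 4, 0.75
block = identity_plus_rho_j(n, rho)
product = matmul(block, closed_form_inverse(n, rho))
identity_gap = max(abs(product[i][j] - (1.0 if i == j else 0.0))
                   for i in range(n) for j in range(n))
block_det = determinant(block)
```

Here `block_det` should equal 1 + nρ = 4 and `identity_gap` should be zero up to rounding.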

Accordingly, the joint p.d.f. of X is

Unnumbered Display Equation


Unnumbered Display Equation

where inline.jpg. Similarly,

Unnumbered Display Equation

Substituting these terms we obtain,

Unnumbered Display Equation


Unnumbered Display Equation


Unnumbered Display Equation

and make the reparametrization

Unnumbered Display Equation

The joint p.d.f. of X can be expressed then as

Unnumbered Display Equation

The functions U1 (x), U2 (x), and U3 (x) as well as ψ1(θ), ψ2(θ), and ψ3(θ) are linearly independent. Hence, the order is k = 3, and the dimension of Ω* is d = 3.

Example 2.16. Let X1, …, Xn be i.i.d. random variables having a common gamma distribution G(λ, ν), 0 < λ, ν < ∞. For this distribution β1 = 2/√ν (skewness) and β2 = 6/ν (kurtosis in excess of the normal value 3).

The sample mean X̄n is distributed like (1/n)G(λ, nν). The standardized mean is Wn = √n(λX̄n − ν)/√ν. The exact c.d.f. of Wn is

$$P\{W_n \le x\} = P\{G(1, n\nu) \le n\nu + x\sqrt{n\nu}\}.$$

On the other hand, the Edgeworth approximation is

Unnumbered Display Equation

In the following table, we compare the exact distribution of Wn with its Edgeworth expansion for ν = 1, with n = 10 and n = 20. We see that for n = 20 the Edgeworth expansion yields a very good approximation, with a maximal relative error of −4.5% at x = −2. At x = −1.5 the relative error is 0.9%. At all other values of x the relative error is much smaller.

Unnumbered Table
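A comparison of the kind described in Example 2.16 can be sketched in a few lines for ν = 1. This is an illustration under stated assumptions, not a reproduction of the table: the exact c.d.f. uses the relation between the gamma distribution with integer shape and the Poisson distribution, and the approximation is the standard second-order Edgeworth expansion written in terms of the skewness and the excess kurtosis, which may differ in notation from the formula displayed in the chapter. All function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def exact_cdf_standardized_mean(x, n):
    """P{W_n <= x} for nu = 1, i.e. P{G(1, n) <= n + x * sqrt(n)}.

    For integer shape n, P{G(1, n) <= y} = 1 - sum_{j < n} exp(-y) y^j / j!.
    """
    y = n + x * math.sqrt(n)
    if y <= 0.0:
        return 0.0
    term = math.exp(-y)          # Poisson(y) probability of the value 0
    poisson_cdf = term
    for j in range(1, n):
        term *= y / j            # Poisson(y) probability of the value j
        poisson_cdf += term
    return 1.0 - poisson_cdf


def edgeworth_cdf(x, n, skewness=2.0, excess_kurtosis=6.0):
    """Second-order Edgeworth approximation to P{W_n <= x}.

    The defaults are the skewness and excess kurtosis of G(lambda, 1).
    """
    correction = (
        skewness / (6.0 * math.sqrt(n)) * (x ** 2 - 1.0)
        + excess_kurtosis / (24.0 * n) * (x ** 3 - 3.0 * x)
        + skewness ** 2 / (72.0 * n) * (x ** 5 - 10.0 * x ** 3 + 15.0 * x)
    )
    return STD_NORMAL.cdf(x) - STD_NORMAL.pdf(x) * correction


def relative_error(x, n):
    """(approximation - exact) / exact at the point x."""
    exact = exact_cdf_standardized_mean(x, n)
    return (edgeworth_cdf(x, n) - exact) / exact
```

Evaluating `relative_error(x, 20)` over a grid of x values gives a comparison in the spirit of the table above.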

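Example 2.17. Let X1, …, Xn be i.i.d. distributed as G(λ, ν). Then X̄n ∼ (1/n)G(λ, nν). Accordingly,

$$f_{\bar X_n}(x) = \frac{(n\lambda)^{n\nu}}{\Gamma(n\nu)}\,x^{n\nu-1}e^{-n\lambda x}, \quad x > 0 .$$

The cumulant generating function of G(λ, ν) is

$$K(t) = -\nu\log\left(1 - \frac{t}{\lambda}\right), \quad t < \lambda .$$


$$K'(t) = \frac{\nu}{\lambda - t},$$


$$K''(t) = \frac{\nu}{(\lambda - t)^2} .$$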

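Accordingly, the solution of K′(t) = x is t̂ = λ – ν/x and

$$K''(\hat t) = \frac{x^2}{\nu} .$$

Furthermore, exp{n[K(t̂) – t̂x]} = exp{nν – nλx} · (λx/ν)^{nν}. It follows from Equation (2.15.18) that the saddlepoint approximation is

$$\hat f_{\bar X_n}(x) = \left(\frac{n}{2\pi K''(\hat t)}\right)^{1/2}\exp\{n[K(\hat t) - \hat t x]\} = \frac{\sqrt{n\nu}\,e^{n\nu}}{\sqrt{2\pi}\,(n\nu)^{n\nu}}\,(n\lambda)^{n\nu}x^{n\nu-1}e^{-n\lambda x} .$$

If we substitute in the exact formula the Stirling approximation, Γ(nν) ≈ √(2π) (nν)^{nν – 1/2} e^{–nν}, we obtain the saddlepoint approximation.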
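The closing remark of Example 2.17 can be checked numerically: the ratio of the saddlepoint approximation to the exact density should not depend on x and should equal the ratio of Γ(nν) to its Stirling approximation. The sketch below is an illustration only; the function names are hypothetical, the saddlepoint density is taken in the form √(n/(2πK″(t̂))) · exp{n[K(t̂) − t̂x]}, and λ is treated as a rate parameter.

```python
import math


def exact_mean_pdf(x, n, lam, nu):
    """Exact density of the mean of n i.i.d. G(lam, nu) variables at x > 0."""
    a = n * nu
    log_pdf = (a * math.log(n * lam) - math.lgamma(a)
               + (a - 1.0) * math.log(x) - n * lam * x)
    return math.exp(log_pdf)


def saddlepoint_mean_pdf(x, n, lam, nu):
    """Saddlepoint approximation to the density of the sample mean at x > 0."""
    t_hat = lam - nu / x                          # solves K'(t) = x
    k_hat = -nu * math.log(1.0 - t_hat / lam)     # K(t_hat)
    k2_hat = nu / (lam - t_hat) ** 2              # K''(t_hat) = x^2 / nu
    return math.sqrt(n / (2.0 * math.pi * k2_hat)) * math.exp(n * (k_hat - t_hat * x))


def stirling_ratio(a):
    """Gamma(a) divided by its Stirling approximation sqrt(2 pi) a^(a - 1/2) e^(-a)."""
    log_stirling = 0.5 * math.log(2.0 * math.pi) + (a - 0.5) * math.log(a) - a
    return math.exp(math.lgamma(a) - log_stirling)
```

For instance, with n = 10, λ = 2 and ν = 1.5, the ratio `saddlepoint_mean_pdf(x, 10, 2.0, 1.5) / exact_mean_pdf(x, 10, 2.0, 1.5)` should coincide with `stirling_ratio(15.0)` for every x > 0.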

