11
Offset normal shape distributions

11.1 Introduction

Rather than conditioning we could consider the marginal distribution of shape after integrating out the similarity transformations. These distributions will be called offset shape distributions. We consider multivariate normal configurations of k-points in , that is the model for the landmarks is:

where Ω is symmetric positive definite. We shall concentrate initially on the important m = 2 dimensional case in this chapter. Kendall (1984) introduced the case where μ consists of coincident points and Ω was proportional to the identity matrix. Mardia and Dryden (1989a,b) and Dryden and Mardia (1991b) further developed the shape distributions for general μ and Ω, in two dimensions. These distributions have been called ‘Mardia–Dryden’ distributions in the literature (Bookstein 1991, 2014; Lele and Richtsmeier 1991; Kendall 1991b; Stuart and Ord 1994).

11.1.1 Equal mean case in two dimensions

11.1.1.1 Isotropic case

Result 11.1 Consider the k ≥ 3 landmarks in to be i.i.d. as bivariate normal with equal means and covariance matrix proportional to the identity, that is

The resulting shape distribution is uniform in Kent’s polar shape coordinates.

Proof: If the original landmarks have the normal distribution of Equation (11.2), then the Helmertized landmarks X* = vec(HX) have a zero mean 2k − 2 dimensional multivariate normal with covariance matrix σ2I2k − 2. Hence the density of X* is:

numbered Display Equation

We transform from X* to (R, s1, …, sk − 2, θ1, …, θk − 1), where R = ||X*|| and are Kent’s polar pre-shape coordinates of Section 4.3.3. The Jacobian of the inverse transformation is R2k − 322 − k and so the joint density of (R, s1, …, sk − 2, θ1, …, θk − 1) is:

numbered Display Equation

We now transform from R to A = R2/(2σ2) and the Jacobian of the inverse transformation is σ2/R. Hence the joint density of (A, s1, …, sk − 2, θ1, …, θk − 1) is:

numbered Display Equation

Integrating out the rotation θk − 1 (which is independently uniform) and the scale A by using:

numbered Display Equation

we have the density of Kent’s polar shape coordinates as:

numbered Display Equation

which is the uniform density.

The shape distribution will also be uniform for non-normal i.i.d. symmetric distributions (subject to certain regularity conditions), that is where the landmarks are independent and coordinates (Xj, Yj) have joint p.d.f. given by:

numbered Display Equation

where r2j = xj2 + y2j for j = 1, …, k.

Although not as simple as Kent’s polar shape coordinates we shall also use the Kendall coordinates of Equation (2.11) and for convenience for the rest of this chapter we write U instead of UK. Under the i.i.d. normal model of Equation (11.2) the p.d.f. of Kendall coordinates U = (U3, …, Uk, V3, …, Vk)T, with respect to the Lebesgue measure, is given by Equation (10.2), and repeated here:

numbered Display Equation

Corollary 11.1 In the triangle case (k = 3) the shape distribution under the i.i.d. normal model of Equation (11.2) is uniform on the shape sphere . Using Kendall’s spherical coordinates (θ, ϕ) [given by Equation (2.13)] the density, with respect to the Lebesgue measure, is:

numbered Display Equation

and 0 ≤ θ ≤ π, 0 ≤ ϕ < 2π.

Proof: Transforming to Kendall’s spherical coordinates using Equation (2.15) we have:

(11.3) numbered Display Equation

The Jacobian of the inverse transformation is given by:

(11.4) numbered Display Equation

and so the result follows.

Mardia (1980), describing joint work with Robert Edwards, gave the offset normal distribution for the reflection shape of triangles using internal angles α1 and α2. The joint density of α1 and α2 is given by:

(11.5) numbered Display Equation

and α3 = π − α1 − α2 ≥ 0, where S = ∑3j = 1sin 2αj and C = ∑3j = 1cos 2αj.

The connection with the uniform distribution in terms of Bookstein coordinates (UB, VB) is as in the following. The uniform density using Bookstein coordinates for the triangle case is:

numbered Display Equation

The transformation between (UB, VB) and (α1, α2), VB > 0, 0 ≤ α1, α2 ≤ π, is obtained from

numbered Display Equation

and the Jacobian is:

numbered Display Equation

Using the fact that

numbered Display Equation

and

numbered Display Equation

the uniform density corresponding to VB ≥ 0 is:

numbered Display Equation

There is an equal contribution to the density from VB ≤ 0 and so the uniform shape density is given by 2f1 as required.

11.1.1.2 Distribution of the Riemannian distance

Result 11.2 When the points are distributed according to the i.i.d. normal model of Equation (11.2) then the marginal distribution of the Riemannian distance ρ between the observed shape u and any fixed shape, has density

Proof: Assume the original landmarks zo are i.i.d. normal with zero mean and unit variance. Transforming to the Helmertized coordinates zH = (zH1, …, zH(k − 1))T = Hzo it follows that |zHj|2 ∼ χ22 independently. Without loss of generality we consider the Riemannian distance ρ from zo to the special rank 1 configuration μ = (1, 0, …, 0)T. Using the formula for the Riemannian distance of Equation (4.21) we have:

numbered Display Equation

which is the ratio of independent chi-squared random variables with 2k − 4 and 2 degrees of freedom, respectively. So,

numbered Display Equation

a scaled F distribution with 2k − 4 and 2 degrees of freedom. Hence, using the connection between the F and beta distributions (e.g. see Mardia et al. 1979, Appendix B),

numbered Display Equation

with density (k − 2)sk − 3. Transforming to ρ with Jacobian

numbered Display Equation

leads to the required density of Equation (11.6).

In Figure 11.1 we see how the shape of the density changes as the number of points k increases. The mode increases as the number of points increases.

Image described by surrounding text and caption.

Figure 11.1 The density (f) of the Riemannian distance ρ from any fixed shape for different numbers of points k, in the uniform case. The plot shows the densities for values of k = 3, 4, 6, 10, and the mode of the distribution increases as k increases.

This density was first communicated to us by David Kendall (1989, personal communication); see also Goodall and Mardia (1991) and Le (1991b).

If one uses Goodall and Mardia polar shape coordinates described at the end of Section 5.6.2, then the uniform shape distribution is given by the product of marginal density of ρ from Equation (11.6) and the uniform density of the spherical coordinates ϕ.

11.1.1.3 Anisotropic case

If the covariance matrix in Equation (11.2) is replaced by σ2diag(s, 1/s),  s > 0, the p.d.f. of shape (Kendall 1984), with respect to the uniform measure in the shape space, is:

(11.7) numbered Display Equation

where

(11.8) numbered Display Equation

with z+ = (1, u3 + iv3, …, uk + ivk)T and

(11.9) numbered Display Equation

is the Legendre polynomial of degree n (e.g. see Rainville 1960).

The maximum value of λ is (s + s− 1)/2, which has been called the ‘Broadbent factor’ by Kendall (1984). The model was motivated by the archaeological example of the standing stones of Land’s End described in Section 1.4.18, which was first studied by Broadbent (1980).

11.1.2 The isotropic case in two dimensions

An i.i.d. model (with the same mean for each point) is often not appropriate. For example, in many applications one would be interested in a model with different means at each landmark. The first extension that we consider is when the landmarks (Xj, Yj)T, j = 1, …, k, are independent isotropic bivariate normal, with different means but the same variance at each landmark. The isotropic normal model is:

where μ = (μ1, …, μk, ν1, …, νk)T. This model is particularly appropriate when the major source of variability is measurement error at each landmark. For example, consider Figure 11.2 where the points of a triangle are independently perturbed by isotropic normal errors. We wish to find the resulting perturbed shape distribution using say Bookstein or Kendall coordinates after translating, rotating and rescaling two points to a fixed baseline.

Image described by surrounding text and caption.

Figure 11.2 The isotropic model is appropriate for independent isotropically perturbed landmarks (a). The resulting shape distribution can be thought of as the distribution of the landmarks after translating, rotating and rescaling so that the baseline is sent to a fixed position (b).

Result 11.3 The offset normal shape density under the isotropic normal model of Equation (11.10), with respect to the uniform measure in the shape space, is:

where

numbered Display Equation

is a concentration parameter, S(μ) is the population centroid size, ρ is the Riemannian distance between the observed shape [X] and the population shape [μ], and the confluent hypergeometric function 1F1( · ) is defined in Equation (10.11).

Proof: Transform first to the Helmertized landmarks,

where L = I2H is a (2k − 2) × 2k block diagonal matrix with LLT = I2k − 2, μH = Lμ, and H is the Helmert submatrix of Equation (2.10). We partition into scale/rotation and shape by transforming to (W1 = XH2, W2 = YH2, UT)T, where U = (U3, …, Uk, V3, …, Vk)T are Kendall coordinates and W1 and W2 contain both rotation and scale information. Note that

j = 3, …, k and under the model P(W12 + W22 = 0) = 0. Defining the (2k − 2) vectors u+ and the orthogonal vector v+ by:

(11.14) numbered Display Equation

we can write the transformation from XH to (W1, W2, U) as:

numbered Display Equation

and the Jacobian of the inverse transformation is:

numbered Display Equation

The p.d.f. of (W1, W2, U) is therefore

numbered Display Equation

where

numbered Display Equation

In order to obtain the marginal shape distribution we integrate out W1 and W2, that is

numbered Display Equation

Keeping U fixed for the moment let us suppose that (W1, W2)|U has a bivariate normal distribution with mean

numbered Display Equation

and covariance matrix σ2{(u+)Tu+}I2. We write the density of this distribution as f*(w1, w2). It can be seen that

numbered Display Equation

where

numbered Display Equation

Now, from standard results of the normal distribution, if Wi, i = 1, 2, are independently normal with means bi and common variance ψ2, then

numbered Display Equation

where χ22(λ) is the non-central chi-squared distribution with two degrees of freedom and non-centrality parameter λ. Hence,

numbered Display Equation

Important point: Note that the integral reduces to calculation of the (k − 2)th moment of the non-central chi-squared distribution. The rth moment for the non-central χ22(λ) distribution is:

numbered Display Equation

Simple algebra shows that

numbered Display Equation

and so

numbered Display Equation

Using simple trigonometric identities we have:

numbered Display Equation

and hence the result is proved by recognizing the uniform measure of Equation (10.1).

This distribution was introduced by Mardia and Dryden (1989a,b). Kendall (1991b) and Le (1991b) verified the result using stochastic calculus and made connections with Brownian motion (see Section 14.4.3). The density is also described in Stuart and Ord (1994, p. 179), for the triangle case.

The distribution has 2k − 3 parameters. There are 2k − 4 parameters in the mean shape [μ] and a concentration parameter κ.

The confluent hypergeometric function is a finite series in the density since

(11.15) numbered Display Equation

where is the Laguerre polynomial of degree r; see Abramowitz and Stegun (1970, pp. 773–802). Using Kummer’s relation,

numbered Display Equation

we can also rewrite the density, with respect to the uniform measure, as:

numbered Display Equation

A convenient parameterization for the mean shape uses Kendall coordinates Θ = (θ3, …, θk, ϕ3, …, ϕk)T obtained from μ and a coefficient of variation τ2 = σ2/||μ2 − μ1||2. There are 2k − 4 shape parameters Θ and one variation parameter τ. The density can be written, with respect to the uniform measure, as:

where

with Θ+ = (1, θ3, …, θk, 0, ϕ3, …, ϕk)T, and ρ is the Riemannian distance between the shapes u and Θ given earlier in Equation (4.21).

11.1.3 The triangle case

The triangle case (k = 3) is particularly simple. The offset normal shape density is given by:

numbered Display Equation

and transforming to Kendall’s shape sphere using Equation (2.13) and Equation (2.14), the probability element of l = (lx, ly, lz)T, is:

numbered Display Equation

where ν is the direction on the shape sphere corresponding to the mean shape and dS is the probability element of . This result can be seen by using the same argument that led to Equation (10.8). Mardia (1989b) introduced this result and explored various properties of the distribution, including an approximation based on the Fisher–von Mises distribution.

Mardia (1980) gave the result for the reflection shape for triangles where the mean is equilateral, using two internal angles α1 and α2 to measure the reflection shape.

11.1.4 Approximations: Large and small variations

11.1.4.1 Large variations

As τ → ∞ in the shape density of Equation (11.16) (or equivalently as the concentration κ → 0) we clearly see that the shape distribution tends to the uniform distribution, since 1F1(2 − k; 1; x) → 1 as x → 0.

11.1.4.2 Small variations

Result 11.4 As τ → 0 it can be seen that

where means ‘converges in distribution to’ and Φ = ( − ϕ3, …, −ϕk, θ3, …, θk)T is orthogonal to the mean shape Θ = (θ3, …, θk, ϕ3, …, ϕk)T.

Proof: The result can be proved by using a Taylor series expansion of U about Θ. Specifically if we write XH = μ + ε where, without loss of generality, we have rescaled X by δ12 = {(μ2 − μ1)2 + (ν2 − ν1)2}1/2 and

numbered Display Equation

where τ = σ/δ12. The Taylor series expansion is:

numbered Display Equation

where Z = (ε2, …, εk − 1, ν2, …, νk − 1)T. Hence for small variations we consider up to first-order terms and so the result follows.

This result was first given by Bookstein (1984, 1986) using an approximation argument independent of the exact distribution. Bookstein suggested using normal models for shape variables for approximate inference, as described in Section 9.4.

Note that the approximate covariance in Equation (11.18) is not proportional to the identity matrix except for the k = 3 triangle case. Hence, correlations have been induced in general into the shape variables by the edge registration procedure.

If using Bookstein coordinates the approximate distribution, for small τ, is:

(11.19) numbered Display Equation

where D = I2⊗(HT1H1) and

numbered Display Equation

where H1 is the lower right (k − 2) × (k − 2) partition matrix of the Helmert submatrix (see Section 2.5), and ΘB = (θB3, …, θkB, ϕB3, …, ϕBk)T are the Bookstein coordinates of the mean μ, with ΦB = ( − ϕB3, …, −ϕBk, θB3, …, θBk)T.

Corollary 11.2 For k = 3 the approximate normal distribution of Bookstein coordinates is isotropic, and in particular

(11.20) numbered Display Equation

numbered Display Equation

where

and τ = σ/{(μ2 − μ1)2 + (ν2 − ν1)2}1/2.

Also, the form of the covariance leads to large variability if the mean location of the corresponding landmark is far away from the mid-point of the baseline.

Corollary 11.3 For k = 4 we have the following approximate covariances between the shape variables:

numbered Display Equation

Important point: We see that even though the original landmarks are independent, the shape variables become correlated when registering on a common edge. Hence, care should be taken with interpretation of PCs obtained from Bookstein coordinates, as discussed by Kent (1994).

It is immediately clear that the above covariances are invariant under similarity transformations of the mean configuration, that is transformations of the form

numbered Display Equation

as the terms in a and b cancel out in Equation (11.21).

Although we have given the covariance structure for k = 4, for larger k the covariances between any pair of landmarks j and l will be exactly the same as in Corollary NaN but with a suitable change of subscripts. So, given a mean configuration μj + iνjj = 1, …, k, we can write down the approximate covariances for Bookstein’s shape variables for any particular choice of baseline using the above corollaries.

11.1.5 Exact moments

Result 11.5 If using Kendall or Bookstein coordinates, then only the first moment exists. In particular, using Bookstein coordinates (Mardia and Dryden 1994)

See Mardia and Dryden (1994) for a proof of the result.

We see from Equation (11.22) that the expected value of UB is ‘pulled in’ towards the origin by the multiplicative factor 1 − exp { − 1/(4τ2)}.

Computing the exact expectation of the shape variables involves finding the expectation of a linear term in a random vector X over a quadratic form in X, that is aTX/XTBX. From Jones (1987) we have E[(lTX/XTBX)n] < ∞ if n < rank(B). Here rank(B) = 2 and hence the first moment is the only finite positive integer moment.

11.1.6 Isotropy

We now show that a general isotropic distribution for landmarks about a mean shape in two dimensions gives rise to an isotropic distribution in the tangent space to the mean so that standard PCA is valid in the tangent space, following Mardia (1995). This result is of great significance and it underlies the development of the Procrustes tangent space by Kent (1994), discussed in Section 7.7.

Result 11.6 If the original points have an isotropic covariance structure, then the partial tangent coordinates v are also isotropic.

Proof: Consider the original landmarks zo to be isotropic and take the Helmertized landmarks z = Hzo which will also be isotropic. The Helmertized landmarks z can be modelled by , where δ is isotropic and E(δ) = 0, so that the p.d.f. of δ is given by:

The partial Procrustes tangent projection is:

numbered Display Equation

Without any loss of generality, let us take ν = (1, 0, …, 0)T so that,

numbered Display Equation

where δ = (δ1, ΔT)T and λ = ||Δ||. Now, Δ|{δ1 = c} is isotropic since from Equation (11.23)

which is a function of λ2 only. Also, since a new function of Δ depending on λ2 (and λ is the radial component of Δ) is also isotropic given δ1, it follows from Equation (11.24) that

numbered Display Equation

conditional on δ1 is also isotropic and (vT1) = (0, eiθuT1), θ = Arg(1 + δ1) and E(uT1) = 0. Hence,

Furthermore, we can write

Hence

numbered Display Equation

so that Equation (11.25) and Equation (11.26) yield:

numbered Display Equation

The same argument applies for the PCA in real coordinates in the tangent space.

11.2 Offset normal shape distributions with general covariances

Shape distributions for the general multivariate normal model of Equation (11.1) with a general covariance structure in m = 2 dimensions have been studied by Dryden and Mardia (1991b). We write XN2k(μ, Ω) in this general case, where μ = (μ1, …, μk, ν1, …, νk)T and Ω is a 2k × 2k symmetric positive definite matrix. The offset normal shape density in the most general case involves finite sums of generalized Laguerre polynomials and is very complicated. The general shape distribution depends on 2k − 4 mean shape parameters Θ (the shape of μ) and a 2k − 2 × 2k − 2 matrix of covariance parameters Σ = LΩLT, where L = I2H and H is the Helmert submatrix. We write the p.d.f. of the Kendall shape variables in the general case as f(u; Θ, Σ), and the form of the density is given by Dryden and Mardia (1991b). Simplification of the density does occur when there is complex normal structure in the covariance matrix, which we now discuss.

11.2.1 The complex normal case

Let x = vec(X) have a complex multivariate normal distribution with covariance matrix

numbered Display Equation

where Ω1 is a k × k positive definite symmetric matrix and Ω2 is a k × k skew symmetric matrix [see Equation (10.3)]. The covariance matrix of the Helmertized landmarks vec(HX) is:

(11.27) numbered Display Equation

say, where ΣT2 = −Σ2, μH = Lμ, and L = I2H.

Result 11.7 In the complex normal case the shape density is given by:

(11.28) numbered Display Equation

where r is defined as a generalized Procrustes distance given by:

(11.29) numbered Display Equation

and κ = μTHΣ− 1μH/4.

The proof of this result was given by Dryden and Mardia (1991b). Le (1991b) has verified the complex normal shape distribution using stochastic calculus. The complex normal shape density is very similar in structure to the isotropic normal case, except that the Euclidean norm is replaced by a Mahalanobis norm. There are many useful models that have a complex normal covariance structure; such as the isotropic model of Section 11.1.2 and the cyclic Markov model (Dryden and Mardia 1991b).

11.2.2 General covariances: Small variations

Let us assume that not all the landmark means are equal, and consider the case where variations are small compared with the mean length of the baseline. We write the population shape as Θ = (θ3, …, θk, ϕ3, …, ϕk)T (i.e. the Kendall coordinates of {(μj, vj)T, j = 1, …, k}), and without loss of generality we translate, rotate and rescale the population so that the vectorized Helmertized coordinates XH = LX have mean μH = (1, θ3, …, θk, 0, ϕ3, …, ϕk)T, and general positive definite covariance matrix Σ = 2τ2LΩLT, where τ = σ/[(μ2 − μ1)2 + (ν2 − ν1)2]1/2.

Result 11.8 As τ → 0 +,

where

numbered Display Equation

and Ij, 0j are the j × j identity and zero matrices.

Proof: After transforming to the Helmertized coordinates as in Equation (11.12), we write XH = μH + ε+ where

and ε and η are (k − 2) × 1 column vectors. Note that, if τ is small, then each element of ε+ will be small. Transforming to Kendall coordinates, as in Equation (11.13), we have

numbered Display Equation

Thus,

numbered Display Equation

Thus, using Equation (11.31) and considering up to first-order terms [since we assume τ and hence (ε+)j are small], we have the result.

The covariance matrix of the normal approximation of Equation (11.30) depends on the mean Θ. For the isotropic case, with Ω = I2k, the covariance matrix of the normal approximation is given by:

numbered Display Equation

as seen in Equation (11.18). In the complex normal case the normal approximation is also of complex normal form.

11.3 Inference for offset normal distributions

11.3.1 General MLE

Consider using the general offset normal shape distribution to model population shapes. We could have a random sample of n independent observation shapes u1, …, un from the most general offset normal shape distribution for k ≥ 3, with population shape parameters Θ, and covariance parameters Σ. We have 2k2k − 3 parameters to estimate in total, although we expect only 2k2 − 5k + 2 to be identifiable, as there are 2k − 4 mean parameters and (2k − 4)(2k − 3)/2 parameters in the covariance matrix of a general 2k − 4 dimensional multivariate normal distribution. If we write the exact shape density as f(u; Θ, Σ), then the likelihood of the data is:

numbered Display Equation

and the log-likelihood is given by:

(11.32) numbered Display Equation

The MLE for shape and covariance is given by:

numbered Display Equation

We could proceed to maximize the likelihood with a numerical routine. For example, we could choose Newton–Raphson, Powell’s method, simulated annealing or a large variety of other techniques. Estimation can be problematic for completely general covariance structures (because singular matrices can lead to non-degenerate shape distributions), and so we proceed by considering simple covariance structures with a few parameters. For many simple practical cases there are no estimation problems. We now deal with such an application, where the covariance structure is assumed to be isotropic.

11.3.2 Isotropic case

Consider a random sample of n independent observations u1, …, un from the isotropic offset normal shape distribution of Equation (11.11). We work with parameters (ΘT, τ)T, where Θ is the mean shape and τ is the coefficient of variation. From Equation (11.16), the exact log-likelihood of (ΘT, τ)T, ignoring constant terms, is:

where ρi = ρ(ui, Θ) is the Riemannian distance between ui and the population mean shape Θ, given in Equation (11.17). Using the result (e.g. Abramowitz and Stegun 1970) that

numbered Display Equation

we can find the likelihood equations and solve using a numerical procedure. The root of the likelihood equations that provides a global maximum of Equation (11.33) is denoted by , the exact MLEs of (ΘT, τ)T, with τ > 0.

We can now write down likelihood ratio tests for various problems, for example testing for differences in shape between two independent populations or for changes in variation parameter. Large sample standard likelihood ratio tests can be carried out in the usual way, using Wilks’ theorem (Wilks 1962).

Consider, for example, testing whether , versus , where , with dim and dim. Let

numbered Display Equation

then according to Wilks’ theorem − 2log Λ ≈ χ2qp under H0, for large samples (under certain regularity conditions).

If we had chosen different shape coordinates for the mean shape, then inference would be identical as there is a one to one linear correspondence between Kendall and Bookstein coordinates as seen in Equation (2.12). Also, if a different baseline was chosen, then there is a one to one correspondence between Kendall’s coordinates and the alternative shape parameters. Therefore, any inference will not be dependent upon baseline or coordinate choice with this exact likelihood approach.

Example 11.1 Consider the schizophrenia data described in Section 1.4.5. The isotropic model may not be reasonable here but we will proceed with an illustrative example of the technique. So, we consider the shape variables in group g to be random samples from the isotropic offset normal distribution, with likelihood given by Equation (11.33) with 2k − 3 = 23 parameters in each group Θg, τg, where g = 1 (control group) or g = 2 (schizophrenia group). A hypothesis test for isotropy was considered in Example 9.7.

The isotropic offset normal MLEs were obtained using the R routine frechet which uses nlm for the maximization (see Section 11.3.3). Plots of the mean shapes in Bookstein coordinates using baseline genu (landmark 2) and splenium (landmark 1) are given in Figure 11.3. The exact isotropic MLE of shape is very close to the full Procrustes mean shape; in particular the Riemannian distance ρ from the exact isotropic MLE to the full Procrustes mean shapes in both groups are both extremely small. The MLE of the variation parameters are and .

Testing for equality in mean shape in each group we have − 2log Λ, distributed as χ222 under H0, as 43.269. Since the < ?TeXp? >-value for the test is P222 ≥ 43.269) = 0.005 we have strong evidence that the groups are different in mean shape, as is also seen in Example 9.7. Hence, there is evidence for a significant difference in mean shape between the controls and the schizophrenia patients. Note the large shape change in the triangle formed by the splenium and the two landmarks just below it. If we restrict the groups to have the same τ as well as the same mean shape in H0, then − 2log Λ = 44.9 and the < ?TeXp? >-value for the test is P223 ≥ 44.9) = 0.004.

Of course, we may have some concern that our sample sizes may not be large enough for the theory to hold and alternative F and Monte Carlo permutation tests (with similar findings) are given in Example 9.7.

Image described by surrounding text and caption.

Figure 11.3 The exact isotropic MLE mean shapes for the schizophrenia patients (S) and control group (C), pictured in Bookstein coordinates with baseline 2, 1 (x).

11.3.3 Exact isotropic MLE in R

The R code for the above example is as follows:

> anss<-frechet(schizophrenia$x[,,schizophrenia $group==”scz”],
      mean=”mle”)
> ansc<-frechet(schizophrenia$x[,,schizophrenia$group==”con”],
      mean=”mle”)
> anss$loglike
[1] 775.8128
> ansc$loglike
[1] 798.1256
> anss$tau
[1] 0.036485
> ansc$tau
[1] 0.03441964
> anss$kappa
[1] 1022.851
> ansc$kappa
[1] 1183.186
#pooled groups
> anspool<-frechet(schizophrenia$x,mean=”mle”)
> anspool$loglike
[1] 1551.484
# Wilks’ statistic
> 2*(ansc$loglike+anss$loglike-anspool$loglike)
[1] 44.90941

Hence, we have verified the estimated variance parameters, and the final likelihood ratio statistic.

11.3.4 EM algorithm and extensions

Kume and Welling (2010) consider a different approach for maximum likelihood estimation for the offset-normal shape distribution using an EM algorithm. The approach involves working with the original distribution of the landmark coordinates but treating the rotation and scale as missing/hidden variables. The approach is also useful for size-and-shape estimation. The methodology has been demonstrated to be effective in a variety of examples and it has also been extended to 3D shape and size-and-shape analysis (Kume et al. 2015).

A further extension of the method has been carried out by Huang et al. (2015) who use a mixture of offset-normal distributions for clustering with mixing proportions from a regression model, with application to corpus callosum shapes. Burl and Perona (1996) used the offset normal distribution for recognizing planar object classes, in particular for detecting faces from clutter via likelihood ratios.

11.4 Practical inference

In order to carry out practical inference one needs to decide upon a suitable procedure. The offset normal shape models are over-parameterized and rather awkward to work with. The isotropic offset normal model is reasonably straightforward to use, although the complex Watson or complex Bingham distribution are much easier to use for practical inference because they are members of the exponential family. So, if the isotropic model is appropriate, then in practice one would use the complex Watson distribution. However, the equal independent isotropic covariance structure is too simplistic for most datasets and so a next stage is to consider the complex covariance structure. The complex Bingham distribution is much easier to use than the offset complex normal distribution and so would be preferred.

There is a close analogy here with inference in directional data analysis. The offset normal shape distribution is analogous to the offset normal distribution in directional statistics, yet in practice one uses the von Mises–Fisher distribution for practical analysis. As a next stage in modelling one considers the real Bingham or Fisher–Bingham family of distributions, rather than more complicated offset normal distributions.

Important point: Perhaps the most straightforward and preferred way to proceed with inference for planar shape is to use the complex Bingham distribution, which gives the full Procrustes mean as the MLE of mean shape. In order to assess the structure of variability it is best to proceed with a multivariate analysis approach in the real tangent space, as seen in Chapter 9 and Section 12.4.

There are various extensions of offset normal shape distributions that can be considered, for example Alshabani et al. (2007a) consider partial shape distributions, where the landmark data are in three dimensions but the rotation invariance is only in one plane. These distributions are closely related to the planar shape distributions, and can be considered the joint shape distribution with marks at each landmark. The development of this work was motivated by the Bayesian analysis of human movement data (Alshabani et al. 2007b) from Section 1.4.13.

11.5 Offset normal size-and-shape distributions

Under the general normal model of Equation (11.1) we wish to investigate the joint distribution of size-and-shape – the offset normal size-and-shape distribution. These distributions have been developed by Dryden and Mardia (1992), and in general the results are complicated. Bookstein (2013b) states that in many applications the data really do live in Euclidean space but many models are in non-Euclidean shape spaces. One could argue that models should also be constructed in Euclidean spaces, where covariances between atoms for example make sense, and scale is not removed.

11.5.1 The isotropic case

11.5.1.1 The density

In the isotropic case we have Ω = σ2I2k and so Σ = σ2I2k − 2. It will be convenient to consider the centroid size S to measure size here. This model was used for first-order size-and-shape analysis by Bookstein (1986).

Corollary 11.4 The size-and-shape density of (S, U), with respect to the Lebesgue measure dSdU, for the isotropic case is:

where ζ = ||μH|| is the population centroid size and ρ is the Riemannian distance between the population and observed shapes given by:

numbered Display Equation

and f(u) is the uniform shape density in Kendall coordinates, from Equation (10.2).

A proof was given by Dryden and Mardia (1992) which involves a transformation and integrating out a rotation variable. The result was also obtained by Goodall and Mardia (1991). Let us write Θ for the Kendall coordinates obtained from μH. There is a total of 2k − 2 parameters here, namely the population size ζ, the population shape Θ (2k − 4 parameters) and the variation σ2 at landmarks.

11.5.1.2 Equal means

If all the means (μj, vj)T, j = 1, …, k, are equal under the isotropic model, that is the population configuration is coincident (ζ = 0), then the density of (S, U), with respect to the Lebesgue measure, is given by:

numbered Display Equation

Proof: Let ζ → 0 in the density of Equation (11.34).

Note that size (a scaled χ2k − 2 random variable) and shape are independent in this limiting case, as ζ → 0.

11.5.1.3 Small variations

Bookstein (1986) has shown that the centroid size and shape variables have approximately zero covariance for small variations under the isotropic normal model. We can also derive this result directly from the p.d.f. Let

numbered Display Equation

be the shape of the population configuration,

numbered Display Equation

and define

numbered Display Equation

with

numbered Display Equation

Result 11.9 If (σ/ζ) is small, then the approximate isotropic size-and-shape distribution is:

Proof: Transform to T = (S − ζ)/σ and W = ζ(U − Θ)/σ, and write γ = σ/ζ, where ζ > 0. The density of (T, W) is:

numbered Display Equation

Using the facts that, for large x,

numbered Display Equation

and that, for small γ,

numbered Display Equation

we see that

numbered Display Equation

Thus, since |B(Θ)|1/2 = (1 + ΘTΘ), we deduce that, as γ → 0 +,

numbered Display Equation

The result follows.

This result could also be obtained using Taylor series expansions for S and U, and Bookstein (1986) used such a procedure without reference to the offset normal size-and-shape distribution.

11.5.1.4 Conditional distributions

Using the standard result for the distribution of S under the isotropic model (e.g. Miller 1964, p. 28), it follows that the conditional density of U given S = s is:

numbered Display Equation

where f(u) is the uniform shape density of Equation (10.2). The conditional density of S given U involves the ratio of a modified Bessel function and a confluent hypergeometric function. Using Weber’s first exponential integral (Watson 1944, p. 393), it can be seen that

numbered Display Equation

If γ = σ/ζ is small, then, since for large x > 0 (Abramowitz and Stegun 1970, p. 504),

numbered Display Equation

and since cos 2ρ = 1 − O2), it follows that

numbered Display Equation

This verifies the result of Bookstein (1986) that S2 and U are approximately uncorrelated for small γ, which can also be seen from Equation (11.35).

11.5.2 Inference using the offset normal size-and-shape model

The size-and-shape distribution can be used for likelihood-based inference. Inference for models with general covariance structures will be very complicated. Placing some restrictions on the covariance matrix is necessary for the same identifiability reasons as in the shape case, discussed in Section 11.3.

Let us assume that we have a random sample (si, ui),  i = 1, …, n, from the isotropic size-and-shape distribution of Equation (11.34). We have the 2k − 2 parameters to estimate: population mean size ζ, population mean shape Θ (a 2k − 4-vector), and population variance σ2. The log likelihood is given, up to a constant with respect to the parameters, by:

where ρi is the Riemannian distance between the shapes ui and Θ given by Equation (4.21). We denote the MLEs by .

The likelihood equations must be solved numerically, although there is the following relationship between the MLEs of σ2 and ζ:

numbered Display Equation

The numerical maximization of Equation (11.36) requires the accurate evaluation of log I0(κ). This can be performed using a polynomial approximation such as mentioned by Abramowitz and Stegun (1970, p. 378).

Consider, for example, testing for symmetry in a sample of triangles. One aspect of the shape that could be investigated is asymmetry about the axis from the midpoint of the baseline to landmark 3.

Example 11.2 Consider a random sample of 30 T1 mouse vertebrae with k = 3 landmarks on each bone. Each triangle is formed by landmarks 1, 2 and 3 as in Figure 11.4.

Let us propose that the data form a random sample from the isotropic size-and-shape distribution. As a model check, since variations are small we would expect the Kendall coordinates and centroid size (U, V, S) to be approximately normal and uncorrelated from Equation (11.35). There is no reason to doubt normality using Shapiro–Wilk W tests, and the sample correlations for (U, V), (U, S) and (V, S) are only − 0.07, −0.06 and − 0.07, respectively. In addition, the standard deviations of U and V should be approximately equal and here the sample standard deviations of U and V are 0.023 and 0.034, respectively. Thus, we conclude that the model is fairly reasonable and there appears to be no linear relationship between size and shape in this particular dataset.

The MLEs are found by maximizing Equation (11.36). The values of the MLEs are and .

It is suggested that the population mean triangle could be isosceles with axis of symmetry θ = 0. We can investigate such asymmetry by testing H0: Θ = (0, ϕ)T versus H1: Θ = (θ, ϕ)T, Θ unrestricted, where ϕ, ζ > 0 and σ2 > 0 are not restricted under either hypothesis. Standard likelihood ratio tests can be performed in the usual way using Wilks’ theorem for large samples. Let

numbered Display Equation

Then according to Wilks’ theorem − 2log Λ ≃ χ21 under H0, for large samples. This test could of course be carried out using shape information alone, ignoring size. For this T1 mouse vertebrae example the statistic is − 2log Λ = 5.52 with approximate p-value 0.02 calculated from the χ21 approximation. Thus we have evidence to reject symmetry about θ = 0.

Image described by surrounding text and caption.

Figure 11.4 The three labelled landmarks taken on each T1 mouse vertebra. The baseline was taken as 1–2 in the calculations described in the text. Source: Dryden & Mardia 1992. Reproduced with permission of Oxford University Press.

11.6 Distributions for higher dimensions

11.6.1 Introduction

For practical data analysis in three and higher dimensions we recommend the use of Procrustes analysis and tangent space inference (Chapters 7 and 9). The study of suitable models for higher dimensional shape is much more difficult than in the planar case. We outline some distributional results for normally distributed configurations, although the work is complicated.

11.6.2 QR decomposition

Goodall and Mardia (1991, 1992, 1993) have considered higher dimensional offset normal size-and-shape and shape distributions by using the QR decomposition. In particular, the decomposition of Bartlett (1933) described below leads to the main results.

Although we have concentrated on km + 1 so far, in this section on higher dimensional distributions we also allow k < m + 1. Invariance with respect to rotation if k < m + 1 is the same as invariance with respect to the set of k − 1 orthonormal frames in m dimensions, which is the Stiefel manifold Vk − 1, m (e.g. see Jupp and Mardia 1989). The Stiefel manifold Vn, m of orthonormal n frames in m dimensions can be regarded as the set of (n × m) matrices A satisfying AAT = In. When k > m then Vn, m = Vm, m = O(m).

First of all we consider the general QR decomposition described in Section 5.6.2 in more detail, for general values of k and m. Consider the QR decomposition of the Helmertized coordinates Y = HX, namely

where n = min (k − 1, m), Vn, m is the Stiefel manifold and T((k − 1) × n) is lower triangular with non-negative diagonal elements Tii ≥ 0, i = 1, …, n. When k > m we have Γ inline O(m) and det(Γ) = ±1, and hence the QR decomposition removes orientation and reflection in this case.

Before in Section 5.6.2 we just considered k > m but restricted Γ so that reflections were not allowed. In that case we removed orientation only for k > m and so we had Γ inline SO(m), det(Γ) = +1 and T unrestricted. However, when km there is no distinction to make between allowing reflections or not – the shape is invariant under reflection in this case.

For k > m, T, {Tij : 1 ≤ i ≤ j ≤m} is the size-and-shape if Γ inline SO(m) and the reflection size-and-shape if Γ inline O(m). If km, then the distinction is irrelevant and we just call T the size-and-shape.

To remove scale we divide by the Euclidean norm of T,

numbered Display Equation

For k > m, W is the shape of our configuration if Γ inline SO(m) and reflection shape if Γ inline O(m). If km the distinction is again irrelevant and we call W the shape. If ||T|| = 0, then the shape is not defined.

11.6.3 Size-and-shape distributions

11.6.3.1 Coincident means

If X has the multivariate isotropic normal model

then YYT = HXXTHT = TΓΓTTT = TTT has a central Wishart distribution if rank(μ) = 0, where μ = E[Y] = Hvec− 1m(μ*), and m > k − 1. Bartlett’s decomposition gives the distribution of the lower triangular matrix T, where YYT = TTT. Hence T is the reflection size-and-shape in the QR decomposition of Y in Equation (11.37).

Result 11.10 (Bartlett decomposition). Under the normal model of Equation (11.38) with μ = 0 the distribution of (T)ij is:

for i > j, i = 1, …, k − 1,  j = 1, …, N, and all these variables are mutually independent {N = min (k − 1, m)}.

The proof involves random rotation matrices. For example, for k − 1 < m see Kshirsagar (1963). Goodall and Mardia (1991) have shown that this result holds also for m < k (even when the Wishart matrix is singular) and thus Equation (11.39) is the reflection size-and-shape distribution for the coincident means (μ = 0) case. If the size-and-shape distribution without reflections is required, then the result of Equation (11.39) holds, except that TmmN(0, σ2) for k > m.

Goodall and Mardia (1991) derive further results for planar and higher rank means. The densities are complicated in general. One simple case is that of triangles in higher dimensions.

If k = 3 and m ≥ 3, then we can consider Kendall’s spherical coordinates on the hemisphere . In this case we have W11 = (1 + sin θcos ϕ)1/2/√2, W22 = cos θ(1 + sin θcos ϕ)− 1/2/√2 and J(u) = sin θ(1 + sin θcos ϕ)− 1/2/(2√2). Hence the shape density on is given by,

The uniform measure on the hemisphere is sin θ/(2π)dθdϕ and θ/2 is the great circle distance to the north pole (the equilateral triangle shape). This result was derived by Kendall (1988) and indicates that the higher the dimension m, the greater the strength of the mode at the equilateral triangle.

If m = 2 then we divide Equation (11.40) by 2 to obtain the shape density without reflection on the sphere S2(1/2),

numbered Display Equation

which is the uniform density on .

Diffusions on planar shape spaces were studied by Ball et al. (2008) and the explicit expressions for Brownian motion were given. Also Ornstein–Uhlenbeck processes in planar shape space were defined using Goodall–Mardia coordinates. A drift term was added in the Ornstein–Uhlenbeck process along the geodesic towards the mean shape. Inference for these models was given by Ball et al. (2006) who discussed exact retrospective sampling of Ornstein–Uhlenbeck processes, with a particular application to modelling cell shape.

Other types of diffusions include Radon shape diffusions, where the shape of random projections of shapes into subspaces of lower dimension is considered (Panaretos 2006, 2008). In related work Panaretos (2009) investigates the recovery of a density function from its Radon transform at random and unknown projection angles, and a practical application is a random tomography technique for obtaining sparse reconstructions of proteins from noisy random projections using the LASSO (Panaretos and Konis 2011).

11.6.4 Multivariate approach

An alternative way to proceed is to consider an approach using results from multivariate analysis. The QR decomposition is again considered Y = TΓ, but one transforms to the joint distribution of (T, Γ) and then integrates out Γ over the Stiefel manifold to obtain the size-and-shape distribution. Integrating out size then gives the marginal shape distribution.

The key result for integrating out the rotation (and reflections with km + 1) is that, from James (1964):

numbered Display Equation

where [ΓdΓT] is the invariant unnormalized measure on O(m) and 0F1( · ; ·) is a hypergeometric function with matrix argument, which can be written as a sum of zonal polynomials. If rank(μ) < m, then it follows that

numbered Display Equation

and so the density without reflection invariance is fairly straightforward to find in this case. If rank(μ) = m, for m ≥ 3 then the density without reflection invariance cannot be computed in this way.

Goodall and Mardia (1992) discuss in detail the multivariate approach for computing shape densities from normal distributions. The case where the covariance matrix is Im⊗Σ is dealt with and size-and-shape and shape distributions are given for all cases except rank(μ) = m, for m ≥ 3.

In conclusion, marginal shape distributions from normal models are very complicated in higher than two dimensions, but for low rank means (rank ≤ 2) the distributions are tractable.

11.6.5 Approximations

When variations are small Goodall and Mardia (1993) have given some normal approximations to the size-and-shape and shape distributions.

Result 11.11 Let TP be the Procrustes size-and-shape coordinates after rotating the size-and-shape T to the mean μ (diagonal and of rank m) given in Equation (7.14). For small σ, it follows that (TP)ij follow independent normal distributions with means μij and variances σ2, for i = j or i > m, and σ2/(1 + μ2iijj2) for j < im.

Important point: The normal approximation is much simpler to use for inference than the offset normal distribution, and Goodall and Mardia (1993) consider an m = 3 dimensional application of size-and-shape analysis in the manufacture of chairs. A decomposition of the approximate distribution for size and shape is also given, and it can be shown for general m that size and shape are asymptotically independent (as σ → 0) under isotropy. The 2D (m = 2) case was seen earlier in Result 11.9. Thus the result gives a method for small σ for shape analysis. Inference will be similar to using the tangent approximation inference of Section 9.1.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset