10
Shape and size-and-shape distributions

Of fundamental interest are probability distributions in shape spaces and pre-shape spaces, which provide models for statistical shape analysis. We shall also consider the joint distribution of size and shape later, but first we concentrate on shape. Since the shape space is non-Euclidean, special care is required. There are several approaches we could consider for stochastic modelling, for example we could specify a probability distribution in:

  1. the configuration space
  2. the pre-shape space
  3. the shape space (directly)
  4. a tangent space.

If the probability distribution also includes certain transformation variables (as in 1. and 2.) which are not of interest, then we could either integrate them out (a marginal approach) or condition on them (a conditional approach). Note that 2. is an ambient space model whereas 3. is a quotient space model (see Section 3.2.1).

The work in this chapter is primarily for m = 2 dimensional landmarks, and some extensions to higher dimensions will be considered in Section 11.6. The main inference procedure in this chapter is the method of maximum likelihood.

10.1 The uniform distribution

We gave the volume measure in pre-shape space in Equation (4.24) and the volume measure in shape space in Equation (4.26), using Kent's polar coordinates of Section 4.3.3. We normalize the volume measure in shape space to give the uniform measure dγ in shape space. Given original landmarks z^o, recall that Kent's polar coordinates on the pre-shape sphere are obtained from the complex pre-shape z = (z_1, …, z_{k−1})^T = Hz^o/‖Hz^o‖ by:

\[ z_j = s_j^{1/2} e^{i\theta_j}, \qquad \sum_{j=1}^{k-1} s_j = 1, \]

for j = 1, …, k − 1, s_j ≥ 0, 0 ≤ θ_j < 2π. Kent's polar coordinates on the shape space are s_j and ϕ_j = (θ_j − θ_{k−1}) mod (2π), j = 1, …, k − 2.

Result 10.1 The uniform shape measure in Kent’s polar coordinates is:

\[ \mathrm{d}\gamma = \frac{(k-2)!}{(2\pi)^{k-2}} \, \mathrm{d}s_1 \cdots \mathrm{d}s_{k-2} \, \mathrm{d}\phi_1 \cdots \mathrm{d}\phi_{k-2}, \]

which has the property that ∫dγ = 1.
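A quick check of the normalization (using the form of the measure given above): the s-coordinates range over the unit simplex, whose volume is 1/(k − 2)!, and each ϕ_j ranges over [0, 2π), so

\[ \int \mathrm{d}\gamma = \frac{(k-2)!}{(2\pi)^{k-2}} \times \frac{1}{(k-2)!} \times (2\pi)^{k-2} = 1. \]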

Result 10.2 Transforming to Kendall coordinates U^K = (U_3^K, …, U_k^K, V_3^K, …, V_k^K)^T we have the uniform measure on the shape space given by:

(10.1) numbered Display Equation

where

(10.2) numbered Display Equation

Proof: Consider the coordinate system (Kendall 1984) where rj > 0 and 0 ≤ ϕj < 2π, j = 1, …, k − 2. On substituting we have:

numbered Display Equation

and ϕj = (θj − θk − 1)mod2π, j = 1, …, k − 2. We see that

numbered Display Equation

and the Jacobian J = |∂si/∂rj| is given by:

numbered Display Equation

Hence the uniform measure is given by:

numbered Display Equation

Furthermore, using the coordinate system , j = 1, …, k − 2, the uniform measure is:

numbered Display Equation

By suitably permuting the labels of the original pre-shape coordinates z_1, …, z_{k−1} we arrive at the expression for the uniform density in terms of Kendall coordinates U_j^K + iV_j^K = z_{j−1}/z_1, j = 3, …, k.

When using Kent’s polar coordinates the density is constant and hence Kent’s coordinate system is a natural way to represent the uniform shape distribution.

We shall see in Section 11.1.1 that if the original landmarks are i.i.d. with a rotationally symmetric distribution, then the resulting marginal shape distribution is uniform in the shape space. Mardia (1980) gave the uniform shape distribution for triangles using angles to measure shape [see Equation (11.5)].

10.2 Complex Bingham distribution

10.2.1 The density

We consider the case where we have a probability distribution on the pre-shape sphere S^{m(k−1)−1}, corresponding to k points in m dimensions. For the m = 2 dimensional case and using complex notation we have seen in Section 4.3 that the pre-shape space is the unit complex sphere \mathbb{C}S^{k−2} in k − 1 complex dimensions. In our particular application to shape analysis consider the k original landmarks in m = 2 dimensions with complex coordinates written as z^o (a k × 1 complex vector). Pre-multiply by the (k − 1) × k Helmert submatrix H of Equation (2.10) to give the k − 1 Helmertized complex landmarks z_H [a (k − 1)-vector]. Rescaling by ‖z_H‖ it follows that the pre-shape is:

\[ z = \frac{z_H}{\| z_H \|} \in \mathbb{C}S^{k-2}. \]

Definition 10.1 The complex Bingham distribution on \mathbb{C}S^{k−2}, denoted \mathbb{C}B_{k−2}(A), has probability density function (p.d.f.)

\[ f(z) = c(A)^{-1} \exp( z^* A z ), \qquad z \in \mathbb{C}S^{k-2}, \]

where z* represents the complex conjugate of the transpose of z, the matrix A is (k − 1) × (k − 1) Hermitian (i.e. A = A*), and c(A) is the normalizing constant.

The complex Bingham distribution is analogous to the real Bingham distribution (e.g. see Mardia et al. 1979, p. 433) and in fact, it is a particular case (see Section 10.2.5). The distribution has the property that

\[ f(e^{i\theta} z) = f(z) \quad \text{for all } \theta \in [0, 2\pi), \]

and is thus invariant under rotations of the pre-shape z. So, if an object is rotated, then it has the same density and will contribute in an identical way to inference as the same object in the original rotation. This property therefore makes the distribution appropriate for shape analysis (location and scale were previously removed because z is on the pre-shape sphere). The complex Bingham distribution provides a very elegant framework for the analysis of 2D shape data. The distribution was proposed by Kent (1994) and the properties that we describe are from his paper.

Since z*z = 1 for z ∈ \mathbb{C}S^{k−2}, we can see that the parameter matrices A and A + αI define the same complex Bingham distribution, with c(A + αI) = c(A) exp α, where α is a real scalar. It is convenient to remove this non-identifiability by setting λ_max(A) = 0, where λ_max(A) denotes the largest eigenvalue of A.
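This identity is a one-line consequence of z*z = 1:

\[ c(A + \alpha I) = \int_{\mathbb{C}S^{k-2}} \exp\{ z^*(A + \alpha I) z \} \, \mathrm{d}z = \int_{\mathbb{C}S^{k-2}} \exp( z^* A z + \alpha ) \, \mathrm{d}z = c(A) \exp \alpha . \]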

10.2.2 Relation to the complex normal distribution

In order to understand the covariance structure of a complex distribution we describe the complex multivariate normal distribution. If z_j = x_j + iy_j have a joint complex normal distribution with means ξ_j = μ_j + iν_j, j = 1, …, p, and a p × p Hermitian covariance matrix Σ = Σ_1 + iΣ_2, then writing x = (x_1, …, x_p, y_1, …, y_p)^T and μ = (μ_1, …, μ_p, ν_1, …, ν_p)^T we have

(10.3) \[ x \sim N_{2p}\left( \mu , \; \frac{1}{2} \begin{pmatrix} \Sigma_1 & -\Sigma_2 \\ \Sigma_2 & \Sigma_1 \end{pmatrix} \right), \]

where Σ_2 = −Σ_2^T is skew-symmetric and Σ_1 is symmetric positive definite. In particular, var(x_j) = var(y_j) and cov(x_j, y_j) = 0, and so for each j the covariance structure is isotropic. Writing z = (z_1, …, z_p)^T and ξ = (ξ_1, …, ξ_p)^T the density of the complex normal distribution is:

(10.4) \[ f(z) = \pi^{-p} |\Sigma|^{-1} \exp\{ -(z - \xi)^* \Sigma^{-1} (z - \xi) \}, \]

and we write z ∼ \mathbb{C}N_p(ξ, Σ); see, for example, Goodman (1963) and Andersen et al. (1995).
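To make the real-coordinate structure in Equation (10.3) concrete, here is a small illustrative sketch (our own, not code from the text): it builds the 2p × 2p real covariance from Σ = Σ_1 + iΣ_2 and simulates complex normal vectors through the real representation; the function names are hypothetical.

```python
import numpy as np

def real_covariance(Sigma):
    """Real 2p x 2p covariance of (x_1, ..., x_p, y_1, ..., y_p) when z ~ CN_p(xi, Sigma).

    Uses the standard real representation: with Sigma = Sigma1 + i*Sigma2 Hermitian,
    the real covariance is 0.5 * [[Sigma1, -Sigma2], [Sigma2, Sigma1]].
    """
    Sigma1, Sigma2 = Sigma.real, Sigma.imag
    return 0.5 * np.block([[Sigma1, -Sigma2],
                           [Sigma2,  Sigma1]])

def rcomplex_normal(n, xi, Sigma, rng):
    """Simulate n draws from the complex normal CN_p(xi, Sigma) via the real representation."""
    p = len(xi)
    mean = np.concatenate([xi.real, xi.imag])
    xy = rng.multivariate_normal(mean, real_covariance(Sigma), size=n)
    return xy[:, :p] + 1j * xy[:, p:]

# Isotropic marginals: var(x_j) = var(y_j) and cov(x_j, y_j) = 0.
rng = np.random.default_rng(0)
p = 3
Sigma = np.eye(p) + 0j
z = rcomplex_normal(5000, np.zeros(p, dtype=complex), Sigma, rng)
print(z.real.var(axis=0), z.imag.var(axis=0))   # each approximately 0.5
```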

The complex Bingham distribution can be obtained through conditioning a zero mean complex multivariate normal distribution to have unit norm. In particular, if z ∼ \mathbb{C}N_{k−1}(0, Σ) then,

\[ z \mid ( z^* z = 1 ) \;\sim\; \mathbb{C}B_{k-2}( -\Sigma^{-1} ). \]

So, an interpretation of the Hermitian matrix − A is that of the inverse covariance matrix of a zero mean complex normal random variable, which is conditioned to have unit norm to give the complex Bingham distribution. Hence, the complex Bingham distribution is an example of a conditional approach.

10.2.3 Relation to real Bingham distribution

The (k − 2) dimensional complex Bingham distribution can be regarded as a special case of a (2k − 3) dimensional real Bingham distribution, as follows. If the jth element of z is (z)_j = x_j + iy_j, define a (2k − 2) dimensional vector u = (x_1, y_1, …, x_{k−1}, y_{k−1})^T, say, by splitting each complex number into its real and imaginary parts. Also, if A = (a_{hj}) has entries a_{hj} = α_{hj} exp(iθ_{hj}) with θ_{jh} = −θ_{hj}, −π < θ_{jh} ≤ π, define a (2k − 2) × (2k − 2) matrix B made up of (k − 1)² blocks of size (2 × 2) given by:

\[ B_{hj} = \alpha_{hj} \begin{pmatrix} \cos\theta_{hj} & -\sin\theta_{hj} \\ \sin\theta_{hj} & \cos\theta_{hj} \end{pmatrix}, \qquad h, j = 1, \ldots, k - 1. \]

Then z^*Az = u^T B u, so that a complex Bingham distribution for z is equivalent to a real Bingham distribution for u (Kent 1994).
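The identity can be verified numerically. The following sketch (our illustration; the block form of B is our reading of the construction above, since the display was lost) builds B from a random Hermitian A and checks that the two quadratic forms agree.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 6
p = k - 1

# A random Hermitian parameter matrix A.
M = rng.normal(size=(p, p)) + 1j * rng.normal(size=(p, p))
A = (M + M.conj().T) / 2

# Build the real (2k-2) x (2k-2) matrix B from 2 x 2 blocks.
B = np.zeros((2 * p, 2 * p))
for h in range(p):
    for j in range(p):
        alpha, theta = abs(A[h, j]), np.angle(A[h, j])
        B[2 * h:2 * h + 2, 2 * j:2 * j + 2] = alpha * np.array(
            [[np.cos(theta), -np.sin(theta)],
             [np.sin(theta),  np.cos(theta)]])

# A random point on the complex pre-shape sphere and its real counterpart u.
z = rng.normal(size=p) + 1j * rng.normal(size=p)
z /= np.linalg.norm(z)
u = np.column_stack([z.real, z.imag]).ravel()   # (x_1, y_1, ..., x_{k-1}, y_{k-1})

print(np.real(z.conj() @ A @ z), u @ B @ u)     # the two quadratic forms agree
```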

10.2.4 The normalizing constant

Result 10.3 (Kent 1994) The normalizing constant of the complex Bingham distribution is:

(10.5) \[ c(A) = 2\pi^{k-1} \sum_{j=1}^{k-1} a_j \exp(\lambda_j), \qquad a_j^{-1} = \prod_{i \neq j} (\lambda_j - \lambda_i), \]

where λ1 < λ2 < ⋅⋅⋅ < λk − 1 = 0 denote the eigenvalues of A. Note that c(A) = c(Λ) depends only on the eigenvalues of A, with Λ = diag(λ1, …, λk − 1).

The result and proof were given by Kent (1994). If equalities exist between some λjs then c(A) can be integrated using repeated applications of L’Hôpital’s rule. For example, for λk − 2 = λk − 1 but all other λjs distinct, we get (Bingham et al. 1992):

numbered Display Equation

A particularly useful and simple case is the complex Watson distribution when there are just two distinct eigenvalues, as described in Section 10.3.
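Before moving on, a small numerical sketch of the normalizing constant, using the closed form in Equation (10.5) as reconstructed above (so the constant 2π^{k−1} should be treated as an assumption); when eigenvalues nearly coincide the terms suffer cancellation, which is where the limiting forms mentioned above become necessary.

```python
import numpy as np
from math import factorial

def cbingham_const(lam):
    """Normalizing constant c(A) from the eigenvalues lam of A (distinct, largest 0).

    Uses c(A) = 2 * pi^(k-1) * sum_j a_j * exp(lam_j) with
    a_j = prod_{i != j} 1 / (lam_j - lam_i), as in the reconstructed Equation (10.5).
    """
    lam = np.asarray(lam, dtype=float)
    total = 0.0
    for j in range(len(lam)):
        diffs = lam[j] - np.delete(lam, j)
        total += np.exp(lam[j]) / np.prod(diffs)
    return 2.0 * np.pi ** len(lam) * total

# Check against the uniform limit c(0) = 2 * pi^(k-1) / (k-2)! for k = 5.
k = 5
print(cbingham_const([-3e-3, -2e-3, -1e-3, 0.0]))   # nearly uniform eigenvalues
print(2.0 * np.pi ** (k - 1) / factorial(k - 2))    # exact uniform value
```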

10.2.5 Properties

We give various properties of the complex Bingham distribution. Let γ_1, …, γ_{k−1} denote the standardized eigenvectors of A, so that γ_j^*γ_j = 1, γ_i^*γ_j = 0 for i ≠ j, and Aγ_j = λ_jγ_j, j = 1, …, k − 1. Each γ_j is defined only up to rotation by a unit complex scalar. Provided that λ_{k−2} < 0, z = γ_{k−1} maximizes the density and γ_{k−1} is unique up to a scalar rotation exp(iθ)γ_{k−1}. Furthermore, if λ_1, …, λ_{k−2} are far below 0, the distribution becomes highly concentrated about this modal axis.

10.2.5.1 High concentrations

To model high concentration, replace A by κA. Transforming to Kent's polar coordinates and integrating out θ_1, …, θ_{k−1} gives the joint density of (s_1, …, s_{k−2}) proportional to:

\[ \exp\Big( \kappa \sum_{j=1}^{k-2} \lambda_j s_j \Big), \qquad s_j \ge 0, \; \sum_{j=1}^{k-2} s_j \le 1. \]

As κ → ∞, the variables u_j = κs_j, j = 1, …, k − 2, tend to independent exponential random variables with densities

\[ f(u_j) = (-\lambda_j) \exp( \lambda_j u_j ), \qquad u_j \ge 0, \; j = 1, \ldots, k - 2. \]

Thus for large κ,

We assume that there is a unique largest eigenvalue λk − 1 = 0 > λk − 2. Another way to look at the high concentration about the axis γk − 1 is to rotate z to γk − 1 and then to project z onto the tangent plane at γk − 1. Given ,

numbered Display Equation

and it can be proved that θ and v are independently distributed with θ uniform on [0, 2π). Writing A = \sum_{j=1}^{k-1} \lambda_j \gamma_j \gamma_j^* = \sum_{j=1}^{k-2} \lambda_j \gamma_j \gamma_j^*, we have

numbered Display Equation

and therefore v is asymptotically complex multivariate normal in k − 1 complex dimensions with zero mean and covariance matrix

\[ \Sigma = \sum_{j=1}^{k-2} (-\lambda_j)^{-1} \gamma_j \gamma_j^* = (-A)^{-}, \]

where the last expression is the Moore–Penrose generalized inverse of −A (see Definition 9.1). Here Σ is singular because v lies in the tangent plane γ_{k−1}^* v = 0. Thus z is approximately distributed as the modal axis γ_{k−1} perturbed by the tangent vector v, so that Σ is determined by the eigenvectors orthogonal to γ_{k−1}.

The asymptotic distribution shows that the real and imaginary parts of v_i are jointly distributed as N_2(0, σ_i^2 I_2). Hence the complex Bingham distribution imposes an isotropic distribution on the marginal distributions. However, it does allow some intercorrelation between landmarks, namely that of a complex normal covariance matrix as seen in Equation (10.3).

10.2.5.2 Low concentrations

As λ_j → 0 for all j, the complex Bingham distribution tends to the uniform distribution on \mathbb{C}S^{k−2}. Note that c(0) = 2π^{k−1}/(k − 2)!.

10.2.5.3 High dimensions

As the number of landmarks k increases one can study the asymptotic distribution of the complex Bingham distribution. Dryden (2005b) considers statistical analysis on high-dimensional spheres and shape spaces, and in particular the high-dimensional complex Bingham distribution tends to a multivariate normal distribution for fixed rank parameter matrix A.

10.2.6 Inference

Let z_1, …, z_n be a random sample from a population modelled by a complex Bingham distribution, with n ≥ k − 1. Set

\[ S = \sum_{i=1}^{n} z_i z_i^* \]

to be the (k − 1) × (k − 1) complex sum of squares and products matrix. Suppose that the eigenvalues of S are positive and distinct, 0 < l_1 < ⋯ < l_{k−1}, and let g_1, …, g_{k−1} denote the corresponding eigenvectors. Note that ∑_j l_j = n. Maximum likelihood estimation is carried out as for the real Bingham distribution.

Result 10.4 (Kent 1994) Under the complex Bingham distribution the MLEs of γ_1, …, γ_{k−1} and Λ = diag(λ_1, …, λ_{k−1}) are given by:

\[ \hat{\gamma}_j = g_j, \qquad j = 1, \ldots, k - 1, \]

and the solution to

\[ \frac{\partial \log c(\Lambda)}{\partial \lambda_j} = \frac{l_j}{n}, \qquad j = 1, \ldots, k - 2. \]

Under high concentrations

\[ \hat{\lambda}_j \approx -\frac{n}{l_j}, \qquad j = 1, \ldots, k - 2. \]

Proof: The log-likelihood for the data reduces to:

(10.7) numbered Display Equation

Holding λ1 < … < λk − 2 < λk − 1 = 0 constant, it can be seen that

numbered Display Equation

and so

numbered Display Equation

The MLEs of the eigenvalues are found by solving

numbered Display Equation

Under high concentration, from Equation (10.6),

numbered Display Equation

giving λ̂_j ≈ −n/l_j, j = 1, …, k − 2.

The dominant eigenvector can be regarded as the average axis of the data: an estimate of modal shape. Hence, the average axis from the complex Bingham MLE is the same as the full Procrustes estimator of mean shape for m = 2 dimensional data, which was also given by the dominant eigenvector of S as shown in Result 8.2.
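As a concrete illustration of this estimation procedure (a minimal sketch of our own, not code from the text): form S from the complex pre-shapes, take its dominant eigenvector as the estimated modal shape (equivalently the full Procrustes mean for planar data), and use the high-concentration approximation λ̂_j ≈ −n/l_j for the remaining eigenvalues.

```python
import numpy as np

def complex_bingham_mle(Z):
    """MLE sketch for the complex Bingham model.

    Z : (n, k-1) array of complex pre-shapes, each row of unit norm.
    Returns the estimated modal pre-shape (dominant eigenvector of S), the
    high-concentration estimates lambda_j ~ -n / l_j, and the eigenvalues of S.
    """
    n = Z.shape[0]
    S = Z.T @ Z.conj()                  # S = sum_i z_i z_i*, (k-1) x (k-1) Hermitian
    l, g = np.linalg.eigh(S)            # eigenvalues in ascending order
    mode = g[:, -1]                     # dominant eigenvector: modal shape estimate
    lam_hat = -n / l[:-1]               # high-concentration approximation
    return mode, lam_hat, l

# Toy usage with pre-shapes clustered around a fixed axis (illustration only).
rng = np.random.default_rng(2)
k, n = 6, 50
mu = rng.normal(size=k - 1) + 1j * rng.normal(size=k - 1)
mu /= np.linalg.norm(mu)
Z = mu + 0.05 * (rng.normal(size=(n, k - 1)) + 1j * rng.normal(size=(n, k - 1)))
Z /= np.linalg.norm(Z, axis=1, keepdims=True)
mode, lam_hat, l = complex_bingham_mle(Z)
print(abs(mode.conj() @ mu))            # close to 1: the estimated axis matches mu
```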

Example 10.1 Consider the mouse vertebral data described in Section 1.4.1. There are k = 6 landmarks in m = 2 dimensions here. Using the complex Bingham distribution the MLE of the population modal shape is:

numbered Display Equation

corresponding to the eigenvalue l_{k−1} = 22.9. Since l_{k−1} ≈ n the data are highly concentrated here. In Figure 8.5 we saw a plot of the MLE of the modal shape (which is the full Procrustes mean) as a centred pre-shape.

In order to examine the structure of variability we could examine g1, …, gk − 2 although the restrictions of a complex covariance structure are in force. In particular, we necessarily have isotropic perturbations at each landmark. So, a practical method for inference would be to examine the shape variability in the tangent space; details of this example were given in Example 8.2.

10.2.7 Approximations and computation

In order to carry out more accurate estimation of the complex Bingham parameters Kume and Wood (2007) study the derivatives of the normalizing constant of the Bingham distribution, and Kume and Wood (2005) consider highly accurate saddlepoint approximations for the Bingham normalizing constant (as well as the Fisher–Bingham). Their methodology enables more accurate maximum likelihood estimation for a whole range of parameter values, including when the data are quite dispersed.

10.2.8 Relationship with the Fisher–von Mises distribution

Result 10.5 For triangles (k = 3) modelling with the complex Bingham distribution in pre-shape space is equivalent to using the Fisher–von Mises distribution on the spherical shape space S²(1/2).

Proof: If k = 3, then there is just one non-zero eigenvalue of A. Hence we can re-parameterize A as:

numbered Display Equation

So, we have

numbered Display Equation

where ρ( · ) is the Riemannian distance. We can obtain the density in shape space by using Kent’s polar shape coordinates and integrating over the independent uniform θ2 component (Mardia 1996b). So, the density in shape space is proportional to:

numbered Display Equation

where HTz is the centred icon corresponding to z. Since the shape space can be regarded as a sphere with great circle metric ρ, it follows that, using the direction vector l = (lx, ly, lz)T, the probability element of l is:

where ν = (ν_x, ν_y, ν_z)^T is the modal direction and dS is the probability element. The density (10.8) is the density of the Fisher–von Mises distribution on the sphere.

Mardia (1989b) provided motivation to use the Fisher–von Mises distribution for shape analysis of triangles.

Example 10.2 Consider the microfossil data described in Section 1.4.16. Using the complex Bingham distribution on the pre-shape sphere is equivalent to using the Fisher–von Mises distribution on the sphere. Let l_i = (l_{xi}, l_{yi}, l_{zi}), i = 1, …, n, be the directions on the sphere corresponding to each triangle shape, and we assume that l_1, …, l_n are a random sample from the Fisher–von Mises distribution. From the data we find that

numbered Display Equation

Thus the resultant mean length is The mean directional vector is (0.146, 0.076, 0.472) and the resulting estimate of κ from the Fisher–von Mises distribution is , so we have a large value of the concentration here. When mapped to Bookstein coordinates the mean shape estimate is uB = −0.101 and vB = 0.632.

10.2.9 Simulation

There are several different methods available for simulation from the complex Bingham distribution, and Kent et al. (2004) provide several algorithms, which have rather different efficiencies depending on the amount of concentration. A detailed description of simulation methods for a variety of directional distributions is given by Kent et al. (2013). Also, Kume and Walker (2009) discuss simulation for the Fisher–Bingham distribution.
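To illustrate one simple simulation approach (a sketch in the spirit of the truncated-exponential rejection method of Kent et al. 2004, not a transcription of their algorithms): with A = ΓΛΓ* and λ_{k−1} = 0, the polar coordinates s_j = |γ_j^* z|², j = 1, …, k − 2, have joint density proportional to exp(∑ λ_j s_j) on the unit simplex, so each s_j can be drawn from an exponential truncated to [0, 1] and the draw accepted when ∑ s_j ≤ 1; the phases are independent and uniform.

```python
import numpy as np

def rtrunc_exp(beta, rng):
    """One draw from an Exp(rate=beta) distribution truncated to [0, 1] (uniform if beta = 0)."""
    u = rng.uniform()
    if beta == 0.0:
        return u
    return -np.log1p(-u * (1.0 - np.exp(-beta))) / beta

def rcomplex_bingham(n, Gamma, lam, rng=None):
    """Rejection sampler sketch for CB(A) with A = Gamma diag(lam) Gamma*, lam[-1] = 0."""
    if rng is None:
        rng = np.random.default_rng()
    p = len(lam)                                    # p = k - 1
    out = np.empty((n, p), dtype=complex)
    i = 0
    while i < n:
        s = np.array([rtrunc_exp(-lam[j], rng) for j in range(p - 1)])
        if s.sum() > 1.0:                           # outside the simplex: reject
            continue
        s = np.append(s, 1.0 - s.sum())             # s_{k-1}
        theta = rng.uniform(0.0, 2.0 * np.pi, size=p)
        w = np.sqrt(s) * np.exp(1j * theta)         # Kent polar coordinates in the eigenbasis
        out[i] = Gamma @ w                          # rotate back to the original basis
        i += 1
    return out
```

The acceptance rate is high for concentrated distributions (most mass has small ∑ s_j) but degrades towards 1/(k − 2)! near the uniform case, which is one reason several alternative algorithms are discussed by Kent et al. (2004).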

10.3 Complex Watson distribution

10.3.1 The density

Definition 10.2 The complex Watson distribution on the pre-shape sphere \mathbb{C}S^{k−2}, denoted by \mathbb{C}W_{k−2}(μ, ξ), has p.d.f.

(10.9) \[ f(z) = c_W(\xi)^{-1} \exp\{ \xi \, |\mu^* z|^2 \}, \qquad z \in \mathbb{C}S^{k-2}. \]

Properties of the distribution were discussed by Mardia and Dryden (1999). In shape analysis we are primarily interested in ξ ≥ 0, although the concentration parameter ξ can be negative in some circumstances. For ξ > 0, the modal pre-shape direction is μeiθ, where 0 ≤ θ < 2π is an arbitrary rotation angle. The complex Watson distribution is an important special case of the complex Bingham distribution, when there are just two distinct eigenvalues in A (a single distinct largest eigenvalue and all other eigenvalues being equal). In this case all directions orthogonal to the modal axis have equal weight, and so this model implicitly assumes a spherical error distribution, as would be obtained from independent isotropic landmarks with equal variances. In this case the complex Bingham distribution can be re-parameterized so that A = −2κ(Ik − 1 − μμ*), where μ is the modal vector on the pre-shape sphere. The density on the pre-shape sphere can be written as:

(10.10) numbered Display Equation

where ρ(HTz, HTμ) is the Procrustes distance between the shapes of HTz and HTμ. Note that HTz is the centred icon corresponding to the pre-shape z. The parameters of the distribution are the population mode pre-shape direction μ and the concentration parameter ξ = 2κ. We have chosen to re-parameterize the complex Watson distribution with the parameter κ = ξ/2 in order to make connections with the offset normal distributions below in Section 11.1.

We call this distribution the complex Watson distribution due to the similar form of the density to the Dimroth–Watson distribution on the real sphere; see Watson (1965, 1983), who called it the Scheidegger–Watson distribution.

For triangles (k = 3) the complex Watson distribution is the same as the complex Bingham distribution, as there are at most two distinct eigenvalues in that case. An alternative expression for the density is

numbered Display Equation

where dF( · ) is the full Procrustes distance of Equation (4.10).

This distribution was briefly discussed by Dryden (1991), Kent (1994) and Prentice and Mardia (1995). Full details and further developments can be found in Mardia and Dryden (1999), on which subsequent details are based.

Result 10.6 The integrating constant for the complex Watson distribution is:

\[ c_W(\xi) = \frac{2\pi^{k-1}}{(k-2)!} \, {}_1F_1(1; k-1; \xi), \]

where

(10.11) \[ {}_1F_1(a; b; x) = \sum_{r=0}^{\infty} \frac{(a)_r}{(b)_r} \, \frac{x^r}{r!}, \qquad (a)_r = a(a+1)\cdots(a+r-1), \]

is the confluent hypergeometric function (e.g. see Abramowitz and Stegun 1970).

See Mardia and Dryden (1999) for a proof. Note that the integrating constant has a relatively simple form, and so inference is quite straightforward.

10.3.2 Inference

Let z_1, …, z_n be a random sample from a population modelled by a complex Watson distribution, with n ≥ k − 1. Set

\[ S = \sum_{i=1}^{n} z_i z_i^* \]

to be the (k − 1) × (k − 1) complex sum of squares and products matrix. Suppose that the eigenvalues of S are positive and ordered l_{k−1} > l_{k−2} ≥ ⋯ ≥ l_1 > 0, and let g_{k−1}, …, g_1 denote the corresponding eigenvectors.

Result 10.7 Under the complex Watson distribution the MLEs of μ and κ are given by:

\[ \hat{\mu} = g_{k-1} \quad (\text{up to an arbitrary rotation } e^{i\alpha}), \]

and the solution to

numbered Display Equation

where Under high concentrations

(10.12) \[ \hat{\kappa} \approx \frac{n(k-2)}{2(n - l_{k-1})}. \]

Proof: The log-likelihood for the data reduces to:

numbered Display Equation

Holding κ constant, it can be seen that μ̂ = e^{iα}g_{k−1} provides the maximum, where α is an arbitrary rotation angle (0 ≤ α < 2π), and so

numbered Display Equation

The MLE of κ is found by solving

numbered Display Equation

Under high concentration

numbered Display Equation

and so

numbered Display Equation

Therefore

numbered Display Equation

and so the high concentration approximate MLE for κ follows.

As for the complex Bingham distribution the dominant eigenvector can be regarded as the average axis of the data (up to a rotation), and it has the same shape as the sample full Procrustes mean of Equation (6.11).

Since the complex Watson distribution is a complex Bingham distribution with all but the largest eigenvalue equal, then under the complex Watson distribution we should expect l1, …, lk − 2 to be approximately equal. In other words, the shape variability for the complex Watson distribution is isotropic and all principal components of shape variability have equal weightings. This observation provides us with a model checking procedure.
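A corresponding sketch for fitting the complex Watson model (our own illustration): take μ̂ as the dominant eigenvector of S, use the high-concentration approximation κ̂ ≈ n(k − 2)/{2(n − l_{k−1})} (the approximation reconstructed in Result 10.7, so it should be treated accordingly), and inspect the non-dominant eigenvalues, which should be roughly equal if the implied isotropy is plausible.

```python
import numpy as np

def complex_watson_fit(Z):
    """Complex Watson fit sketch for pre-shapes Z, an (n, k-1) complex array of unit rows.

    Returns mu_hat (dominant eigenvector of S), kappa_hat from the high-concentration
    approximation n(k-2) / {2(n - l_{k-1})}, and the non-dominant eigenvalues of S,
    which should be roughly equal under the complex Watson (isotropy) model.
    """
    n, km1 = Z.shape
    S = Z.T @ Z.conj()                    # complex sums of squares and products matrix
    l, g = np.linalg.eigh(S)              # eigenvalues in ascending order
    mu_hat = g[:, -1]
    kappa_hat = n * (km1 - 1) / (2.0 * (n - l[-1]))
    return mu_hat, kappa_hat, l[:-1]
```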

10.3.3 Large concentrations

We now consider the case of large κ.

Result 10.8 If κ is large, then the distribution tends to a complex multivariate normal distribution in the tangent space with mean 0 and covariance matrix (I_{k−1} − μμ*)/(2κ), which is a generalized inverse of 2κ(I_{k−1} − μμ*).

Proof: We show that the following expression is true:

(10.13) \[ (z - \mu)^* ( I - \mu \mu^* ) (z - \mu) = 1 - |\mu^* z|^2 , \]

where z*z = μ*μ = 1. Furthermore, A^− = A, where A = I − μμ*, that is A is a generalized inverse of itself. First we prove the generalized inverse property, that is AAA = A. It is easily verified that A is idempotent and therefore A³ = A. Hence, a generalized inverse of A is also A. To prove Equation (10.13) expand the left-hand side:

numbered Display Equation

Since z*z = 1, we have

numbered Display Equation

Furthermore, (I − μμ*)μ = μ − μ(μ*μ) = μ − μ = 0. Hence Equation (10.13) is proved. Thus, the exponential term in the complex Watson density reduces to:

numbered Display Equation

where v are the tangent plane coordinates, and z is close to μ. Hence,

numbered Display Equation

for large concentrations.

For large concentrations the distribution is also very similar to the offset isotropic normal distribution described in Section 11.1.2. It follows from the normal approximation that for large κ

\[ 4\kappa \, ( 1 - |\mu^* z|^2 ) \]

is approximately distributed as χ²_{2k−4}, since the complex rank of (I − μμ*) is k − 2. Furthermore, we have

\[ 1 - |\mu^* z|^2 = \sin^2 \rho, \]

where ρ is the Riemannian distance between the shapes of HTz and HTμ, so that

(10.14) \[ 4\kappa \sin^2 \rho \;\sim\; \chi^2_{2k-4}, \quad \text{approximately.} \]

This result agrees with the approximate distribution for the squared full Procrustes distance under isotropic normal errors given in Equation (9.6), with

numbered Display Equation

where S(HTμ) is the centroid size of the icon HTμ corresponding to μ.

Important point: The central role that the Fisher–von Mises distribution plays in directional data analysis is played by the complex Watson distribution for 2D shape analysis. For triangles we have seen that the two distributions are equivalent. For more than three points there are also similarities. Both distributions are very tractable, and each can be viewed as a conditional normal distribution.

10.4 Complex angular central Gaussian distribution

Kent (1994, 1997) suggested the complex angular central Gaussian distribution for shape analysis, which has the density:

(10.15) \[ f(z) \propto |\Sigma|^{-1} ( z^* \Sigma^{-1} z )^{-(k-1)}, \qquad z \in \mathbb{C}S^{k-2}. \]
The parameter matrix Σ is a positive definite Hermitian matrix. The density in Equation (10.15) is invariant under rescaling Σ → cΣ, c > 0. Also, f(z) = f(e^{iθ}z) for all θ, and so the distribution is appropriate for shape analysis. Kent (1997) gives an expectation–maximization (EM) algorithm for computing the MLE of Σ, and derives various properties. In particular, the distribution is more resistant to outliers than the complex Bingham or Watson distributions.
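As an illustration only (this is not Kent's EM algorithm, whose details we do not reproduce): for a density of the form (10.15) the MLE of Σ satisfies a fixed-point equation of the usual M-estimation type, and a simple iteration such as the following hypothetical sketch can be used; the trace rescaling fixes the scale non-identifiability noted above.

```python
import numpy as np

def cacg_fit(Z, n_iter=200, tol=1e-10):
    """Fixed-point iteration for the complex angular central Gaussian scatter matrix (a sketch).

    Z : (n, p) complex array of points on the unit complex sphere, p = k - 1.
    Iterates Sigma <- (p/n) * sum_i z_i z_i* / (z_i* Sigma^{-1} z_i), rescaling so
    that trace(Sigma) = p, because the density only identifies Sigma up to scale.
    """
    n, p = Z.shape
    Sigma = np.eye(p, dtype=complex)
    for _ in range(n_iter):
        q = np.real(np.einsum('ij,jk,ik->i', Z.conj(), np.linalg.inv(Sigma), Z))
        new = (p / n) * (Z.T * (1.0 / q)) @ Z.conj()
        new *= p / np.real(np.trace(new))
        if np.linalg.norm(new - Sigma) < tol:
            return new
        Sigma = new
    return Sigma
```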

10.5 Complex Bingham quartic distribution

We have seen that complex distributions can be of great benefit for planar shape analysis, but the complex structure has been restrictive, imposing isotropy at each landmark. An extension which has approximate elliptical structure at each landmark is the complex Bingham quartic distribution (Kent et al. 2006), which includes an extra quartic term in the exponent of the density. The density is:

numbered Display Equation

and Kent et al. (2006) carefully described the identifiable parameters. Note that if the distribution is concentrated the approximate distribution in tangent space is a multivariate normal distribution with general elliptic covariance structure. For k landmarks in two dimensions there are 2k − 4 mean parameters and (2k − 4)(2k − 3)/2 covariance parameters that are identifiable.

10.6 A rotationally symmetric shape family

A rotationally symmetric family of shape distributions can be obtained with densities as functions of shape distance to a mean configuration. The class of densities is given by Mardia and Dryden (1999):

numbered Display Equation

where ϕ( · ) ≥ 0 is a suitable penalty function, and it is assumed that ϕ(dF) is an increasing function of dF and ϕ(0) = 0. This distribution is more difficult to work with in m > 2 dimensions, as the integrating constant then depends on the mean.

A particular subclass is given by the densities with

which give the same MLE as a class of shape estimators proposed by Kent (1992) and discussed in Section 6.3. The estimators become more resistant to outliers as h increases.

A shape distribution can be obtained from a pre-shape distribution by transforming to Kent’s polar shape coordinates and dropping the independent uniform rotation parameter θk − 1. For example, the shape distribution obtained from the complex Watson distribution is a member of the family with h = 1 in Equation (10.16). The density in this form was introduced by Dryden (1991) and is given, with respect to the uniform measure, by:

where ρ is the Procrustes distance from the observed shape to the mean shape. The MLE of the mean shape is the same as the full Procrustes estimator of the mean shape. If k = 3, then the density reduces to:

numbered Display Equation

which is the Fisher–von Mises distribution on the shape sphere.

An alternative distribution with is the distribution with density

where the mean shape is the same as the partial Procrustes mean (Dryden 1991). We call this distribution the partial Procrustes shape distribution. The integrating constant is given by:

numbered Display Equation

where I_ν( · ) and L_ν( · ) are the modified Bessel function of the first kind and the modified Struve function. If k = 3, then c_2(κ) = 8κ²/[1 − exp(4κ) + 4κ exp(4κ)].

Both of the distributions of Equation (10.17) and Equation (10.18) are asymptotically normal as κ → ∞ and uniform if κ = 0. The density in Equation (10.18) has particularly light tails and so the MLE of μ under the partial Procrustes distribution will be more affected by outliers than that of the complex Watson shape distribution of Equation (10.17).

10.7 Other distributions

Another family of shape distributions for m = 2 was given by Micheas et al. (2006) who considered complex elliptical distributions, and related work of Micheas and Dey (2005) for modelling shape distributions and inference for assessing differences in shapes.

Pennec (2006) described an intrinsic Gaussian distribution for symmetric Riemannian manifolds, defined as the distribution with fixed mean and covariance matrix Σ which maximizes entropy. The curvature of the manifold appears in the concentration matrix, and for small variations the concentration matrix is approximately

numbered Display Equation

where Ric is the Ricci curvature tensor of Equation (4.15). Also, see Pennec (1996, 1999) for earlier, related work.

10.8 Bayesian inference

As an alternative to maximum likelihood estimation we could consider a Bayesian approach to inference. Let u1, …, un be a random sample of shapes, and let Θ be the location shape parameters and Σ the shape variability parameters. After specifying a prior distribution for (Θ, Σ) with density π(Θ, Σ), the posterior density is given by:

\[ \pi(\Theta, \Sigma \mid u_1, \ldots, u_n) = \frac{ L(u_1, \ldots, u_n; \Theta, \Sigma) \, \pi(\Theta, \Sigma) }{ \int L(u_1, \ldots, u_n; \Theta, \Sigma) \, \pi(\Theta, \Sigma) \, \mathrm{d}\Theta \, \mathrm{d}\Sigma }, \]

where L(u1, …, un; Θ, Σ) is the likelihood. We could then use the posterior mean of the parameters or the posterior mode [also known as the maximum a posteriori (MAP) estimator].

A Bayesian estimator of mean shape is the posterior mean shape

numbered Display Equation

and the posterior mean shape variance is:

numbered Display Equation

Another point estimator is the MAP estimator given by:

\[ ( \hat{\Theta}_{MAP}, \hat{\Sigma}_{MAP} ) = \arg\max_{\Theta, \Sigma} \; \pi(\Theta, \Sigma \mid u_1, \ldots, u_n), \]

that is the posterior mode.

A 100(1 − α)% credibility region for shape and covariance is given by A where

\[ \Pr\{ (\Theta, \Sigma) \in A \mid u_1, \ldots, u_n \} = 1 - \alpha. \]

In some situations the posterior density is very complicated and inference can then be carried out using Markov chain Monte Carlo (MCMC) simulation (e.g. Smith and Roberts 1993; Besag et al. 1995; Gilks et al. 1996). If the prior density for (Θ, Σ) is uniform, then the MAP estimator is identical to the maximum likelihood estimator, and this is the main approach that we have considered in this chapter for ease of exposition.

The Bayesian approach has been particularly useful for unlabelled shape analysis, for example in matching molecules (Green and Mardia, 2006; Dryden et al. 2007), where the correspondence between landmarks needs to be modelled. A comparison of approaches is given by Kenobi and Dryden (2012), and we give more detail in Chapter 14.

The choice of prior for the shape Θ (or pre-shape) is quite straightforward, for example we could choose a complex Watson or a complex Bingham prior. One particular case of interest is a prior which encourages smooth outlines in the case of close neighbouring landmarks and this is discussed in the next example.

More care needs to be taken with a prior distribution for Σ, although Wishart or inverse-Wishart distributions are obvious choices. If we were considering a simple model such as the complex Watson, then a suitable prior distribution for the single concentration parameter κ is a gamma distribution.

Example 10.3 Consider modelling a random sample of pre-shapes z1, …, zn with a complex Watson distribution with mode μ and concentration parameter κ. The concentration parameter κ is assumed to be known. Let the prior distribution of μ be complex Bingham with known parameter matrix A. So, the posterior density of μ is given by:

\[ \pi(\mu \mid z_1, \ldots, z_n) \propto \exp\{ \mu^* ( \kappa S + A ) \mu \}, \]

where S = ∑ni = 1ziz*i is the complex sum of squares and products matrix. Hence, we have a conjugate prior here, because the posterior is also a complex Bingham distribution, but with parameter matrix κS + A. The posterior mode μMAP is the dominant eigenvector of κS + A.
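A short sketch of this computation (the function name is ours): given the pre-shapes, the concentration κ and the prior parameter matrix A, the MAP estimate is simply the dominant eigenvector of κS + A.

```python
import numpy as np

def watson_bingham_map(Z, kappa, A):
    """Posterior mode of mu under a complex Watson likelihood and a complex Bingham CB(A) prior.

    Z : (n, k-1) complex pre-shapes. The posterior is complex Bingham with parameter
    matrix kappa * S + A, so the MAP estimate is its dominant eigenvector.
    """
    S = Z.T @ Z.conj()
    _, g = np.linalg.eigh(kappa * S + A)
    return g[:, -1]
```

With the smoothing prior discussed next one would take A = λ times the smoothing matrix, so that the MAP estimate in Example 10.4 is the dominant eigenvector of κS + λA.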

If we have a series of landmarks on outlines of objects, then a suitable prior might encourage smoothness of the outline. It would be convenient to label the landmarks sequentially around the outline, so that landmarks i and j are neighbours if (i − j) mod k ∈ {1, k − 1}. We include the modulo k term so that landmarks 1 and k are considered neighbours. In particular, in this periodic case A = −FF^T, where

numbered Display Equation

The choice of smoothing parameter λ is clearly of vital importance, as illustrated in the following example. We shall consider a complex Bingham prior with parameter matrix λA, and data distributed as complex Watson with parameter κ. Hence in this case the posterior distribution is also a complex Bingham with parameter matrix κS + λA, and the MAP estimate is the principal eigenvector of this matrix.

Example 10.4 Consider the Small group of T2 mouse vertebrae in the dataset of Section 1.4.1. There are k = 6 landmarks on n = 23 bones. We re-order the landmarks by swapping landmarks 1 and 2, so that the landmarks follow round anti-clockwise. In Figure 10.1 we see various smoothed estimates of the mean shape given by the posterior mode, which we write as μMAP. This example is used for illustration only to see what would happen if the smoothing parameter is taken to be too large. In practice this method is most useful when a large number of noisy landmarks are available around an outline. In Figure 10.1a we see the Procrustes mean (λ/κ = 0). In Figure 10.1b we see little difference in the MAP estimate (λ/κ = 0.1) but in Figure 10.1c (λ/κ = 1) there is a visible difference, with landmarks being brought in more to a discrete circle. We also consider what happens when a much too large smoothing parameter is used. In Figure 10.1d with a very large value of λ/κ = 100 the mean is very nearly a discrete circle. For such large λ/κ the integrated square derivative matrix dominates, which has the trigonometric eigenvectors for a circulant matrix (the eigenvectors form the discrete Fourier transform matrix). An automatic choice of λ/κ could be obtained by cross-validation in a similar manner to that described in Section 12.3.3.


Figure 10.1 The smoothed Procrustes mean of the T2 Small data: (a) λ/κ = 0; (b) λ/κ = 0.1; (c) λ/κ = 1.0; and (d) λ/κ = 100.

There may be occasions when smoothed PCA is appropriate, particularly for outline data. After a smoothed mean μMAP has been obtained, the figures are transformed to μMAP by Procrustes analysis leading to the sample covariance matrix S. The first smoothed PC is obtained by finding the dominant eigenvector of κS + λA and practical details are very similar to the procedure of Rice and Silverman (1991). The first few scores from a smoothed PCA of the spine lines were used by Mardia et al. (1994) to summarize the shape information. Although it was a small pilot study it did appear that the twin spine shapes showed higher familial correlation than those of non-twin siblings. Mardia et al. (1996a) and Dryden et al. (2008b) also give some summary measures for measuring spinal shape.

Many authors have considered Bayesian shape analysis, especially for unlabelled shape as discussed in Chapter 14. Some other examples include Brignell (2007) and Mardia et al. (2013b) using Markov chain Monte Carlo simulation; Leu and Damien (2014) provide a detailed discussion of Bayesian shape analysis of the complex Bingham distribution; Micheas and Peng (2010) discuss a Bayesian version of Procrustes analysis with applications to hydrology; and Theobald and Mardia (2011) provide a framework for Bayesian analysis of the generalized non-isotropic Procrustes problem with scaling.

10.9 Size-and-shape distributions

10.9.1 Rotationally symmetric size-and-shape family

We could specify size-and-shape distributions directly in the size-and-shape space SΣ^k_m. A suitable candidate could be that based on Riemannian distance, with density proportional to:

numbered Display Equation

where d_S( · ) was given in Equation (5.5) and ϕ( · ) ≥ 0 is a suitable penalty function. In the case ϕ(x) = x² the MLE of size-and-shape will be the same as that obtained using least squares partial Procrustes analysis, provided the integrating constant does not depend on μ, that is for the planar case. As for the shape case in Section 10.6 we can choose different functions for estimators with different properties. Further size-and-shape distributions based on normally distributed data are discussed in Section 11.5.

10.9.2 Central complex Gaussian distribution

For 2D data we could exploit the complex representation and, in an analogous manner to using the complex Bingham distribution in pre-shape space for shape analysis, we could specify a distribution for the Helmertized landmarks which is invariant under rotations. A suitable candidate is the central complex Gaussian distribution,

\[ f(z_H) = \pi^{-(k-1)} |\Sigma|^{-1} \exp( - z_H^* \Sigma^{-1} z_H ), \]

where Σ is Hermitian. The distribution of zH is identical to that of eiθzH for any arbitrary rotation θ. The mode of the distribution is given by the dominant eigenvector of Σ, hence the MLE of size-and-shape under this model is the dominant eigenvector of

\[ \sum_{i=1}^{n} z_{Hi} z_{Hi}^* , \]

given a random sample of Helmertized observations zH1, …, zHn.

10.10 Size-and-shape versus shape

If both size and shape are available, then we have two strategies in our analysis: either analyse shape and size separately (shape in shape space Σ^k_m and size on the positive real line) or analyse size and shape jointly in size-and-shape space SΣ^k_m. Which method is preferable depends on the aim of the study. If we are really interested in shape alone, then one would expect that more robust inference will be carried out using the first approach, even if the joint size-and-shape information is available, as fewer modelling assumptions need to be made. However, in many applications in biology one is interested in investigating joint size and shape information or in using size as a covariate, and then the size-and-shape space plays an important rôle.
