10
Shape and size-and-shape distributions

Of fundamental interest are probability distributions in shape spaces and pre-shape spaces, which provide models for statistical shape analysis. We shall also consider the joint distribution of size and shape later, but first we concentrate on shape. Since the shape space is non-Euclidean, special care is required. There are several approaches we could consider for stochastic modelling, for example we could specify a probability distribution in:

  1. the configuration space
  2. the pre-shape space
  3. the shape space (directly)
  4. a tangent space.

If the probability distribution also includes certain transformation variables (as in 1. and 2.) which are not of interest, then we could either integrate them out (a marginal approach) or condition on them (a conditional approach). Note that 2. is an ambient space model whereas 3. is a quotient space model (see Section 3.2.1).

The work in this chapter is primarily for m = 2 dimensional landmarks, and some extensions to higher dimensions will be considered in Section 11.6. The main inference procedure in this chapter is the method of maximum likelihood.

10.1 The uniform distribution

We gave the volume measure in pre-shape space in Equation (4.24) and the volume measure in shape space in Equation (4.26), using Kent's polar coordinates of Section 4.3.3. We normalize the volume measure in shape space to give the uniform measure dγ in shape space. Given original landmarks z^o, recall that Kent's polar coordinates on the pre-shape sphere are obtained from the complex pre-shape z = (z_1, …, z_{k−1})^T = Hz^o/‖Hz^o‖ by:

\[ z_j = s_j^{1/2} e^{i\theta_j}, \qquad \sum_{j=1}^{k-1} s_j = 1, \]

for j = 1, …, k − 1, s_j ≥ 0, 0 ≤ θ_j < 2π. Kent's polar coordinates on the shape space are s_j and ϕ_j = (θ_j − θ_{k−1}) mod (2π), j = 1, …, k − 2.

Result 10.1 The uniform shape measure in Kent’s polar coordinates is:

\[ \mathrm{d}\gamma = \frac{(k-2)!}{(2\pi)^{k-2}} \, \mathrm{d}s_1 \cdots \mathrm{d}s_{k-2} \, \mathrm{d}\phi_1 \cdots \mathrm{d}\phi_{k-2}, \]

which has the property that ∫dγ = 1.
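A quick check of the normalization (using the form of the measure given above): the s-coordinates range over the unit simplex, whose volume is 1/(k − 2)!, and each ϕ_j ranges over [0, 2π), so

\[ \int \mathrm{d}\gamma = \frac{(k-2)!}{(2\pi)^{k-2}} \times \frac{1}{(k-2)!} \times (2\pi)^{k-2} = 1. \]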

Result 10.2 Transforming to Kendall coordinates U^K = (U_3^K, …, U_k^K, V_3^K, …, V_k^K)^T we have the uniform measure on the shape space given by:

(10.1) numbered Display Equation

where

(10.2) numbered Display Equation

Proof: Consider the coordinate system (Kendall 1984) where rj > 0 and 0 ≤ ϕj < 2π, j = 1, …, k − 2. On substituting we have:

numbered Display Equation

and ϕj = (θj − θk − 1)mod2π, j = 1, …, k − 2. We see that

numbered Display Equation

and the Jacobian J = |∂si/∂rj| is given by:

numbered Display Equation

Hence the uniform measure is given by:

numbered Display Equation

Furthermore, using the coordinate system , j = 1, …, k − 2, the uniform measure is:

numbered Display Equation

By suitably permuting the labels of the original pre-shape coordinates z_1, …, z_{k−1} we arrive at the expression for the uniform density in terms of Kendall coordinates U_j^K + iV_j^K = z_{j−1}/z_1, j = 3, …, k.

When using Kent’s polar coordinates the density is constant and hence Kent’s coordinate system is a natural way to represent the uniform shape distribution.

We shall see in Section 11.1.1 that if the original landmarks are i.i.d. with a rotationally symmetric distribution, then the resulting marginal shape distribution is uniform in the shape space. Mardia (1980) gave the uniform shape distribution for triangles using angles to measure shape [see Equation (11.5)].

10.2 Complex Bingham distribution

10.2.1 The density

We consider the case where we have a probability distribution on the pre-shape sphere S^{m(k−1)−1}, corresponding to k points in m dimensions. For the m = 2 dimensional case and using complex notation we have seen in Section 4.3 that the pre-shape space is the unit complex sphere \mathbb{C}S^{k−2} in k − 1 complex dimensions. In our particular application to shape analysis consider the k original landmarks in m = 2 dimensions with complex coordinates written as z^o (a k × 1 complex vector). Pre-multiply by the (k − 1) × k Helmert submatrix H of Equation (2.10) to give the k − 1 Helmertized complex landmarks z_H [a (k − 1)-vector]. Rescaling by ‖z_H‖ it follows that the pre-shape is:

\[ z = \frac{z_H}{\| z_H \|} \in \mathbb{C}S^{k-2}. \]

Definition 10.1 The complex Bingham distribution on \mathbb{C}S^{k−2}, denoted \mathbb{C}B_{k−2}(A), has probability density function (p.d.f.)

\[ f(z) = c(A)^{-1} \exp( z^* A z ), \qquad z \in \mathbb{C}S^{k-2}, \]

where z* represents the complex conjugate of the transpose of z, the matrix A is (k − 1) × (k − 1) Hermitian (i.e. A = A*), and c(A) is the normalizing constant.

The complex Bingham distribution is analogous to the real Bingham distribution (e.g. see Mardia et al. 1979, p. 433) and in fact, it is a particular case (see Section 10.2.5). The distribution has the property that

\[ f(e^{i\theta} z) = f(z) \quad \text{for all } \theta \in [0, 2\pi), \]

and is thus invariant under rotations of the pre-shape z. So, if an object is rotated, then it has the same density and will contribute in an identical way to inference as the same object in the original rotation. This property therefore makes the distribution appropriate for shape analysis (location and scale were previously removed because z is on the pre-shape sphere). The complex Bingham distribution provides a very elegant framework for the analysis of 2D shape data. The distribution was proposed by Kent (1994) and the properties that we describe are from his paper.

Since z*z = 1 for z ∈ \mathbb{C}S^{k−2}, we can see that the parameter matrices A and A + αI define the same complex Bingham distribution, with c(A + αI) = c(A) exp α, where α is a real scalar. It is convenient to remove this non-identifiability by setting λ_max(A) = 0, where λ_max(A) denotes the largest eigenvalue of A.
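This identity is a one-line consequence of z*z = 1:

\[ c(A + \alpha I) = \int_{\mathbb{C}S^{k-2}} \exp\{ z^*(A + \alpha I) z \} \, \mathrm{d}z = \int_{\mathbb{C}S^{k-2}} \exp( z^* A z + \alpha ) \, \mathrm{d}z = c(A) \exp \alpha . \]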

10.2.2 Relation to the complex normal distribution

In order to understand the covariance structure of a complex distribution we describe the complex multivariate normal distribution. If z_j = x_j + iy_j have a joint complex normal distribution with means ξ_j = μ_j + iν_j, j = 1, …, p, and a p × p Hermitian covariance matrix Σ = Σ_1 + iΣ_2, then writing x = (x_1, …, x_p, y_1, …, y_p)^T and μ = (μ_1, …, μ_p, ν_1, …, ν_p)^T we have

(10.3) \[ x \sim N_{2p}\left( \mu , \; \frac{1}{2} \begin{pmatrix} \Sigma_1 & -\Sigma_2 \\ \Sigma_2 & \Sigma_1 \end{pmatrix} \right), \]

where Σ_2 = −Σ_2^T is skew-symmetric and Σ_1 is symmetric positive definite. In particular, var(x_j) = var(y_j) and cov(x_j, y_j) = 0, and so for each j the covariance structure is isotropic. Writing z = (z_1, …, z_p)^T and ξ = (ξ_1, …, ξ_p)^T the density of the complex normal distribution is:

(10.4) \[ f(z) = \pi^{-p} |\Sigma|^{-1} \exp\{ -(z - \xi)^* \Sigma^{-1} (z - \xi) \}, \]

and we write z ∼ \mathbb{C}N_p(ξ, Σ); see, for example, Goodman (1963) and Andersen et al. (1995).
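To make the real-coordinate structure in Equation (10.3) concrete, here is a small illustrative sketch (our own, not code from the text): it builds the 2p × 2p real covariance from Σ = Σ_1 + iΣ_2 and simulates complex normal vectors through the real representation; the function names are hypothetical.

```python
import numpy as np

def real_covariance(Sigma):
    """Real 2p x 2p covariance of (x_1, ..., x_p, y_1, ..., y_p) when z ~ CN_p(xi, Sigma).

    Uses the standard real representation: with Sigma = Sigma1 + i*Sigma2 Hermitian,
    the real covariance is 0.5 * [[Sigma1, -Sigma2], [Sigma2, Sigma1]].
    """
    Sigma1, Sigma2 = Sigma.real, Sigma.imag
    return 0.5 * np.block([[Sigma1, -Sigma2],
                           [Sigma2,  Sigma1]])

def rcomplex_normal(n, xi, Sigma, rng):
    """Simulate n draws from the complex normal CN_p(xi, Sigma) via the real representation."""
    p = len(xi)
    mean = np.concatenate([xi.real, xi.imag])
    xy = rng.multivariate_normal(mean, real_covariance(Sigma), size=n)
    return xy[:, :p] + 1j * xy[:, p:]

# Isotropic marginals: var(x_j) = var(y_j) and cov(x_j, y_j) = 0.
rng = np.random.default_rng(0)
p = 3
Sigma = np.eye(p) + 0j
z = rcomplex_normal(5000, np.zeros(p, dtype=complex), Sigma, rng)
print(z.real.var(axis=0), z.imag.var(axis=0))   # each approximately 0.5
```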

The complex Bingham distribution can be obtained through conditioning a zero mean complex multivariate normal distribution to have unit norm. In particular, if z ∼ \mathbb{C}N_{k−1}(0, Σ) then,

\[ z \mid ( z^* z = 1 ) \;\sim\; \mathbb{C}B_{k-2}( -\Sigma^{-1} ). \]

So, an interpretation of the Hermitian matrix − A is that of the inverse covariance matrix of a zero mean complex normal random variable, which is conditioned to have unit norm to give the complex Bingham distribution. Hence, the complex Bingham distribution is an example of a conditional approach.

10.2.3 Relation to real Bingham distribution

The (k − 2) dimensional complex Bingham distribution can be regarded as a special case of a (2k − 3) dimensional real Bingham distribution, as follows. If the jth element of z is (z)_j = x_j + iy_j, define a (2k − 2) dimensional vector u = (x_1, y_1, …, x_{k−1}, y_{k−1})^T, say, by splitting each complex number into its real and imaginary parts. Also, if A = (a_{hj}) has entries a_{hj} = α_{hj} exp(iθ_{hj}) with θ_{jh} = −θ_{hj}, −π < θ_{jh} ≤ π, define a (2k − 2) × (2k − 2) matrix B made up of (k − 1)² blocks of size (2 × 2) given by:

\[ B_{hj} = \alpha_{hj} \begin{pmatrix} \cos\theta_{hj} & -\sin\theta_{hj} \\ \sin\theta_{hj} & \cos\theta_{hj} \end{pmatrix}, \qquad h, j = 1, \ldots, k - 1. \]

Then z^*Az = u^T B u, so that a complex Bingham distribution for z is equivalent to a real Bingham distribution for u (Kent 1994).
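The identity can be verified numerically. The following sketch (our illustration; the block form of B is our reading of the construction above, since the display was lost) builds B from a random Hermitian A and checks that the two quadratic forms agree.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 6
p = k - 1

# A random Hermitian parameter matrix A.
M = rng.normal(size=(p, p)) + 1j * rng.normal(size=(p, p))
A = (M + M.conj().T) / 2

# Build the real (2k-2) x (2k-2) matrix B from 2 x 2 blocks.
B = np.zeros((2 * p, 2 * p))
for h in range(p):
    for j in range(p):
        alpha, theta = abs(A[h, j]), np.angle(A[h, j])
        B[2 * h:2 * h + 2, 2 * j:2 * j + 2] = alpha * np.array(
            [[np.cos(theta), -np.sin(theta)],
             [np.sin(theta),  np.cos(theta)]])

# A random point on the complex pre-shape sphere and its real counterpart u.
z = rng.normal(size=p) + 1j * rng.normal(size=p)
z /= np.linalg.norm(z)
u = np.column_stack([z.real, z.imag]).ravel()   # (x_1, y_1, ..., x_{k-1}, y_{k-1})

print(np.real(z.conj() @ A @ z), u @ B @ u)     # the two quadratic forms agree
```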

10.2.4 The normalizing constant

Result 10.3 (Kent 1994) The normalizing constant of the complex Bingham distribution is:

(10.5) \[ c(A) = 2\pi^{k-1} \sum_{j=1}^{k-1} a_j \exp(\lambda_j), \qquad a_j^{-1} = \prod_{i \neq j} (\lambda_j - \lambda_i), \]

where λ1 < λ2 < ⋅⋅⋅ < λk − 1 = 0 denote the eigenvalues of A. Note that c(A) = c(Λ) depends only on the eigenvalues of A, with Λ = diag(λ1, …, λk − 1).

The result and proof were given by Kent (1994). If equalities exist between some λjs then c(A) can be integrated using repeated applications of L’Hôpital’s rule. For example, for λk − 2 = λk − 1 but all other λjs distinct, we get (Bingham et al. 1992):

numbered Display Equation

A particularly useful and simple case is the complex Watson distribution when there are just two distinct eigenvalues, as described in Section 10.3.
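Before moving on, a small numerical sketch of the normalizing constant, using the closed form in Equation (10.5) as reconstructed above (so the constant 2π^{k−1} should be treated as an assumption); when eigenvalues nearly coincide the terms suffer cancellation, which is where the limiting forms mentioned above become necessary.

```python
import numpy as np
from math import factorial

def cbingham_const(lam):
    """Normalizing constant c(A) from the eigenvalues lam of A (distinct, largest 0).

    Uses c(A) = 2 * pi^(k-1) * sum_j a_j * exp(lam_j) with
    a_j = prod_{i != j} 1 / (lam_j - lam_i), as in the reconstructed Equation (10.5).
    """
    lam = np.asarray(lam, dtype=float)
    total = 0.0
    for j in range(len(lam)):
        diffs = lam[j] - np.delete(lam, j)
        total += np.exp(lam[j]) / np.prod(diffs)
    return 2.0 * np.pi ** len(lam) * total

# Check against the uniform limit c(0) = 2 * pi^(k-1) / (k-2)! for k = 5.
k = 5
print(cbingham_const([-3e-3, -2e-3, -1e-3, 0.0]))   # nearly uniform eigenvalues
print(2.0 * np.pi ** (k - 1) / factorial(k - 2))    # exact uniform value
```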

10.2.5 Properties

We give various properties of the complex Bingham distribution. Let γ_1, …, γ_{k−1} denote the standardized eigenvectors of A, so that γ_j^*γ_j = 1, γ_i^*γ_j = 0 for i ≠ j, and Aγ_j = λ_jγ_j, j = 1, …, k − 1. Each γ_j is defined only up to rotation by a unit complex scalar. Provided that λ_{k−2} < 0, z = γ_{k−1} maximizes the density and γ_{k−1} is unique up to a scalar rotation exp(iθ)γ_{k−1}. Furthermore, if λ_1, …, λ_{k−2} are far below 0, the distribution becomes highly concentrated about this modal axis.

10.2.5.1 High concentrations

To model high concentration, replace A by κA. Transforming to Kent's polar coordinates and integrating out θ_1, …, θ_{k−1} gives the joint density of (s_1, …, s_{k−2}) proportional to:

\[ \exp\Big( \kappa \sum_{j=1}^{k-2} \lambda_j s_j \Big), \qquad s_j \ge 0, \; \sum_{j=1}^{k-2} s_j \le 1. \]

As κ → ∞, the variables u_j = κs_j, j = 1, …, k − 2, tend to independent exponential random variables with densities

\[ f(u_j) = (-\lambda_j) \exp( \lambda_j u_j ), \qquad u_j \ge 0, \; j = 1, \ldots, k - 2. \]

Thus for large κ,

We assume that there is a unique largest eigenvalue λk − 1 = 0 > λk − 2. Another way to look at the high concentration about the axis γk − 1 is to rotate z to γk − 1 and then to project z onto the tangent plane at γk − 1. Given ,

numbered Display Equation

and it can be proved that θ and v are independently distributed with θ uniform on [0, 2π). Writing A = \sum_{j=1}^{k-1} \lambda_j \gamma_j \gamma_j^* = \sum_{j=1}^{k-2} \lambda_j \gamma_j \gamma_j^*, we have

numbered Display Equation

and therefore v is asymptotically complex multivariate normal in k − 1 complex dimensions with zero mean and covariance matrix

\[ \Sigma = \sum_{j=1}^{k-2} (-\lambda_j)^{-1} \gamma_j \gamma_j^* = (-A)^{-}, \]

where the last expression is the Moore–Penrose generalized inverse of −A (see Definition 9.1). Here Σ is singular because v lies in the tangent plane γ_{k−1}^* v = 0. Thus z is approximately distributed as the modal axis γ_{k−1} perturbed by the tangent vector v, so that Σ is determined by the eigenvectors orthogonal to γ_{k−1}.

The asymptotic distribution shows that the real and imaginary parts of v_i are jointly distributed as N_2(0, σ_i^2 I_2). Hence the complex Bingham distribution imposes an isotropic distribution on the marginal distributions. However, it does allow some intercorrelation between landmarks, namely that of a complex normal covariance matrix as seen in Equation (10.3).

10.2.5.2 Low concentrations

As λ_j → 0 for all j, the complex Bingham distribution tends to the uniform distribution on \mathbb{C}S^{k−2}. Note that c(0) = 2π^{k−1}/(k − 2)!.

10.2.5.3 High dimensions

As the number of landmarks k increases one can study the asymptotic distribution of the complex Bingham distribution. Dryden (2005b) considers statistical analysis on high-dimensional spheres and shape spaces, and in particular the high-dimensional complex Bingham distribution tends to a multivariate normal distribution for fixed rank parameter matrix A.

10.2.6 Inference

Let z_1, …, z_n be a random sample from a population modelled by a complex Bingham distribution, with n ≥ k − 1. Set

\[ S = \sum_{i=1}^{n} z_i z_i^* \]

to be the (k − 1) × (k − 1) complex sum of squares and products matrix. Suppose that the eigenvalues of S are positive and distinct, 0 < l_1 < ⋯ < l_{k−1}, and let g_1, …, g_{k−1} denote the corresponding eigenvectors. Note that ∑_j l_j = n. Maximum likelihood estimation is carried out as for the real Bingham distribution.

Result 10.4 (Kent 1994) Under the complex Bingham distribution the MLEs of γ_1, …, γ_{k−1} and Λ = diag(λ_1, …, λ_{k−1}) are given by:

\[ \hat{\gamma}_j = g_j, \qquad j = 1, \ldots, k - 1, \]

and the solution to

\[ \frac{\partial \log c(\Lambda)}{\partial \lambda_j} = \frac{l_j}{n}, \qquad j = 1, \ldots, k - 2. \]

Under high concentrations

\[ \hat{\lambda}_j \approx -\frac{n}{l_j}, \qquad j = 1, \ldots, k - 2. \]

Proof: The log-likelihood for the data reduces to:

(10.7) numbered Display Equation

Holding λ1 < … < λk − 2 < λk − 1 = 0 constant, it can be seen that

numbered Display Equation

and so

numbered Display Equation

The MLEs of the eigenvalues are found by solving

numbered Display Equation

Under high concentration, from Equation (10.6),

numbered Display Equation

giving λ̂_j ≈ −n/l_j, j = 1, …, k − 2.

The dominant eigenvector can be regarded as the average axis of the data: an estimate of modal shape. Hence, the average axis from the complex Bingham MLE is the same as the full Procrustes estimator of mean shape for m = 2 dimensional data, which was also given by the dominant eigenvector of S as shown in Result 8.2.
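As a concrete illustration of this estimation procedure (a minimal sketch of our own, not code from the text): form S from the complex pre-shapes, take its dominant eigenvector as the estimated modal shape (equivalently the full Procrustes mean for planar data), and use the high-concentration approximation λ̂_j ≈ −n/l_j for the remaining eigenvalues.

```python
import numpy as np

def complex_bingham_mle(Z):
    """MLE sketch for the complex Bingham model.

    Z : (n, k-1) array of complex pre-shapes, each row of unit norm.
    Returns the estimated modal pre-shape (dominant eigenvector of S), the
    high-concentration estimates lambda_j ~ -n / l_j, and the eigenvalues of S.
    """
    n = Z.shape[0]
    S = Z.T @ Z.conj()                  # S = sum_i z_i z_i*, (k-1) x (k-1) Hermitian
    l, g = np.linalg.eigh(S)            # eigenvalues in ascending order
    mode = g[:, -1]                     # dominant eigenvector: modal shape estimate
    lam_hat = -n / l[:-1]               # high-concentration approximation
    return mode, lam_hat, l

# Toy usage with pre-shapes clustered around a fixed axis (illustration only).
rng = np.random.default_rng(2)
k, n = 6, 50
mu = rng.normal(size=k - 1) + 1j * rng.normal(size=k - 1)
mu /= np.linalg.norm(mu)
Z = mu + 0.05 * (rng.normal(size=(n, k - 1)) + 1j * rng.normal(size=(n, k - 1)))
Z /= np.linalg.norm(Z, axis=1, keepdims=True)
mode, lam_hat, l = complex_bingham_mle(Z)
print(abs(mode.conj() @ mu))            # close to 1: the estimated axis matches mu
```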

Example 10.1 Consider the mouse vertebral data described in Section 1.4.1. There are k = 6 landmarks in m = 2 dimensions here. Using the complex Bingham distribution the MLE of the population modal shape is:

numbered Display Equation

corresponding to the eigenvalue l_{k−1} = 22.9. Since l_{k−1} ≈ n the data are highly concentrated here. In Figure 8.5 we saw a plot of the MLE of the modal shape (which is the full Procrustes mean) as a centred pre-shape.

In order to examine the structure of variability we could examine g1, …, gk − 2 although the restrictions of a complex covariance structure are in force. In particular, we necessarily have isotropic perturbations at each landmark. So, a practical method for inference would be to examine the shape variability in the tangent space; details of this example were given in Example 8.2.

10.2.7 Approximations and computation

In order to carry out more accurate estimation of the complex Bingham parameters Kume and Wood (2007) study the derivatives of the normalizing constant of the Bingham distribution, and Kume and Wood (2005) consider highly accurate saddlepoint approximations for the Bingham normalizing constant (as well as the Fisher–Bingham). Their methodology enables more accurate maximum likelihood estimation for a whole range of parameter values, including when the data are quite dispersed.

10.2.8 Relationship with the Fisher–von Mises distribution

Result 10.5 For triangles (k = 3) modelling with the complex Bingham distribution in pre-shape space is equivalent to using the Fisher–von Mises distribution on the spherical shape space S²(1/2).

Proof: If k = 3, then there is just one non-zero eigenvalue of A. Hence we can re-parameterize A as:

numbered Display Equation

So, we have

numbered Display Equation

where ρ( · ) is the Riemannian distance. We can obtain the density in shape space by using Kent’s polar shape coordinates and integrating over the independent uniform θ2 component (Mardia 1996b). So, the density in shape space is proportional to:

numbered Display Equation

where HTz is the centred icon corresponding to z. Since the shape space can be regarded as a sphere with great circle metric ρ, it follows that, using the direction vector l = (lx, ly, lz)T, the probability element of l is:

where ν = (ν_x, ν_y, ν_z)^T is the modal direction and dS is the probability element. The density (10.8) is the density of the Fisher–von Mises distribution on the sphere.

Mardia (1989b) provided motivation to use the Fisher–von Mises distribution for shape analysis of triangles.

Example 10.2 Consider the microfossil data described in Section 1.4.16. Using the complex Bingham distribution on the pre-shape sphere is equivalent to using the Fisher–von Mises distribution on the sphere. Let l_i = (l_{xi}, l_{yi}, l_{zi}), i = 1, …, n, be the directions on the sphere corresponding to each triangle shape, and we assume that l_1, …, l_n are a random sample from the Fisher–von Mises distribution. From the data we find that

numbered Display Equation

Thus the resultant mean length is The mean directional vector is (0.146, 0.076, 0.472) and the resulting estimate of κ from the Fisher–von Mises distribution is , so we have a large value of the concentration here. When mapped to Bookstein coordinates the mean shape estimate is uB = −0.101 and vB = 0.632.

10.2.9 Simulation

There are several different methods available for simulation from the complex Bingham distribution, and Kent et al. (2004) provide several algorithms, which have rather different efficiencies depending on the amount of concentration. A detailed description of simulation methods for a variety of directional distributions is given by Kent et al. (2013). Also, Kume and Walker (2009) discuss simulation for the Fisher–Bingham distribution.
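To illustrate one simple simulation approach (a sketch in the spirit of the truncated-exponential rejection method of Kent et al. 2004, not a transcription of their algorithms): with A = ΓΛΓ* and λ_{k−1} = 0, the polar coordinates s_j = |γ_j^* z|², j = 1, …, k − 2, have joint density proportional to exp(∑ λ_j s_j) on the unit simplex, so each s_j can be drawn from an exponential truncated to [0, 1] and the draw accepted when ∑ s_j ≤ 1; the phases are independent and uniform.

```python
import numpy as np

def rtrunc_exp(beta, rng):
    """One draw from an Exp(rate=beta) distribution truncated to [0, 1] (uniform if beta = 0)."""
    u = rng.uniform()
    if beta == 0.0:
        return u
    return -np.log1p(-u * (1.0 - np.exp(-beta))) / beta

def rcomplex_bingham(n, Gamma, lam, rng=None):
    """Rejection sampler sketch for CB(A) with A = Gamma diag(lam) Gamma*, lam[-1] = 0."""
    if rng is None:
        rng = np.random.default_rng()
    p = len(lam)                                    # p = k - 1
    out = np.empty((n, p), dtype=complex)
    i = 0
    while i < n:
        s = np.array([rtrunc_exp(-lam[j], rng) for j in range(p - 1)])
        if s.sum() > 1.0:                           # outside the simplex: reject
            continue
        s = np.append(s, 1.0 - s.sum())             # s_{k-1}
        theta = rng.uniform(0.0, 2.0 * np.pi, size=p)
        w = np.sqrt(s) * np.exp(1j * theta)         # Kent polar coordinates in the eigenbasis
        out[i] = Gamma @ w                          # rotate back to the original basis
        i += 1
    return out
```

The acceptance rate is high for concentrated distributions (most mass has small ∑ s_j) but degrades towards 1/(k − 2)! near the uniform case, which is one reason several alternative algorithms are discussed by Kent et al. (2004).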

10.3 Complex Watson distribution

10.3.1 The density

Definition 10.2 The complex Watson distribution on the pre-shape sphere \mathbb{C}S^{k−2}, denoted by \mathbb{C}W_{k−2}(μ, ξ), has p.d.f.

(10.9) \[ f(z) = c_W(\xi)^{-1} \exp\{ \xi \, |\mu^* z|^2 \}, \qquad z \in \mathbb{C}S^{k-2}. \]

Properties of the distribution were discussed by Mardia and Dryden (1999). In shape analysis we are primarily interested in ξ ≥ 0, although the concentration parameter ξ can be negative in some circumstances. For ξ > 0, the modal pre-shape direction is μeiθ, where 0 ≤ θ < 2π is an arbitrary rotation angle. The complex Watson distribution is an important special case of the complex Bingham distribution, when there are just two distinct eigenvalues in A (a single distinct largest eigenvalue and all other eigenvalues being equal). In this case all directions orthogonal to the modal axis have equal weight, and so this model implicitly assumes a spherical error distribution, as would be obtained from independent isotropic landmarks with equal variances. In this case the complex Bingham distribution can be re-parameterized so that A = −2κ(Ik − 1 − μμ*), where μ is the modal vector on the pre-shape sphere. The density on the pre-shape sphere can be written as:

(10.10) numbered Display Equation

where ρ(HTz, HTμ) is the Procrustes distance between the shapes of HTz and HTμ. Note that HTz is the centred icon corresponding to the pre-shape z. The parameters of the distribution are the population mode pre-shape direction μ and the concentration parameter ξ = 2κ. We have chosen to re-parameterize the complex Watson distribution with the parameter κ = ξ/2 in order to make connections with the offset normal distributions below in Section 11.1.

We call this distribution the complex Watson distribution due to the similar form of the density to the Dimroth–Watson distribution on the real sphere; see Watson (1965, 1983), who called it the Scheidegger–Watson distribution.

For triangles (k = 3) the complex Watson distribution is the same as the complex Bingham distribution, as there are at most two distinct eigenvalues in that case. An alternative expression for the density is

numbered Display Equation

where dF( · ) is the full Procrustes distance of Equation (4.10).

This distribution was briefly discussed by Dryden (1991), Kent (1994) and Prentice and Mardia (1995). Full details and further developments can be found in Mardia and Dryden (1999), on which subsequent details are based.

Result 10.6 The integrating constant for the complex Watson distribution is:

\[ c_W(\xi) = \frac{2\pi^{k-1}}{(k-2)!} \, {}_1F_1(1; k-1; \xi), \]

where

(10.11) \[ {}_1F_1(a; b; x) = \sum_{r=0}^{\infty} \frac{(a)_r}{(b)_r} \, \frac{x^r}{r!}, \qquad (a)_r = a(a+1)\cdots(a+r-1), \]

is the confluent hypergeometric function (e.g. see Abramowitz and Stegun 1970).

See Mardia and Dryden (1999) for a proof. Note that the integrating constant has a relatively simple form, and so inference is quite straightforward.

10.3.2 Inference

Let z_1, …, z_n be a random sample from a population modelled by a complex Watson distribution, with n ≥ k − 1. Set

\[ S = \sum_{i=1}^{n} z_i z_i^* \]

to be the (k − 1) × (k − 1) complex sum of squares and products matrix. Suppose that the eigenvalues of S are positive and ordered l_{k−1} > l_{k−2} ≥ ⋯ ≥ l_1 > 0, and let g_{k−1}, …, g_1 denote the corresponding eigenvectors.

Result 10.7 Under the complex Watson distribution the MLEs of μ and κ are given by:

\[ \hat{\mu} = g_{k-1} \quad (\text{up to an arbitrary rotation } e^{i\alpha}), \]

and the solution to

numbered Display Equation

where Under high concentrations

(10.12) \[ \hat{\kappa} \approx \frac{n(k-2)}{2(n - l_{k-1})}. \]

Proof: The log-likelihood for the data reduces to:

numbered Display Equation

Holding κ constant, it can be seen that μ̂ = e^{iα}g_{k−1} provides the maximum, where α is an arbitrary rotation angle (0 ≤ α < 2π), and so

numbered Display Equation

The MLE of κ is found by solving

numbered Display Equation

Under high concentration

numbered Display Equation

and so

numbered Display Equation

Therefore

numbered Display Equation

and so the high concentration approximate MLE for κ follows.

As for the complex Bingham distribution the dominant eigenvector can be regarded as the average axis of the data (up to a rotation), and it has the same shape as the sample full Procrustes mean of Equation (6.11).

Since the complex Watson distribution is a complex Bingham distribution with all but the largest eigenvalue equal, then under the complex Watson distribution we should expect l1, …, lk − 2 to be approximately equal. In other words, the shape variability for the complex Watson distribution is isotropic and all principal components of shape variability have equal weightings. This observation provides us with a model checking procedure.
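A corresponding sketch for fitting the complex Watson model (our own illustration): take μ̂ as the dominant eigenvector of S, use the high-concentration approximation κ̂ ≈ n(k − 2)/{2(n − l_{k−1})} (the approximation reconstructed in Result 10.7, so it should be treated accordingly), and inspect the non-dominant eigenvalues, which should be roughly equal if the implied isotropy is plausible.

```python
import numpy as np

def complex_watson_fit(Z):
    """Complex Watson fit sketch for pre-shapes Z, an (n, k-1) complex array of unit rows.

    Returns mu_hat (dominant eigenvector of S), kappa_hat from the high-concentration
    approximation n(k-2) / {2(n - l_{k-1})}, and the non-dominant eigenvalues of S,
    which should be roughly equal under the complex Watson (isotropy) model.
    """
    n, km1 = Z.shape
    S = Z.T @ Z.conj()                    # complex sums of squares and products matrix
    l, g = np.linalg.eigh(S)              # eigenvalues in ascending order
    mu_hat = g[:, -1]
    kappa_hat = n * (km1 - 1) / (2.0 * (n - l[-1]))
    return mu_hat, kappa_hat, l[:-1]
```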

10.3.3 Large concentrations

We now consider the case of large κ.

Result 10.8 If κ is large, then the distribution tends to a complex multivariate normal distribution in the tangent space with mean 0 and covariance matrix (I_{k−1} − μμ*)/(2κ), which is a generalized inverse of 2κ(I_{k−1} − μμ*).

Proof: We show that the following expression is true:

(10.13) \[ (z - \mu)^* ( I - \mu \mu^* ) (z - \mu) = 1 - |\mu^* z|^2 , \]

where z*z = μ*μ = 1. Furthermore, A^− = A, where A = I − μμ*, that is A is a generalized inverse of itself. First we prove the generalized inverse property, that is AAA = A. It is easily verified that A is idempotent and therefore A³ = A. Hence, a generalized inverse of A is also A. To prove Equation (10.13) expand the left-hand side:

numbered Display Equation

Since z*z = 1, we have

numbered Display Equation

Furthermore, (I − μμ*)μ = μ − μ(μ*μ) = μ − μ = 0. Hence Equation (10.13) is proved. Thus, the exponential term in the complex Watson density reduces to:

numbered Display Equation

where v are the tangent plane coordinates, and z is close to μ. Hence,

numbered Display Equation

for large concentrations.

For large concentrations the distribution is also very similar to the offset isotropic normal distribution described in Section 11.1.2. It follows from the normal approximation that for large κ

\[ 4\kappa \, ( 1 - |\mu^* z|^2 ) \]

is approximately distributed as χ²_{2k−4}, since the complex rank of (I − μμ*) is k − 2. Furthermore, we have

\[ 1 - |\mu^* z|^2 = \sin^2 \rho, \]

where ρ is the Riemannian distance between the shapes of HTz and HTμ, so that

(10.14) \[ 4\kappa \sin^2 \rho \;\sim\; \chi^2_{2k-4}, \quad \text{approximately.} \]

This result agrees with the approximate distribution for the squared full Procrustes distance under isotropic normal errors given in Equation (9.6), with

numbered Display Equation

where S(HTμ) is the centroid size of the icon HTμ corresponding to μ.

Important point: The central role that the Fisher–von Mises distribution plays in directional data analysis is played by the complex Watson distribution for 2D shape analysis. For triangles we have seen that the two distributions are equivalent. For more than three points there are also similarities. Both distributions are very tractable, and each can be viewed as a conditional normal distribution.

10.4 Complex angular central Gaussian distribution

Kent (1994, 1997) suggested the complex angular central Gaussian distribution for shape analysis, which has the density:

(10.15) \[ f(z) \propto |\Sigma|^{-1} ( z^* \Sigma^{-1} z )^{-(k-1)}, \qquad z \in \mathbb{C}S^{k-2}. \]
The parameter matrix Σ is a positive definite Hermitian matrix. The density in Equation (10.15) is invariant under rescaling Σ → cΣ, c > 0. Also, f(z) = f(e^{iθ}z) for all θ, and so the distribution is appropriate for shape analysis. Kent (1997) gives an expectation–maximization (EM) algorithm for computing the MLE of Σ, and derives various properties. In particular, the distribution is more resistant to outliers than the complex Bingham or Watson distributions.
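As an illustration only (this is not Kent's EM algorithm, whose details we do not reproduce): for a density of the form (10.15) the MLE of Σ satisfies a fixed-point equation of the usual M-estimation type, and a simple iteration such as the following hypothetical sketch can be used; the trace rescaling fixes the scale non-identifiability noted above.

```python
import numpy as np

def cacg_fit(Z, n_iter=200, tol=1e-10):
    """Fixed-point iteration for the complex angular central Gaussian scatter matrix (a sketch).

    Z : (n, p) complex array of points on the unit complex sphere, p = k - 1.
    Iterates Sigma <- (p/n) * sum_i z_i z_i* / (z_i* Sigma^{-1} z_i), rescaling so
    that trace(Sigma) = p, because the density only identifies Sigma up to scale.
    """
    n, p = Z.shape
    Sigma = np.eye(p, dtype=complex)
    for _ in range(n_iter):
        q = np.real(np.einsum('ij,jk,ik->i', Z.conj(), np.linalg.inv(Sigma), Z))
        new = (p / n) * (Z.T * (1.0 / q)) @ Z.conj()
        new *= p / np.real(np.trace(new))
        if np.linalg.norm(new - Sigma) < tol:
            return new
        Sigma = new
    return Sigma
```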

10.5 Complex Bingham quartic distribution

We have seen that complex distributions can be of great benefit for planar shape analysis, but the complex structure has been restrictive, imposing isotropy at each landmark. An extension which has approximate elliptical structure at each landmark is the complex Bingham quartic distribution (Kent et al. 2006), which includes an extra quartic term in the exponent of the density. The density is:

numbered Display Equation

and Kent et al. (2006) carefully described the identifiable parameters. Note that if the distribution is concentrated the approximate distribution in tangent space is a multivariate normal distribution with general elliptic covariance structure. For k landmarks in two dimensions there are 2k − 4 mean parameters and (2k − 4)(2k − 3)/2 covariance parameters that are identifiable.

10.6 A rotationally symmetric shape family

A rotationally symmetric family of shape distributions can be obtained with densities as functions of shape distance to a mean configuration. The class of densities is given by Mardia and Dryden (1999):

numbered Display Equation

where ϕ( · ) ≥ 0 is a suitable penalty function, and it is assumed that ϕ(dF) is an increasing function of dF and ϕ(0) = 0. This distribution is more difficult to work with in m > 2 dimensions, as the integrating constant then depends on the mean.

A particular subclass is given by the densities with

which give the same MLE as a class of shape estimators proposed by Kent (1992) and discussed in Section 6.3. The estimators become more resistant to outliers as h increases.

A shape distribution can be obtained from a pre-shape distribution by transforming to Kent’s polar shape coordinates and dropping the independent uniform rotation parameter θk − 1. For example, the shape distribution obtained from the complex Watson distribution is a member of the family with h = 1 in Equation (10.16). The density in this form was introduced by Dryden (1991) and is given, with respect to the uniform measure, by:

where ρ is the Procrustes distance from the observed shape to the mean shape. The MLE of the mean shape is the same as the full Procrustes estimator of the mean shape. If k = 3, then the density reduces to:

numbered Display Equation

which is the Fisher–von Mises distribution on the shape sphere.

An alternative distribution with is the distribution with density

where the mean shape is the same as the partial Procrustes mean (Dryden 1991). We call this distribution the partial Procrustes shape distribution. The integrating constant is given by:

numbered Display Equation

where I_ν( · ) and L_ν( · ) are the modified Bessel function of the first kind and the modified Struve function. If k = 3, then c_2(κ) = 8κ²/[1 − exp(4κ) + 4κ exp(4κ)].

Both of the distributions of Equation (10.17) and Equation (10.18) are asymptotically normal as κ → ∞ and uniform if κ = 0. The density in Equation (10.18) has particularly light tails and so the MLE of μ under the partial Procrustes distribution will be more affected by outliers than that of the complex Watson shape distribution of Equation (10.17).

10.7 Other distributions

Another family of shape distributions for m = 2 was given by Micheas et al. (2006) who considered complex elliptical distributions, and related work of Micheas and Dey (2005) for modelling shape distributions and inference for assessing differences in shapes.

Pennec (2006) described an intrinsic Gaussian distribution for symmetric Riemannian manifolds, defined as the distribution with fixed mean and covariance matrix Σ which maximizes entropy. The curvature of the manifold appears in the concentration matrix, and for small variations the concentration matrix is approximately

numbered Display Equation

where Ric is the Ricci curvature tensor of Equation (4.15). Also, see Pennec (1996, 1999) for earlier, related work.

10.8 Bayesian inference

As an alternative to maximum likelihood estimation we could consider a Bayesian approach to inference. Let u1, …, un be a random sample of shapes, and let Θ be the location shape parameters and Σ the shape variability parameters. After specifying a prior distribution for (Θ, Σ) with density π(Θ, Σ), the posterior density is given by:

\[ \pi(\Theta, \Sigma \mid u_1, \ldots, u_n) = \frac{ L(u_1, \ldots, u_n; \Theta, \Sigma) \, \pi(\Theta, \Sigma) }{ \int L(u_1, \ldots, u_n; \Theta, \Sigma) \, \pi(\Theta, \Sigma) \, \mathrm{d}\Theta \, \mathrm{d}\Sigma }, \]

where L(u1, …, un; Θ, Σ) is the likelihood. We could then use the posterior mean of the parameters or the posterior mode [also known as the maximum a posteriori (MAP) estimator].

A Bayesian estimator of mean shape is the posterior mean shape

numbered Display Equation

and the posterior mean shape variance is:

numbered Display Equation

Another point estimator is the MAP estimator given by:

\[ ( \hat{\Theta}_{MAP}, \hat{\Sigma}_{MAP} ) = \arg\max_{\Theta, \Sigma} \; \pi(\Theta, \Sigma \mid u_1, \ldots, u_n), \]

that is the posterior mode.

A 100(1 − α)% credibility region for shape and covariance is given by A where

\[ \Pr\{ (\Theta, \Sigma) \in A \mid u_1, \ldots, u_n \} = 1 - \alpha. \]

In some situations the posterior density is very complicated and inference can then be carried out using Markov chain Monte Carlo (MCMC) simulation (e.g. Smith and Roberts 1993; Besag et al. 1995; Gilks et al. 1996). If the prior density for (Θ, Σ) is uniform, then the MAP estimator is identical to the maximum likelihood estimator, and this is the main approach that we have considered in this chapter for ease of exposition.

The Bayesian approach has been particularly useful for unlabelled shape analysis, for example in matching molecules (Green and Mardia, 2006; Dryden et al. 2007), where the correspondence between landmarks needs to be modelled. A comparison of approaches is given by Kenobi and Dryden (2012), and we give more detail in Chapter 14.

The choice of prior for the shape Θ (or pre-shape) is quite straightforward, for example we could choose a complex Watson or a complex Bingham prior. One particular case of interest is a prior which encourages smooth outlines in the case of close neighbouring landmarks and this is discussed in the next example.

More care needs to be taken with a prior distribution for Σ, although Wishart or inverse-Wishart distributions are obvious choices. If we were considering a simple model such as the complex Watson, then a suitable prior distribution for the single concentration parameter κ is a gamma distribution.

Example 10.3 Consider modelling a random sample of pre-shapes z1, …, zn with a complex Watson distribution with mode μ and concentration parameter κ. The concentration parameter κ is assumed to be known. Let the prior distribution of μ be complex Bingham with known parameter matrix A. So, the posterior density of μ is given by:

\[ \pi(\mu \mid z_1, \ldots, z_n) \propto \exp\{ \mu^* ( \kappa S + A ) \mu \}, \]

where S = ∑ni = 1ziz*i is the complex sum of squares and products matrix. Hence, we have a conjugate prior here, because the posterior is also a complex Bingham distribution, but with parameter matrix κS + A. The posterior mode μMAP is the dominant eigenvector of κS + A.
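A short sketch of this computation (the function name is ours): given the pre-shapes, the concentration κ and the prior parameter matrix A, the MAP estimate is simply the dominant eigenvector of κS + A.

```python
import numpy as np

def watson_bingham_map(Z, kappa, A):
    """Posterior mode of mu under a complex Watson likelihood and a complex Bingham CB(A) prior.

    Z : (n, k-1) complex pre-shapes. The posterior is complex Bingham with parameter
    matrix kappa * S + A, so the MAP estimate is its dominant eigenvector.
    """
    S = Z.T @ Z.conj()
    _, g = np.linalg.eigh(kappa * S + A)
    return g[:, -1]
```

With the smoothing prior discussed next one would take A = λ times the smoothing matrix, so that the MAP estimate in Example 10.4 is the dominant eigenvector of κS + λA.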

If we have a series of landmarks on outlines of objects, then a suitable prior might encourage smoothness of the outline. It would be convenient to label the landmarks sequentially around the outline, so that landmarks i and j are neighbours if (i − j) mod k ∈ {1, k − 1}. We include the modulo k term so that landmarks 1 and k are considered neighbours. In particular, in this periodic case A = −FF^T, where

numbered Display Equation

The choice of smoothing parameter λ is clearly of vital importance, as illustrated in the following example. We shall consider a complex Bingham prior with parameter matrix λA, and data distributed as complex Watson with parameter κ. Hence in this case the posterior distribution is also a complex Bingham with parameter matrix κS + λA, and the MAP estimate is the principal eigenvector of this matrix.

Example 10.4 Consider the Small group of T2 mouse vertebrae in the dataset of Section 1.4.1. There are k = 6 landmarks on n = 23 bones. We re-order the landmarks by swapping landmarks 1 and 2, so that the landmarks follow round anti-clockwise. In Figure 10.1 we see various smoothed estimates of the mean shape given by the posterior mode, which we write as μMAP. This example is used for illustration only to see what would happen if the smoothing parameter is taken to be too large. In practice this method is most useful when a large number of noisy landmarks are available around an outline. In Figure 10.1a we see the Procrustes mean (λ/κ = 0). In Figure 10.1b we see little difference in the MAP estimate (λ/κ = 0.1) but in Figure 10.1c (λ/κ = 1) there is a visible difference, with landmarks being brought in more to a discrete circle. We also consider what happens when a much too large smoothing parameter is used. In Figure 10.1d with a very large value of λ/κ = 100 the mean is very nearly a discrete circle. For such large λ/κ the integrated square derivative matrix dominates, which has the trigonometric eigenvectors for a circulant matrix (the eigenvectors form the discrete Fourier transform matrix). An automatic choice of λ/κ could be obtained by cross-validation in a similar manner to that described in Section 12.3.3.


Figure 10.1 The smoothed Procrustes mean of the T2 Small data: (a) λ/κ = 0; (b) λ/κ = 0.1; (c) λ/κ = 1.0; and (d) λ/κ = 100.

There may be occasions when smoothed PCA is appropriate, particularly for outline data. After a smoothed mean μMAP has been obtained, the figures are transformed to μMAP by Procrustes analysis leading to the sample covariance matrix S. The first smoothed PC is obtained by finding the dominant eigenvector of κS + λA and practical details are very similar to the procedure of Rice and Silverman (1991). The first few scores from a smoothed PCA of the spine lines were used by Mardia et al. (1994) to summarize the shape information. Although it was a small pilot study it did appear that the twin spine shapes showed higher familial correlation than those of non-twin siblings. Mardia et al. (1996a) and Dryden et al. (2008b) also give some summary measures for measuring spinal shape.

Many authors have considered Bayesian shape analysis, especially for unlabelled shape as discussed in Chapter 14. Some other examples include Brignell (2007) and Mardia et al. (2013b) using Markov chain Monte Carlo simulation; Leu and Damien (2014) provide a detailed discussion of Bayesian shape analysis of the complex Bingham distribution; Micheas and Peng (2010) discuss a Bayesian version of Procrustes analysis with applications to hydrology; and Theobald and Mardia (2011) provide a framework for Bayesian analysis of the generalized non-isotropic Procrustes problem with scaling.

10.9 Size-and-shape distributions

10.9.1 Rotationally symmetric size-and-shape family

We could specify size-and-shape distributions directly in the size-and-shape space SΣ^k_m. A suitable candidate could be that based on Riemannian distance, with density proportional to:

numbered Display Equation

where d_S( · ) was given in Equation (5.5) and ϕ( · ) ≥ 0 is a suitable penalty function. In the case ϕ(x) = x² the MLE of size-and-shape will be the same as that obtained using least squares partial Procrustes analysis, provided the integrating constant does not depend on μ, that is for the planar case. As for the shape case in Section 10.6 we can choose different functions for estimators with different properties. Further size-and-shape distributions based on normally distributed data are discussed in Section 11.5.

10.9.2 Central complex Gaussian distribution

For 2D data we could exploit the complex representation and, in an analogous manner to using the complex Bingham distribution in pre-shape space for shape analysis, we could specify a distribution for the Helmertized landmarks which is invariant under rotations. A suitable candidate is the central complex Gaussian distribution,

\[ f(z_H) = \pi^{-(k-1)} |\Sigma|^{-1} \exp( - z_H^* \Sigma^{-1} z_H ), \]

where Σ is Hermitian. The distribution of zH is identical to that of eiθzH for any arbitrary rotation θ. The mode of the distribution is given by the dominant eigenvector of Σ, hence the MLE of size-and-shape under this model is the dominant eigenvector of

\[ \sum_{i=1}^{n} z_{Hi} z_{Hi}^* , \]

given a random sample of Helmertized observations zH1, …, zHn.

10.10 Size-and-shape versus shape

If both size and shape are available, then we have two strategies in our analysis: either analyse shape and size separately (shape in shape space Σ^k_m and size on the positive real line) or analyse size and shape jointly in size-and-shape space SΣ^k_m. Which method is preferable depends on the aim of the study. If we are really interested in shape alone, then one would expect that more robust inference will be carried out using the first approach, even if the joint size-and-shape information is available, as fewer modelling assumptions need to be made. However, in many applications in biology one is interested in investigating joint size and shape information or in using size as a covariate, and then the size-and-shape space plays an important rôle.
