13
Non-parametric inference and regression

13.1 Consistency

The consistency of mean shape and size-and-shape estimators has been of long-standing interest. Consistency is a desirable property for an estimator: informally, a consistent estimator of a population quantity becomes closer and closer to the true population quantity as more and more independent observations become available. Consider a random sample of n configurations X1, …, Xn from a distribution with population mean shape [μ]. Let a sample estimator of [μ] obtained from X1, …, Xn be denoted by [μ̂n]. We say that the estimator is consistent if, for any ε > 0,

(13.1)  P( d([μ̂n], [μ]) > ε ) → 0  as n → ∞,

where d( · , · ) is a choice of shape distance. We write

[μ̂n] →P [μ]  as n → ∞,

and say that [μ̂n] converges in probability to [μ].

Similarly, a mean size-and-shape estimator [μ̂n]S is consistent if

P( dS([μ̂n]S, [μ]S) > ε ) → 0  as n → ∞,

where dS( · , · ) is a choice of size-and-shape distance.

It was shown by Lele (1993) that Procrustes mean estimators might not be consistent for the shape of a mean configuration under various Gaussian perturbation models (Lele and McCulloch 2002). However, Procrustes estimators are often consistent for a population Fréchet mean shape using the appropriate Procrustes distance. We shall investigate this issue in more detail below. For some models the shape of the means and population Fréchet mean shape coincide, and in this case the Procrustes estimators are consistent for the shape of the means, and in particular the full and partial Procrustes mean are consistent for the shape of the means for 2D isotropic distributions (Kent and Mardia 1997).
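
As a simple numerical illustration, the following R sketch simulates from a 2D isotropic Gaussian perturbation model, for which the full Procrustes mean is consistent for the shape of the mean configuration; the mean triangle and error standard deviation are arbitrary choices, and the Riemannian shape distance from the sample Procrustes mean to the true mean shape typically shrinks as n grows:

library(shapes)
set.seed(1)
mu <- matrix(c(0, 0, 1, 0, 0.5, 1), 3, 2, byrow = TRUE)     # mean triangle (k = 3, m = 2)
for (n in c(10, 100, 1000)) {
  x <- array(rep(mu, n), dim = c(3, 2, n)) +
    array(rnorm(3 * 2 * n, sd = 0.1), dim = c(3, 2, n))      # isotropic Gaussian perturbations
  muhat <- procGPA(x)$mshape                                 # full Procrustes mean shape
  cat("n =", n, " distance to true mean shape =", riemdist(muhat, mu), "\n")
}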

An early paper which gives a strong law of large numbers in separable quasi-metric spaces is by Ziezold (1977). In the paper, Ziezold (1977) defines the sample Fréchet mean and then shows that as n → ∞ the sample Fréchet mean tends to an element of the equivalence class of the population Fréchet mean. This classic paper is one of the earliest in size-and-shape analysis, and in particular introduces partial Procrustes analysis and gives the 2D case in some detail using complex arithmetic. Further papers expanding on the topic of planar size-and-shape and shape that also introduce further statistical inference procedures are also by Ziezold (1989, 1994).

Ziezold (1977)’s results are for intrinsic means. An alternative is to consider an embedding and use the corresponding limit theorems in the embedded space, and these measures of location are extrinsic means (see Section 6.1). Early results include Hendriks and Landsman (1996a,b) studying the asymptotic properties of the mean location on manifolds, with a focus on the sample mean direction on spheres in Hendriks et al. (1996). Further inference, including large sample hypothesis tests and confidence regions on manifolds, is given by Hendriks and Landsman (1998, 2007).

Two important papers in non-parametric shape analysis are by Bhattacharya and Patrangenaru (2003, 2005) who lay out the framework for consistency for intrinsic and extrinsic mean estimators. Also, they provide central limit theorems for both intrinsic and extrinsic means. Results for extrinsic means are studied in particular detail. The two high-profile papers provide a thorough, careful treatment of non-parametric estimation and inference on Riemannian manifolds. Several examples using landmark shape data on Kendall’s shape space are given, particularly emphasizing the extrinsic means. As well as being more straightforward to deal with than the intrinsic mean, the extrinsic mean in the embedded space is unique, whereas conditions for uniqueness of the intrinsic mean are more restrictive and require concentrated data.

13.2 Uniqueness of intrinsic means

Intrinsic means are unique provided the distribution is sufficiently concentrated (Karcher 1977; Kendall 1990b; Le 1995; Afsari 2011). A history of intrinsic means and their uniqueness has been given by Afsari (2011), and the notion of Riemannian centres of mass was introduced by Grove and Karcher (1973) and developed further in Grove et al. (1974, 1975); Karcher (1977); and Buser and Karcher (1981).

Definition 13.1 A Riemannian Lp centre of mass in a Riemannian manifold M is defined as a minimizer of

(13.2)  fp(μ) = ∫M d(x, μ)^p dF(x),

where dF is the probability measure on the manifold and d(x, μ) is the Riemannian distance between x, μ ∈ M.

The global and local minimum of f2 are often called the ‘Fréchet mean’ and ‘Karcher mean’, respectively (see Section 6.1). Likewise we call the global and local minimum of f1 the ‘Fréchet median’ and ‘Karcher median’, respectively.

In order to discuss uniqueness of the means we require the definition of a regular geodesic ball, from Kendall (1990a).

Definition 13.2 For a complete Riemannian manifold the geodesic ball Br(p) of radius r centred at p is regular if the supremum Δ of the sectional curvatures is less than {π/(2r)}² and the cut locus of p does not meet the ball Br(p).

Note that the cut locus of a point p comprises the points which do not have a unique minimal geodesic to p; for example, the cut locus of the north pole of a sphere is the south pole. Kendall (1990b) showed that any two points in a regular geodesic ball Br(p) are joined by one and only one minimal geodesic, and also proved the following result:

Result 13.1 (Kendall 1990b) For a probability distribution with support in Br(p), there is one and only one Karcher mean in Br(p).

Le (1991b) demonstrated global uniqueness by extending a result of Karcher (1977).

Result 13.2 (Le 1991b) For a probability distribution with support in the ball Br/2(p), where Br(p) is a regular geodesic ball, the global minimizer of f2 is unique, that is the Fréchet mean is unique.

Clearly the results hold for either population or sample means, where for the sample mean from a random sample of size n the probability measure is replaced with probability 1/n at each data point.

An example where we can check uniqueness is for size-and-shape, using Le and Kendall (1993). Let X be a (k − 1) × m matrix representing a Helmertized configuration of k > m points in m dimensions. Write the singular value decomposition

X = VΛU^T,

with U ∈ SO(m), V ∈ SO(k − 1) and Λ a (k − 1) × m matrix with the singular values on its leading diagonal and zeros elsewhere, and let λ1, …, λm denote the squared singular values. Then the maximum sectional curvature is the maximum value out of all of the following terms for any (i, j) of:

(13.3)  3(λi + λj − λr)/(λi + λj)²,  r ∈ {i, j},

or

(13.4)  3λs(λi + λj − λr)/{(λi + λj)(λi + λs)(λj + λs)},  r ∈ {i, j},  s ≠ i, j.

Note that this uniqueness result was used by Mitchelson (2013) who extended Procrustes estimation for the mean size-and-shape for temporal human movement data to the case where some of the landmarks are not visible at times. The resulting MOSHFIT (motion shape fitting) algorithm defines a visibility graph, and provided this is connected the mean size-and-shape can be computed and checked for uniqueness.

Further discussion of the cut locus of a Fréchet mean is given by Le and Barden (2014). A detailed relevant discussion is found in Afsari et al. (2013) who describe the best possible gradient descent algorithm for computing the mean, and provide an in-depth review.

Example 13.1 Here we consider the example of estimation of the mean size-and-shape of the male macaque data of Section 1.4.3. The space of interest is the size-and-shape space and the sectional curvatures are obtained using Equation (13.3) and Equation (13.4). The sectional curvatures are computed, the maximum taken, and then compared with the maximum radius of the data using the Riemannian distance:

library(shapes)
x <- macm.dat                        # male macaque landmark data (k x m x n array)
H <- defh(dim(x)[1] - 1)             # Helmert sub-matrix
n <- dim(x)[3]
m <- dim(x)[2]
# maximum sectional curvature over all configurations, using (13.3) and (13.4)
maxsc <- 0
for (ii in 1:n) {
  y <- H %*% x[, , ii]               # Helmertized configuration
  lam <- svd(y)$d**2                 # squared singular values
  for (i in 1:m) {
    for (j in 1:m) {
      for (r in c(i, j)) {
        sc <- 3 * (lam[i] + lam[j] - lam[r]) / (lam[i] + lam[j])**2
        if (sc > maxsc) {
          maxsc <- sc
        }
        for (s in 1:m) {
          if ((s != i) && (s != j)) {
            sc <- 3 * (lam[s] * (lam[i] + lam[j] - lam[r]) /
                       ((lam[i] + lam[j]) * (lam[i] + lam[s]) * (lam[j] + lam[s])))
            if (sc > maxsc) {
              maxsc <- sc
            }
          }
        }
      }
    }
  }
}

# maximum Riemannian size-and-shape distance from the data to the Procrustes mean
msh <- procGPA(x, scale = FALSE)$mshape
maxdist <- 0
for (i in 1:n) {
  rho <- ssriemdist(x[, , i], msh)
  if (rho > maxdist) {
    maxdist <- rho
  }
}
print(maxsc)
print(maxdist)
rball <- maxdist
print((pi / (4 * rball))**2)         # curvature bound for the regular ball of radius 2*rball
check <- (((pi / (4 * rball))**2 - maxsc) > 0)
print(check)

The output from running the above method is:

> print(maxsc)
[1] 0.0006405375
> print(maxdist)
[1] 12.71219
> rball<-maxdist
> print((pi/(4*rball))**2)
[1] 0.003817147
> check<- (((pi/(4*rball))**2 - maxsc)>0)
> print(check)
[1] TRUE

Here the data do lie inside a regular ball, and hence the size-and-shape mean is unique.

Example 13.2 A simpler case where uniqueness is easily checked is for planar shape. In this case the maximal sectional curvature is 4 (Kendall 1984). Hence we need 4 < {π/(2r*)}², that is r* < π/4. Therefore the data should lie within a ball of radius π/8 ≈ 0.3927 to guarantee uniqueness. For the T2 mouse vertebrae all the observations lie within radius 0.1271 of the full Procrustes mean, and hence uniqueness is guaranteed for the intrinsic mean. However, for the digit 3 data the observations lie within 0.7061 of the full Procrustes mean, which does not guarantee uniqueness.

The calculations of the above example are as follows:

max(procGPA(qset2.dat)$rho)
[1] 0.1271043
max(procGPA(digit3.dat)$rho)
[1] 0.706119

13.3 Non-parametric inference

13.3.1 Central limit theorems and non-parametric tests

In Section 9.1.3 we introduced permutation tests and bootstrap tests based on tangent space approximations to the shape space, when the amount of variability in the data is small. However, the testing procedures can also be appropriate in situations where the data are less concentrated, provided the sample sizes are large and a central limit theorem is available.

Bhattacharya and Patrangenaru (2003, 2005) discussed large sample theory of intrinsic and extrinsic sample means on manifolds, and in particular provided central limit theorems for intrinsic and extrinsic means. The extrinsic mean central limit theorem is more straightforward and less restrictive, after projecting the data with a Euclidean embedding [also see Hendriks and Landsman (1998) for related work]. Amaral et al. (2007, 2010b) also considered central limit theorems for one and multi-sample shapes in two dimensions. Even if the data are quite dispersed, for large samples the means will have an approximately multivariate Gaussian distribution in a tangent space, and suitable pivotal statistics can be obtained whose asymptotic distribution does not depend on unknown parameters (e.g. chi-squared distributions). Further discussion of asymptotic distributions of sample means on Riemannian manifolds is given by Bhattacharya (2008) and Bhattacharya and Bhattacharya (2008, 2012).

Huckemann (2011a) explores the analysis of landmark shapes and size-and-shapes in three dimensions. In particular, he explores consistency and asymptotic normality of various types of mean shape and mean size-and-shape. Also, the degenerate rank cases are explored in detail, referring to motivating examples of diffusion tensors in medical image analysis. Discussion of the perturbation model and inconsistency is also carried out. Huckemann (2011a) extends the central limit theorem of Bhattacharya and Patrangenaru (2005) for extrinsic means (using the Veronese–Whitney embedding) to the full Procrustes sample mean for 3D data. As discussed by Dryden et al. (2014), the limiting distributions obtained using embeddings will generally depend on the chosen embedding. The intrinsic limiting distributions for intrinsic Fréchet means obtained in Kendall and Le (2011) eliminate this dependence and, at the same time, reveal explicitly the role played by the curvature of a Riemannian manifold M in the limiting behaviour of empirical intrinsic Fréchet means.

Amaral et al. (2007) considered pivotal bootstrap methods for k-sample problems for 2D shape analysis, and the method is available in the R routine resampletest (see Section 9.1.3 for some examples). In addition Amaral et al. (2010b) provide bootstrap confidence regions for the planar mean shape. In Section 9.1.3 we described both permutation tests and bootstrap tests on tangent spaces where the data are concentrated, but in cases where there is a central limit theorem available the tests are also often appropriate for large sample problems where the variability of the data may not necessarily be small. Some of the simulation studies of Amaral et al. (2007) demonstrate the utility of both permutation and bootstrap tests when the variability of the data is not small, but the sample sizes are large.
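
For example, a two-sample comparison of the female and male gorilla skull data available in the shapes package can be carried out along these lines (a brief sketch; the number of resamples is an illustrative choice):

library(shapes)
resampletest(gorf.dat, gorm.dat, resamples = 200)   # prints test statistics and resampling p-values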

An early two sample non-parametric test was developed by Ziezold (1994) who considered a test statistic based on Mann–Whitney U statistics of partial Procrustes distances from each of the sample observations to the group means. Other non-parametric tests have been developed by Brombin and Salmaso (2009, 2013) who implement their multi-aspect permutation tests with small sample size to shape analysis applications; Brombin et al. (2011) who emphasize high-dimensional applications in shape analysis; and Brombin et al. (2015) who use non-parametric combination-based tests in dynamic shape analysis with applications to face gestures.

As well as non-parametric tests, the machinery that can be used for non-parametric Euclidean data analysis can be adapted to manifold valued data analysis. For example, kernel density estimation on Riemannian manifolds (Pelletier 2005); non-parametric Bayesian density estimation and consistency, with particular reference to planar shapes (Bhattacharya and Dunson 2010, 2012b); non-parametric regression estimation on Riemannian manifolds (Pelletier 2006); robust non-parametric regression (Henry and Rodriguez 2009); non-parametric Bayesian classification (Bhattacharya and Dunson 2012a); kernel based classification on Riemannian manifolds (Loubes and Pelletier 2008); k-means classification (Amaral et al. 2010a); and empirical likelihood for planar shapes (Amaral and Wood 2010).

13.3.2 M-estimators

The minimizers of the objective function (13.2) for sample data are types of M-estimators. Kent (1992) considered various estimators that minimize expressions of the form

(13.5)  Σi=1,…,n ϕ*( ρ(Xi, μ) ),

minimized over the candidate mean shape, where ρ( · , · ) is the Riemannian distance of Equation (4.12). These estimators can be considered M-estimators (maximum likelihood type estimators) for shape, and include the sample measure of location from (13.2). Although Kent (1992) utilized complex notation in the m = 2 case, such estimators are also valid for m ≥ 3 dimensions. Estimators of this type were considered in Section 6.3.

When the objective function in (13.5) has ϕ*(ρ) = ρ, this is the same as taking p = 1 in the sample version of (13.2). Minimizing this objective function leads to the M-estimator being equal to the Fréchet/Karcher median, which is discussed by Fletcher et al. (2009), with application to robust atlas estimation, and by Koenker (2006).

13.4 Principal geodesics and shape curves

13.4.1 Tangent space methods and longitudinal data

Since the shape space or size-and-shape space is not a flat Euclidean space, we cannot simply apply the classical methods of linear regression and spline fitting directly to manifold valued data if the data are not concentrated. If the sample has little variability, the problem can be transferred to a tangent space (e.g. at the Procrustes mean of these shapes or size-and-shapes) and then Euclidean fitting procedures can be performed in this space. This is the approach of Morris et al. (1999) and Kent et al. (2001) who developed models where tangent space coordinates are modelled as polynomial functions of time for analysing growth in faces and growth in the rat data of Section 1.4.11. A pole is chosen in the middle of the dataset (e.g. the overall Procrustes mean) and the data are then projected into the tangent space at that point. Standard multivariate procedures can be carried out in the tangent plane, for example multivariate regression with covariates.

Let us consider the following multivariate regression growth curve model in a tangent space to shape or size-and-shape space, where there are n individuals available, each observed at Nt time points. Denote the shape or size-and-shape tangent coordinates as the (p × 1) vector vit for the ith individual at the tth time point (i = 1, …, n, t = 1, …, Nt). All individuals are assumed to be independent (although dependencies between individuals could be modelled if desired). A suitable regression model is:

(13.6)  vit = Xfα + Xrβi + εit,

for i = 1, …, n, t = 1, …, Nt, where Xf is the p × f design matrix for the f fixed effects α, and Xr is the p × r design matrix for the r random effects βi. In addition, the βi are taken to have a Nr(0, Δ) distribution and εit ∼ N(0, σ²Ip) independently. The model can then be fitted by maximum likelihood or restricted maximum likelihood, using for example the R library lme4 or nlme (Bates 2005). Any approach that is standard for multivariate longitudinal data can be implemented for shape or size-and-shape data in this manner provided the variability is small, although one should take care with the ranks of the tangent plane coordinates. A suitable approach could be to reduce the dimensionality of the dataset by choosing the first p PC scores which summarize most of the variability in the tangent plane data. Applications of such methodology include: Bock and Bowman (2006); Bowman and Bock (2006); Barry and Bowman (2008); and Bowman (2008), who study linear mixed models for longitudinal shape data with applications to facial modelling, particularly with regard to modelling cleft-lip changes after surgery.
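
As an illustrative sketch, a model of the form (13.6) for a single tangent space PC score can be fitted with the lme4 package as follows; the score, individual labels and times below are simulated purely for illustration, whereas in practice the scores would be computed from the tangent plane coordinates (e.g. from a Procrustes fit):

library(lme4)
set.seed(1)
n <- 20; Nt <- 8                                     # individuals and time points
id <- factor(rep(1:n, each = Nt))
time <- rep(1:Nt, times = n)
score <- 0.05 * time + rep(rnorm(n, sd = 0.1), each = Nt) +
  rep(rnorm(n, sd = 0.02), each = Nt) * time + rnorm(n * Nt, sd = 0.05)
fit <- lmer(score ~ time + (time | id))              # fixed time effect, random intercept and slope
summary(fit)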

Non-parametric regression models for shape using B-splines have been developed for modelling dynamic shape or size-and-shape measures over time, including modelling ratios of distances (Faraway 2004b) or specific local coordinate systems such as relative positions and angles of articulated limbs (Faraway 2004a) as functions of time. Faraway (2004b) modelled dynamic smile movement in facial shape and Faraway (2004a) described various human movement studies. Alshabani et al. (2007b) applied Bayesian analysis of human movement shape curves to the application introduced in Section 1.4.13, where inference about the change-points for the start and end of the movements is particularly important. Indeed in all these applications where the shape or size-and-shape occurs at different rates for different individuals it is necessary to warp the curves to a common reference time before different individuals can be compared. The need for registration in time is a common issue in the analysis of functional data (e.g. see Kneip and Gasser 1992; Silverman 1995; Ramsay and Li 1998; Kneip et al. 2000; Claeskens et al. 2010; Srivastava et al. 2011b; Zhou et al. 2014; Cheng et al. 2016).

Time series models can be used where there are long temporal sequences of shapes or size-and-shapes. For example Dryden et al. (2002, 2010) investigate size-and-shape analysis of DNA molecular dynamics simulations, and in particular introduce the stationary Gaussian time-orthogonal PC model, where each PC score is independent, with different temporal autoregressive covariance structures. The model is applied to molecular dynamics simulations in pharmacy, where it is of interest to estimate the entropy of the molecule. A very small subset of the DNA data was described in Section 1.4.7, whereas in most practical studies there are often more than 10 000 up to millions of observations.

Das and Vaswani (2010) consider non-stationary models for filtering and smoothing of shape sequences, such as tracking human activity in video sequences. Vaswani et al. (2005) study shape activity with continuous state hidden Markov models for temporal sequences of shapes, with an application to detection of abnormal activity.

13.4.2 Growth curve models for triangle shapes

Goodall and Lange (1989) consider growth curve models for triangle shapes. This situation is particularly simple because the Kendall (1983) shape space of triangles is a sphere of radius 1/2. In particular Goodall and Lange (1989) give an algorithm for fitting a great circle growth curve shape model through the data. There are five stages in the algorithm:

  1. Fit a great circle through the data using spherical regression (e.g. see Fisher et al. 1987, Section 8.3).
  2. Rotate the fitted great circle to the equator.
  3. Fit a growth curve model (possibly with covariates) where the angles at the equator (longitudes) depend on fixed and random effects. In the simple case of a fixed and random intercept/slope model we have:

    θit = (β0 + β0i) + (β1 + β1i)t + εit,

    where θit is the longitude of the ith individual at time t, i = 1, …, n, and normal models are assumed with (β0i, β1i)T ∼ N2(0, Δ) and εit ∼ N(0, σ²) independently, and all individuals are assumed to be independent.

    The parameters of the model are fit by maximum likelihood estimation or restricted maximum likelihood estimation.

  4. The estimated mean growth curve is rotated back to the data giving the ‘best’ fitting great circle through the data.
  5. Steps 1–4 are repeated until convergence.

More fixed and random effects could be included to help explain the change in shape over time.

13.4.3 Geodesic model

Le and Kume (2000a) introduced the idea of testing for geodesics to examine whether shape change over time follows a geodesic in shape space. The method does not rely on concentrated data. This type of shape change has also been called the geodesic hypothesis (Hotz et al. 2010). Le and Kume (2000a) applied the methodology to the rat skulls data of Section 1.4.11, and the method involves working with Riemannian distances in the shape space and then carrying out MDS. The resulting Euclidean embedding is 1D if and only if the data in the original manifold follow a geodesic path, and hence we can investigate whether classical MDS plots are 1D. When investigating growth of individuals we would like to examine if the objects grow along a geodesic in shape space. In further examples the geodesic model has been examined by Hotz et al. (2010) and Huckemann (2011b) in both tree ring and leaf shape applications.

Example 13.3 For the rat skulls data of Section 1.4.11 we calculate all pairwise Riemannian distances, and then carry out classical MDS using the command cmdscale in R:

library(shapes)
data(rats)
# matrix of pairwise Riemannian shape distances between all 144 skulls
distmat <- matrix(0, 144, 144)
for (i in 1:144) {
  for (j in 1:144) {
    distmat[i, j] <- riemdist(rats$x[, , i], rats$x[, , j])
  }
}
# classical multidimensional scaling of the distance matrix
plot(cmdscale(distmat))

The resulting plot is given in Figure 13.1 and we see that there is clear non-linear structure, and hence the growth is not along a geodesic. The plot is similar to that in several earlier analyses of this dataset, e.g. Bookstein (2014, p. 415).
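
As a simple numerical check to accompany the plot, the eigenvalues of the classical MDS solution indicate how close the embedding is to being one-dimensional, which is what the geodesic model would require; this short sketch re-uses the distance matrix distmat computed above:

mds <- cmdscale(distmat, k = 2, eig = TRUE)
round(mds$eig[1:3] / sum(abs(mds$eig)), 3)   # relative sizes of the leading MDS eigenvalues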


Figure 13.1 Multidimensional scaling plots of the rat data, using the pairwise Riemannian distances between all 144 rat skull shapes, with shapes for each individual rat joined by lines.

13.4.4 Principal geodesic analysis

Fletcher et al. (2004) developed principal geodesic analysis (PGA) as a method for obtaining the analogue of PCs on manifolds. The idea is to first find the intrinsic mean, then find the PC axes in the tangent space to the intrinsic mean, and then finally to project back from the PC tangent vectors to the geodesic subspaces using the exponential map. Fletcher et al. (2004) applied the methodology to M-reps, which are medial axis representations used in medical imaging. A further example of PGA is given by Fotouhi and Golalizadeh (2012), in exploring the variability of DNA molecules. Fletcher (2013) provides a summary of geodesic regression and the theory of least squares on Riemannian manifolds.
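
A linearized version of these steps (tangent space PCs computed at a Procrustes mean, rather than the exact intrinsic algorithm of Fletcher et al. 2004) can be carried out with routines in the shapes package, for example for the digit 3 data:

library(shapes)
gpa <- procGPA(digit3.dat)   # Procrustes mean and tangent space PCA
gpa$percent[1:3]             # percentage of variability explained by the first three PCs
shapepca(gpa, pcno = 1)      # display shape variation along the first PC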

Huckemann et al. (2010) have developed a comprehensive approach to modelling geodesics for shapes, and developing analogues of PCA for planar and 3D shapes using intrinsic methodology. Principal geodesics in triangular shape space were investigated initially by Huckemann and Ziezold (2006), who provided methodology for fitting a best geodesic through a set of triangle shapes (i.e. points on a sphere). In this work an alternative sample mean for the data is proposed which does not coincide with the usual Procrustes means, but rather the mean lies on the principal geodesic. This idea was then extended by Huckemann and Hotz (2009) to PC geodesics for planar shape spaces, and more generally in Huckemann et al. (2010) to the idea of geodesic PCA. A set of orthogonal geodesics are obtained in the shape space, and then PCA is carried out using projections onto the geodesic axes. The position of crossing of the geodesics is an alternative definition of mean shape of the data.

Related methodology is given in Kenobi et al. (2010) who also develop explicit methodology for fitting principal geodesics, and then proceed to define curvilinear paths through data (analogous to polynomials) by using principal geodesic axes. Further developments include applications of fitting geodesics in planar shape space and testing for common mean principal geodesics in different groups, applied to leaf shape growth using large sample asymptotics (Huckemann 2011b) and tree ring growth for planar outline shapes (Hotz et al. 2010). The applications considered by Kenobi et al. (2010) involve human movement data and lumbar spinal shape, where departures from the geodesic model are detected using likelihood ratio tests. Faraway and Trotman (2011) investigate shape change along geodesics applied to examples from cleft lip surgery. Further techniques of interest are found in Sozou et al. (1995) who use polynomial regression to account for non-linear variation in tangent space point distribution models; Hinkle et al. (2014) who introduce intrinsic polynomials for regression on Riemannian manifolds; and Piras et al. (2014) who develop a Procrustes motion analysis method based on a linear shift (parallel transport) in order to describe the 3D shape changes over time in the left ventricular heart cycle.

13.4.5 Principal nested spheres and shape spaces

Other related methodology which involves a very different backwards fitting approach is that of Jung et al. (2012) who consider fitting a sequence of principal nested spheres. The procedure involves fitting a sequence of successively lower dimensional spheres to the data. The penultimate nested sphere in the sequence is a best fitting ‘small circle’ curve in general, and a special case leads to a best fitting geodesic (‘great circle’ on the sphere). The final zero-dimensional ‘sphere’ (point) in the sequence is another alternative definition of a mean. Principal nested spheres builds on the methodology of Jung et al. (2011) on principal arc analysis on direct product manifolds, with applications to M-reps (Pizer et al. 2003), which are medial representations of objects in medical images. In particular, Jung et al. (2011) use principal arcs to model the prostate in medical images. Also, Pizer et al. (2013) consider nested sphere statistics of skeletal models. Jung et al. (2012) also develop principal nested spheres for planar shape data, making use of the fact that the pre-shape space is a sphere, and at all stages the data are in optimal rotational alignment (Procrustes registered).

13.4.6 Unrolling and unwrapping

Consider the situation where we are given data consisting of n shapes or size-and-shapes observed at successive times, and we are interested in fitting a smooth curve to these shapes.

In situations where the shape or size-and-shape change is large, the tangent space approximation will not be appropriate, and so other methods are required. In the case of triangle shapes in the plane there is a method available for fitting smooth curves through shape data. Since Σ32 is isometric with the 2D sphere of radius 1/2, the problem for planar triangles can be solved by the novel method proposed by Jupp and Kent (1987), who introduced an algorithm for fitting spherical smoothing splines to spherical data based on the techniques of unrolling and unwrapping onto an appropriate tangent space. Le (2003) extended the results to unrolling in shape spaces and Kume et al. (2007) extended the fitting method of Jupp and Kent (1987) to data in Σk2 for the unrolling and unwrapping along a piecewise geodesic in Σk2, using complex linear methods, and also a method for fitting splines to data in tangent spaces to Σk2. The resulting paths are called shape-space smoothing splines. The shape-space smoothing splines are obtained by using the fact that Σk2 is a Riemannian quotient space of the pre-shape sphere, the space of configurations obtained after removing the information on location and scale. This allows the identification of geodesics in Σk2 with particular geodesics in the pre-shape sphere, and the identification of tangent spaces to Σk2 at various shapes with particular subspaces of the appropriate tangent spaces to the pre-shape sphere.

Kume et al. (2007) describe in detail how to unroll a geodesic G1 between z0 and z1 onto a tangent space at a point z0 (the pole of the tangent space) on the geodesic. The unrolling is a straight line in the tangent space passing through 0 to the inverse exponential map projection of z1, and the length of the geodesic from z0 to z1 is the same as the length of the line in the tangent space. To unwrap a point w with respect to the geodesic from z0 to z1, one first projects w onto the tangent space at z1 using the inverse exponential map (so that angles with respect to the geodesic and lengths are preserved). Then one needs to see how this projected point maps to the tangent space at z0, when the tangent space rolls along the geodesic (using parallel transport) and touches the shape space at z1. After unrolling and unwrapping we have the geodesic G1 and the point w mapped to a line segment and a point, respectively, in the tangent space at z0.

The procedure is then extended to piecewise geodesics, which are unrolled to give piecewise linear segments in a single base tangent space, and data points in the shape space can be unwrapped with respect to the piecewise geodesic path. Statistical fitting procedures, such as smoothing splines, can be carried out in the base tangent space. The reverse geometrical transformations of rolling and wrapping are employed to map the base tangent space points back to the shape space with respect to the piecewise geodesic path. An illustration of unrolling and unwrapping with respect to a piecewise geodesic path is given in Figure 13.2.
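
To fix ideas, the following sketch carries out the unrolling of a two-piece geodesic on the ordinary unit sphere, the simplest setting, rather than in the shape space itself; the log map and parallel transport expressions are the standard ones for the sphere, and the three points are arbitrary choices:

# inverse exponential (log) map at p on the unit sphere
sphere_log <- function(p, q) {
  v <- q - sum(p * q) * p
  nv <- sqrt(sum(v^2))
  if (nv < 1e-12) return(0 * p)
  acos(max(min(sum(p * q), 1), -1)) * v / nv
}
# parallel transport of a tangent vector v from p to q along the joining geodesic
sphere_transport <- function(p, q, v) {
  w <- sphere_log(p, q)
  th <- sqrt(sum(w^2))
  if (th < 1e-12) return(v)
  u <- w / th
  v + (cos(th) - 1) * sum(u * v) * u - sin(th) * sum(u * v) * p
}
# unroll the two-piece geodesic p0 -> p1 -> p2 into the tangent plane at p0
p0 <- c(1, 0, 0); p1 <- c(0, 1, 0); p2 <- c(0, 0, 1)
seg1 <- sphere_log(p0, p1)                               # first segment, already in the base tangent space
seg2 <- sphere_transport(p1, p0, sphere_log(p1, p2))     # second segment, transported back to p0
unrolled_end <- seg1 + seg2                              # endpoint of the unrolled piecewise geodesic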


Figure 13.2 A diagrammatic view of unrolling and unwrapping with respect to a piecewise geodesic curve. Source: Kume et al. 2007. Reproduced with permission of Oxford University Press.

Kume et al. (2007) describe an iterative fitting algorithm whereby the piecewise geodesic path is updated after each spline fitting. The algorithm usually converges within a small number of iterations, and the smoothing parameters in the spline can be chosen using cross-validation. Since any continuous path can be approximated by a continuous piecewise geodesic path, the algorithm gives discretized versions of shape-space smoothing splines.

Note that unrolling and unwrapping can be generalized to the shape spaces of configurations in higher dimensions. Unrolling and unwrapping procedures for m ≥ 3 are available and are given in Le (2003) using matrix representations. The procedures involve solutions of homogeneous first-order linear differential equations which must be solved numerically in general. For 2D data (m = 2) the use of complex arithmetic leads to explicit expressions for the unrolling and unwrapping of piecewise geodesics, which leads to a much more straightforward and transparent implementation for this important case. Also see Koenderink (1990) for discussion of discretized unrolling of geodesics.

Example 13.4 We consider the human movement data from Section 1.4.13 where k = 4 landmarks are recorded in m = 2 dimensions for five movements, each with 10 equally spaced time points. It is of interest to model the shape change over time, and an analysis was carried out by Kume et al. (2007) as in the following.

For each time observation we find the corresponding Procrustes mean shape of the five shapes observed at that time. We then fit the corresponding shape-space smoothing spline to these 10 mean points. In order to obtain a sensible representation of our data we use the ‘fitted mean path’ to unwrap the observed data points at the tangent space of its starting point. The first two PC scores of the resulting data in this tangent space are plotted in Figure 13.3. Figure 13.3a shows fitted smoothing splines (λ = 0.00013), and Figure 13.3b shows fitted approximate geodesics (λ = 60658.8). The cubic splines were fitted with the R routine smooth.spline() with these two smoothing parameters, respectively. The percentages of variability explained by the first two PCs are 95.5% and 3.8%, and 97.9% and 1.9%, for the two plots respectively, and so the first two PCs provide a very good summary of the data. Given that the variability explained by the first PC is so high, we consider a hypothesis test to examine if a geodesic provides a good summary of the data. Kume et al. (2007) show that there is very strong evidence against a mean geodesic versus the alternative of a mean shape space spline, using a hypothesis test based on complex Watson distributed data. Indeed from Figure 13.3 it is clear that the mean shape-spline is a much better fit to the data than the mean geodesic.
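
The spline fitting step in the base tangent space uses a standard routine; a generic sketch with toy data (not the human movement PC scores themselves) is:

tt <- seq(0, 1, length = 10)
score <- sin(2 * pi * tt) + rnorm(10, sd = 0.05)    # a toy tangent space PC score sequence over time
fit <- smooth.spline(tt, score)                     # smoothness chosen by generalized cross-validation
plot(tt, score)
lines(predict(fit, seq(0, 1, length = 100)))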


Figure 13.3 The first two PC scores of the unrolling of the human movement data paths with respect to the fitted mean path. In (a) fitted smoothing splines are shown in solid black (λ = 0.00013) with the projected data points joined by dashed lines. In (b) fitted approximate geodesics (λ = 60658.8) are shown in solid black, with the projected data points joined by dashed lines. In both plots the encircled points are knots of the mean path. Source: Kume et al. 2007. Reproduced with permission of Oxford University Press.

13.4.7 Manifold splines

Su et al. (2012) consider a general method for fitting smooth curves to time-indexed points pi on Riemannian manifolds using generalizations of splines. Let M be a Riemannian manifold and γ: [0, 1] → M be an appropriately differentiable path on M. The goal is to find a path that minimizes the energy function:

(13.7)  E(γ) = λ1 Σi=1,…,n d( γ(ti), pi )² + λ2 ∫01 ⟨ D²γ/dt², D²γ/dt² ⟩ dt.

The first term is referred to as the data term and the second term is referred to as the smoothing term. The asymptotic limits of the solution were investigated by Samir et al. (2012). As λ1 tends to zero, for a fixed λ2 > 0, one obtains a geodesic curve as the optimal curve. Similarly, as λ2 tends to zero, for a fixed λ1 > 0, the optimal curve is analogous to a piecewise cubic polynomial that interpolates between the given points. Su et al. (2012) use a steepest-descent algorithm for minimizing E(γ), where the steepest-descent direction is defined with respect to the second-order Palais metric. The algorithm is applied to several applications by Su et al. (2012), including planar shapes, symmetric positive definite matrices and rotation matrices. In Figure 13.4 we see the interpolated shapes from a video sequence of images of a dancer, from Su et al. (2012). From the initial sequence four images are taken (first row), and noise is added to the two middle figures (second row), and then a spline is fitted in a tangent space at the mean (third row), piecewise geodesics are fitted which interpolate the noisy data (fourth row), the unrolling method of Kume et al. (2007) (Section 13.4.6) is applied (fifth row) and finally the manifold spline method of Su et al. (2012) is applied (sixth row). Note that the last two methods are quite similar in this example.


Figure 13.4 The original data, the noisy data, and interpolated and smoothed shape sequences using different techniques. Source: Su et al. 2012. Reproduced with permission of Elsevier.

Another promising technique is that of a principal flow (Panaretos et al. 2014), which is a curve on the manifold passing through the mean of the data, where a particle flowing along the principal flow attempts to move along a path of maximal variation of the data, up to smoothness constraints. The technique involves solving an ODE, and uses the Euler–Lagrange method.

13.5 Statistical shape change

Rather than describing the shape change between two individuals we may have random samples available from different populations. Comparison and explanation of population shape differences are required.

The situations under study can be categorized into two cases: independent samples; and dependent samples (matched pairs).

  1. Independent samples: the objects under study are all mutually independent.

    Example: Consider the mouse vertebral data of Section 1.4.1. It is of interest to describe the differences in size-and-shape between the T2 vertebrae in the Control, Small and Large groups. All observations can be assumed to be independent here.

    Example: Bookstein and Sampson (1990) consider two independent samples of children; in one group the mothers drank excess alcohol during pregnancy; and in the other group the mothers did not. It is of interest to explore the shape differences in the faces of the children between the independent groups.

    Hypothesis tests on whether there are population shape differences in the above examples can proceed using the methods described in Chapters 9 and 10. However, we may probe further and examine, for example, whether a population shape difference is affine or has some other simple structure.

  2. Dependent samples: the objects under study are related, for example the same individual before and after an operation, or describing the size-and-shape change as a particular organism grows over time.

    Example: Mardia and Walder (1994a) consider the rat calvarial growth data of Moss et al. (1987) in Section 1.4.11, in particular the shape difference between ages 90 and 150 days. X-ray images are taken of the head of each rat at each stage and so there is a natural pairing here, as each individual is followed through time.

Further developments that are useful for longitudinal data or the study of shape versus time include Sasaki metrics (Muralidharan and Fletcher 2012) for analysis of longitudinal data on manifolds. Sasaki metrics are defined on a cone over a manifold M, and so they are also appropriate for size-and-shape analysis, where M is the shape space. Also, Thompson and Rosenfeld (1994) describe deterministic and stochastic growth models for modelling shapes; Niethammer et al. (2011) discuss geodesic regression of image time series; and Niethammer and Vialard (2013) develop parallel transport methods for shapes.

13.5.1 Geometric components of shape change

Consider the m = 2 case with two figures represented in terms of the Bookstein coordinates of Equation (2.4), with (θj, ϕj)T, j = 3, ..., k, for the first object and (θ*j, ϕ*j)T, j = 3, ..., k, for the second object. Consider predicting the Bookstein coordinates (θ*, ϕ*)T at a point in the second figure given the corresponding point (θ, ϕ)T in the first figure. Geometric components of shape change (Bookstein and Sampson 1990) are expressions of the form:

(13.8)  θ* = f1(θ, ϕ),  ϕ* = f2(θ, ϕ),

subject to the constraints

f1(−1/2, 0) = −1/2,  f1(1/2, 0) = 1/2,  f2(−1/2, 0) = f2(1/2, 0) = 0,

where f1 and f2 are polynomials in two real variables, so that the baseline landmarks are left fixed. It follows that the affine transformation between two figures in Bookstein coordinates is:

(13.9)  θ*j = θj + aϕj,  ϕ*j = bϕj,  j = 3, ..., k.

In Figure 13.5 we see an artificial example of a uniform transformation between two configurations in terms of Bookstein coordinates. The displacement vectors are all parallel, each vector magnitude is proportional to the vertical distance from the line ϕ = 0 and each vector direction depends on the sign of ϕj. From Section 12.2.4 we see that an affine transformation always describes the shape difference for the triangle case (k = 3) but for k > 3 points the fit will in general be only approximate. Unlike the PTPS described in Section 12.3.1, this is not an interpolant in general.
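
For illustration, with the parametrization of (13.9) the affine change can be estimated by ordinary least squares from the Bookstein coordinates of two figures; the coordinates below are made-up values for k − 2 = 3 non-baseline landmarks:

theta1 <- c(0.10, 0.40, 0.70); phi1 <- c(0.50, 0.80, 0.30)   # figure 1, landmarks 3 to 5
theta2 <- c(0.15, 0.48, 0.73); phi2 <- c(0.45, 0.72, 0.27)   # figure 2, landmarks 3 to 5
a <- coef(lm(I(theta2 - theta1) ~ phi1 - 1))                 # theta*_j = theta_j + a phi_j
b <- coef(lm(phi2 ~ phi1 - 1))                               # phi*_j = b phi_j
c(a = unname(a), b = unname(b))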


Figure 13.5 An affine deformation from (a) to (b). In (c) we see the change vectors in Bookstein coordinates, which are all parallel with length proportional to the distance from ϕ = 0, and direction depending on the sign of the ϕ coordinate.

Bookstein and Sampson (1990) include tests for the goodness of fit of linear and quadratic geometric components. Their motivation was a study into the differences in face shape in children whose mothers drank heavily during pregnancy and those whose mothers did not (also see the FASDs example in Section 1.4.6). Their approach was to use general multivariate normal models for the shape variables, as described in Section 9.4. If variations about the mean landmarks are small, then this approach is reasonable. The situation considered was for two independent samples. Bookstein and Sampson (1990) also consider the matched pairs situation with the rat growth data of Moss et al. (1987) from Section 1.4.11. The testing procedures involve the development of suitable Hotelling’s T2 tests. Since Bookstein’s shape variables and Procrustes tangent coordinates are approximately linearly related for small variations (Kent 1994), it follows that the Hotelling’s T2 tests will be approximately the same as those conducted in tangent space, described in Section 9.1.

Mardia and Dryden (1989b) also considered testing for affine shape change in two independent samples using offset normal distributions with isotropic covariance structure. The fitting of second-order polynomial functions is the next extension and explicit details were given by Bookstein and Sampson (1990) for the paired samples case. For quadratic fitting we need k > 6 landmarks for a meaningful fit.

13.5.2 Paired shape distributions

Mardia and Walder (1994a) have investigated offset normal shape distributions for paired data (e.g. the same individual observed at two different time points). Consider two complex figures Z1 and Z2 which are marginally isotropic normally distributed with different means, different variances and a correlation between the two figures. Let Zij be the jth complex coordinate of the ith figure, then the model is written as:

(13.10)  Zij ∼ CN(μij, σi²),  i = 1, 2,  cov(Z1j, Z2j) = ρPσ1σ2,

independently for j = 1, …, k, where ρP is real. Let τi² = 4σi²/||μi2 − μi1||², i = 1, 2, and let ξ be the angle between the mean baselines of the two figures, that is

ξ = arg{ (μ22 − μ21)/(μ12 − μ11) }.

Let Wi = (Wi3, ..., Wik)T be complex Kendall coordinates for the observed figures Zi, i = 1, 2, and we write W = (WT1, W2T)T. Mardia and Walder (1994a) obtained the marginal shape distribution of W under this model, which has a very complicated density function. Mardia and Walder (1994b) suggest an alternative density as the rotationally symmetric bivariate complex Watson density on the pre-shape sphere, with density proportional to:

exp{ κ1|z1*γ1|² + κ2|z2*γ2|² + κ12|z1*z2|² },

for pre-shapes z1 and z2, which in shape space is:

exp{ κ1cos²ρ1 + κ2cos²ρ2 + κ12cos²ρ12 },

where ρi is the Procrustes distance from the shape [zi] to the ith mean shape [γi], i = 1, 2, and ρ12 is the corresponding distance between the two shapes [z1] and [z2]. Note that κ12 acts as a dependence parameter between the two shapes – if κ12 = 0, then the shapes are independent. Prentice and Mardia (1995) have given an alternative spectral approach for paired shape data.

13.6 Robustness

Types of robustness/resistance that are of interest in landmark shape analysis include:

  1. resistance to landmark outliers on specimens (i.e. some specimens have particular landmarks that are very unusually located);
  2. resistance to object outliers (i.e. objects that are very different from the rest of the random sample);
  3. robustness to model mis-specification.

Strictly speaking the words ‘robust’ and ‘resistant’ should be used in this manner (i.e. resistant inference can deal with outliers and robust procedures can deal with incorrect models). However, it is common in the literature to use the word ‘robust’ as an umbrella term to cover all cases.

Consider the situation where we have k points in m dimensions. A general regression equation for matching T to Y with additive errors is Y = g(T) + E, where g is a general inverse-link function, and we denote by (E)j the jth row of E, j = 1, …, k, which gives the error coordinates at the jth landmark.

For shape matching the model is:

(13.11)  Y = βTΓ + 1kγT + E,

where β > 0 is a scale parameter, Γ ∈ SO(m) is a rotation matrix and γ is an m × 1 location vector. For size-and-shape matching

(13.12)  Y = TΓ + 1kγT + E,

and for affine matching

(13.13)  Y = TA + 1kγT + E,

where A is a general m × m matrix. So, in the case of (13.11) estimation could be carried out by ordinary full Procrustes matching and for (13.12) by ordinary partial Procrustes matching.

Procrustes estimation involves minimizing the sum of squared norms of the errors at each point, that is using least squares we minimize:

Σj=1,…,k ||(E)j||².

Several resistant methods have been proposed including the repeated median technique of Siegel and Benson (1982), also used by Rohlf and Slice (1990).

Dryden and Walker (1999) adapt the S-estimator of Rousseeuw and Yohai (1984) for shape analysis. When the response is multivariate an isotropic S-estimator could be defined as the solution to the following minimization problem: minimize s²{(E)1, …, (E)k} over the unknown parameters subject to

(13.14)  (1/k) Σj=1,…,k ξ( ||(E)j||/s ) = K.

The positive continuously differentiable function ξ(x) satisfies ξ(0) = 0, is strictly increasing on [0, c], is constant for x > c for a fixed c > 0, and K = E{ξ(||Z||)}, where Z = (E)j/s is considered standard normal for calculating K.

An alternative estimator is the least median of squares (LMS) estimator (Rousseeuw 1984) with objective function

medj=1,…,k ||(E)j||²,

which is minimized over the unknown transformation parameters. The LMS residual discrepancy measure is:

sLMS{(E)1, …, (E)k} = medj=1,…,k ||(E)j||.

If we relax some of the conditions for the isotropic S-estimator to allow the choice of indicator function

ξ(x) = 0 for 0 ≤ x ≤ 1,  ξ(x) = 1 for x > 1,

and K = [(k + 1)/2]/k, then the LMS objective function leads to a solution of Equation (13.14).

The LMS procedure has a very high breakdown ε* of almost 50% (the breakdown is the minimum percentage of points that can be moved arbitrarily to achieve an infinite discrepancy).

Minimization of the objective function can be difficult because the function is not smooth and there are usually local minima. An approximate procedure based on exact matching of all possible triplets of points and then choosing the triplet that minimizes the objective function leads to an approximate solution. This procedure can be speeded up and made a little more efficient by not considering very thin triplets or very small triplets, see Dryden and Walker (1999).
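
As a rough illustration of the LMS criterion, the following sketch matches one simulated 2D configuration to another over similarity transformations by direct numerical minimization of the median squared landmark residual; this uses a general-purpose optimizer from a single starting value, rather than the triplet-based search of Dryden and Walker (1999), and so may only find a local minimum:

lms_objective <- function(par, Tm, Ym) {
  beta <- exp(par[1])                              # scale, kept positive
  ang <- par[2]                                    # rotation angle
  gam <- par[3:4]                                  # translation
  G <- matrix(c(cos(ang), sin(ang), -sin(ang), cos(ang)), 2, 2)
  E <- Ym - (beta * Tm %*% G + matrix(gam, nrow(Ym), 2, byrow = TRUE))
  median(rowSums(E^2))                             # median of squared landmark residuals
}
set.seed(1)
Tm <- matrix(rnorm(12), 6, 2)                      # configuration to be matched
G0 <- matrix(c(cos(1), sin(1), -sin(1), cos(1)), 2, 2)
Ym <- 2 * Tm %*% G0 + 0.01 * matrix(rnorm(12), 6, 2)
Ym[1, ] <- Ym[1, ] + 5                             # a gross outlier at the first landmark
optim(c(0, 0, 0, 0), lms_objective, Tm = Tm, Ym = Ym)$par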

Many other robust regression procedures could be used for matching, such as GS-estimators, M-estimators and least absolute deviations. Verboon and Heiser (1992) use the Huber (1964) function and the bi-weight function for matching two object configurations, where reflection is also allowed. In selecting an estimator one needs to make a compromise between breakdown and efficiency, and any choice will be very much application dependent. In a simulation study Dryden and Walker (1999) found that the choice of an S-estimator with 25% breakdown leads to high efficiency when the errors are normal.

A highly resistant procedure such as LMS is very useful for identifying outliers. An approach might be to first use a resistant procedure to superimpose the objects, examine the residuals and re-investigate those landmarks with very large residuals. One possible course of action could be to ignore a suspect point and then proceed with a conventional more efficient analysis (e.g. Procrustes least squares) on the rest of the data.

Siegel and Benson (1982) considered resistant registration of objects using the technique of the repeated median. Rather than minimizing the median as in LMS the repeated median involves taking the median of the median. The algorithm for registration involves sequentially updating the scale, rotation and location parameters by univariate repeated median estimators. A major disadvantage of this technique is that it is not equivariant under affine or similarity transformations. Indeed, LMS was introduced by Rousseeuw (1984) as an alternative to the repeated median, giving equivariance as a major motivation for studying LMS.

Generalized resistant matching for random samples of objects can proceed in an analogous manner to GPA (Dryden and Walker 1998). Rohlf and Slice (1990) considered generalized matching using the repeated median technique and gave an algorithm for resistant shape matching and resistant affine matching. A practical demonstration on the shapes of mosquito wing data was compared with the usual Procrustes registration. Although the two approaches were fairly similar in that dataset, the resistant fit registration resulted in less variability at the less variable landmarks and more variability at the more variable landmarks, as expected. Dryden and Walker (1998) also examine the shape variability in a generalized matching procedure with s( · ) given by an S-estimator and ϕ(E) = ||E||2. Although plots of registered configurations look quite different in examples, the plots of the first few PCs actually look very similar when placed in the same registration.

Example 13.5 Consider matching the electrophoretic gel data of Section 1.4.14 with an affine transformation, but consider the situation where the invariant spots on gel A are located correctly but two of the invariant spots in gel B have been mislabelled. In Figure 13.6 we see the fitted points in gel A after a least squares affine transformation and after the LMS affine transformation. As expected, the least squares fit is dramatically affected by the wrongly identified points, whereas the resistant LMS fit is not affected by the two outliers.


Figure 13.6 The fitted gel A registered by affine fitting using the invariant points in the gels, with two points poorly located in gel B. (a) Least squares affine transformation of gel A; (b) LMS affine transformation of gel A.

13.7 Incomplete data

A practical problem that can often be encountered is that there are missing data at a subset of landmarks for some objects in a dataset. Missing data can arise for a variety of reasons, for example part of a fossil bone may be missing or an object may be occluded in an image. Distance-based methods and Euclidean Shape Tensor Analysis of Section 15.1 can deal with missing landmark data as one considers subsets of pairs or larger subsets of points, and it is not imperative that every pair/subset is present in every object in the dataset. Obviously if a particular landmark is often missing, inferences will be less powerful concerning that landmark.

If the missing data are ‘missing at random’ (Rubin 1976), then one can proceed as one would do for regression analysis. We can either delete objects where landmarks are missing, delete any missing landmarks in all the objects, or we can fill in the incomplete data. The first approach is practical when just a few objects have missing landmarks. However, if a large proportion of the objects have a few (different) landmarks missing at random, then a fill-in approach may be preferable.

Procrustes analysis can be adapted using a type of EM algorithm (Bookstein and Mardia 2001). One needs to iterate between filling-in landmarks (i.e. estimating landmarks) and standard Procrustes analysis, assuming the estimated locations of the landmarks. The expectation step involves estimating the unknown landmark coordinates by taking the conditional mean of the unknown coordinates given the other landmarks in their current position through the kriging predictor, or using a thin-plate spline transformation from the mean shape for prediction. The minimization step involves the usual Procrustes registrations using the estimated missing coordinates.

Albers and Gower (2010) described a method for dealing with missing values in Procrustes analysis. Also, as described in Section 13.2, Mitchelson (2013) provided a method for handling occluded landmarks in human movement studies. Recent work dealing with missing data in studies of biological evolution of bone surfaces includes Gunz et al. (2009).
