15 Euclidean methods

15.1 Distance-based methods

An alternative to working with geometrical configurations directly is to work with inter-landmark distances. Consider the squared Euclidean distance matrix D from the configuration X (k × m matrix) given by:

\[ (D)_{rs} = \| (X)_r - (X)_s \|^2, \quad r, s = 1, \ldots, k, \tag{15.1} \]

where $(X)_r$ are the coordinates of the rth point (r = 1, …, k). We consider methods for shape and size-and-shape analysis that involve working with the full collection of such distance matrices, and in some cases the estimates can be similar to those from Procrustes techniques. Traditional morphometrics, which studies lengths, ratios of lengths or angles, usually considers just a subset of the inter-landmark distances, and was summarized in Section 2.3.

15.2 Multidimensional scaling

15.2.1 Classical MDS

Multidimensional scaling (MDS) is concerned with constructing a configuration of k points in Euclidean space from information about the distances between the k points (see Mardia et al. 1979, pp. 394–398). Consider X to be a k × m configuration with k × k squared Euclidean distance matrix D, as in Equation (15.1). It can be shown that D is a squared Euclidean distance matrix if and only if

\[ B = -\tfrac{1}{2} C D C \]

is positive semi-definite, where C is the k × k centring matrix of Equation (2.3). We can interpret B as the centred inner product matrix of the k × m configuration X.

If B is positive semi-definite with rank $p \le k - 1$, then a configuration can be constructed which has the same squared Euclidean distance matrix as D, for example

\[ Y = [f_1, f_2, \ldots, f_p], \]

where the $f_j$ are the eigenvectors of B, scaled so that $f_j^T f_j = a_j$, and $a_1 \ge \cdots \ge a_p > 0$ are the positive eigenvalues of B in descending order. Note that any translated, rotated and reflected version of Y will also have the same squared Euclidean distance matrix.

In practice it is often useful to find a configuration Y* in m < p dimensions with squared Euclidean distance matrix approximately equal to D. A sensible approximation is to take

\[ Y^* = \mathrm{MDS}_m(D) = [f_1, f_2, \ldots, f_m]. \]

This configuration is called the classical solution to the MDS problem. It will be reasonable provided the first m eigenvalues of B are large compared with the rest. Any rotation and reflection of MDSm(D) will also be an equivalent solution.
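The construction above can be sketched in a few lines of base R. This is a minimal illustration, not the book's code: `classical_mds()` is a hypothetical helper name and the four-point configuration is made up. It forms B from D, takes the leading m eigenvectors, and scales them so that $f_j^T f_j = a_j$.

```r
# Classical MDS from a k x k matrix of squared Euclidean distances.
# classical_mds() is a hypothetical helper name, not a package function.
classical_mds <- function(D, m) {
  k <- nrow(D)
  C <- diag(k) - matrix(1 / k, k, k)            # k x k centring matrix
  B <- -0.5 * C %*% D %*% C                     # centred inner product matrix
  e <- eigen((B + t(B)) / 2, symmetric = TRUE)  # eigenvalues in descending order
  a <- pmax(e$values[1:m], 0)                   # guard against tiny negative values
  # scale each eigenvector f_j so that f_j' f_j = a_j
  e$vectors[, 1:m, drop = FALSE] %*% diag(sqrt(a), m, m)
}

# Toy configuration: k = 4 points in m = 2 dimensions (made-up coordinates).
X <- matrix(c(0, 0,
              2, 0,
              2, 1,
              0, 3), ncol = 2, byrow = TRUE)
D <- as.matrix(dist(X))^2                       # squared Euclidean distance matrix
Y <- classical_mds(D, m = 2)
D_rec <- as.matrix(dist(Y))^2                   # distances recovered from Y
max_abs_err <- max(abs(D - D_rec))              # should be at rounding-error level
```

The recovered configuration Y generally differs from X by a translation, rotation and possibly a reflection, so the comparison is made on distances rather than coordinates. Base R's `cmdscale()` implements the same construction but takes unsquared distances as input.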

15.2.2 MDS for size-and-shape

Consider a sample of n configurations of k landmarks. Let $\bar{D}$ be the average squared Euclidean distance matrix over the n configurations, with entries $\bar{d}^2(h_1, h_2) = n^{-1} \sum_{i=1}^n d_i^2(h_1, h_2)$, where $d_i^2(h_1, h_2)$ is the squared Euclidean distance between the landmarks $h_1, h_2 \in \{1, \ldots, k\}$ for the ith observation, i = 1, …, n. Then, following the classical MDS method, an estimate of the mean reflection size-and-shape is:

\[ \mathrm{MDS}_m(\bar{D}). \tag{15.2} \]

In such a distance-based method we cannot distinguish between an estimate and its reflection, but for concentrated data the appropriate choice is not difficult. This method, described in Kent (1994), applies to both size-and-shape and shape data. For size-and-shape data no pre-scaling is necessary. For shape data the configurations are pre-scaled to unit size. In the planar case (m = 2) we can plot a 2D solution by using $f_1 + i f_2$ or $f_1 - i f_2$.
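As a rough illustration of this estimate, the following base R sketch averages the squared distance matrices of a simulated sample and applies classical MDS. The helper `classical_mds()`, the mean configuration `mu` and the noise level are all assumptions made for the example, not taken from the text.

```r
# Hypothetical helper: classical MDS of a squared-distance matrix (not a package function).
classical_mds <- function(D, m) {
  k <- nrow(D)
  C <- diag(k) - matrix(1 / k, k, k)
  e <- eigen(-0.5 * C %*% D %*% C, symmetric = TRUE)
  e$vectors[, 1:m, drop = FALSE] %*% diag(sqrt(pmax(e$values[1:m], 0)), m, m)
}

set.seed(1)
k <- 5; m <- 2; n <- 50
mu <- matrix(c(0, 0,  4, 0,  5, 3,  2, 5,  -1, 3), ncol = 2, byrow = TRUE)  # made-up mean
# Simulated sample: small isotropic perturbations of mu (illustrative data only).
sample_configs <- lapply(seq_len(n), function(i) mu + matrix(rnorm(k * m, sd = 0.05), k, m))

# Average the squared distance matrices entry by entry, then apply classical MDS.
D_list   <- lapply(sample_configs, function(X) as.matrix(dist(X))^2)
D_bar    <- Reduce(`+`, D_list) / n
mean_hat <- classical_mds(D_bar, m)

# Compare through inter-landmark distances, which ignore rotation and reflection.
rel_err <- max(abs(as.vector(dist(mean_hat)) - as.vector(dist(mu))) / as.vector(dist(mu)))
```

No pre-scaling is applied here, so the result is a size-and-shape estimate. For a shape estimate each configuration would first be scaled to unit size.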

15.3 Multidimensional scaling shape means

Several types of MDS means have been suggested in the literature and the following summary is taken from Dryden et al. (2014). The reflection shape of an object may be considered to be the geometric information remaining when all location, scale, rotation and reflection information has been removed. There are various ways of defining the mean reflection shape of an object, including the approach proposed independently by Bandulasiri and Patrangenaru (2005) and Dryden et al. (2008a), and the alternative definition proposed by Bhattacharya (2008). There is in fact a family of related definitions of mean reflection shape and details were given by Dryden et al. (2014) and in an unpublished ISI 2009 conference paper by Preston and Wood (2009).

Using the notation in Dryden et al. (2008a), the MDS mean reflection shape of an object in m dimensions described by k landmarks is defined as follows. Let X denote an m × (k − 1) Helmertized transposed configuration matrix, scaled so that $\mathrm{trace}(X^T X) = 1$. We assume throughout that 1 ≤ m < k − 1. Then $X^T X$ lies in the space $\mathcal{S}_m$, where we define $\mathcal{S}_m$ to be the space of (k − 1) × (k − 1) symmetric non-negative definite matrices Y with $\mathrm{rank}(Y) \le m$ and $\mathrm{trace}(Y) = 1$.

Suppose that the population mean $\Xi = E(X^T X)$ has spectral decomposition $\sum_{i=1}^{k-1} \lambda_i u_i u_i^T$, where $\lambda_1 \ge \cdots \ge \lambda_{k-1} \ge 0$ are the eigenvalues of Ξ, with corresponding unit eigenvectors $u_1, \ldots, u_{k-1}$. The mean reflection shape, as defined by Bandulasiri and Patrangenaru (2005) and Dryden et al. (2008a), is given by:

\[ \phi(\Xi) = \frac{\sum_{i=1}^{m} \lambda_i u_i u_i^T}{\sum_{i=1}^{m} \lambda_i}. \tag{15.3} \]

Note that, when the distribution of $X^T X$ is non-degenerate, Ξ will not lie in $\mathcal{S}_m$ but, by construction, $\phi(\Xi)$ always does.

An alternative definition of the mean reflection shape, proposed by Bhattacharya (2008, Section 6), is given by:

\[ A(\Xi) = \sum_{i=1}^{m} \left( \lambda_i - \bar{\lambda} + \frac{1}{m} \right) u_i u_i^T, \tag{15.4} \]

where the $\lambda_i$ and $u_i$ are as before and $\bar{\lambda} = m^{-1} \sum_{i=1}^{m} \lambda_i$. Note that $A(\Xi)$ also lies in $\mathcal{S}_m$.

Observe that the adjustments for the MDS estimators to lie in $\mathcal{S}_m$ are different: for Equation (15.3) there is a multiplicative adjustment and for Equation (15.4) an additive adjustment. In fact, Dryden et al. (2014) showed that $\phi(\cdot)$ and $A(\cdot)$ are both members of a one-parameter family of projections indexed by α ≥ 1/2 of the symmetric non-negative definite matrices onto $\mathcal{S}_m$, using distance

\[ d_\alpha(A, B) = \| A^\alpha - B^\alpha \|, \]

where A and B are non-negative definite square matrices and $\| \cdot \|$ is the Euclidean (Frobenius) norm. The projections used in Equation (15.3) and Equation (15.4) are for α = 1/2 and α = 1, respectively, although in both cases the averaging is done using α = 1. We use the abbreviation MDS for the mean in Equation (15.3) and MDS(α = 1) for the estimate in Equation (15.4).
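To make the difference between the multiplicative and additive adjustments concrete, here is a minimal base R sketch. It assumes the target space consists of unit-trace matrices of rank at most m, and that the two projections rescale or shift the m leading eigenvalues respectively. The function names and the simulated matrices are hypothetical, and the simulated X are treated as if already Helmertized.

```r
# Project a symmetric non-negative definite matrix onto rank <= m, unit-trace matrices.
# All function names here are hypothetical, chosen for illustration only.
top_eigen <- function(Xi, m) {
  e <- eigen(Xi, symmetric = TRUE)
  list(lambda = e$values[1:m], U = e$vectors[, 1:m, drop = FALSE])
}
project_multiplicative <- function(Xi, m) {   # rescale the m leading eigenvalues
  te <- top_eigen(Xi, m)
  te$U %*% diag(te$lambda / sum(te$lambda), m, m) %*% t(te$U)
}
project_additive <- function(Xi, m) {         # shift the m leading eigenvalues
  te <- top_eigen(Xi, m)
  te$U %*% diag(te$lambda - mean(te$lambda) + 1 / m, m, m) %*% t(te$U)
}

# Made-up example: average of X^T X over noisy unit-size 2 x 4 matrices (k = 5, m = 2).
set.seed(2)
m <- 2; q <- 4
base_X <- matrix(c(1, 0, 0.5, 0,
                   0, 1, 0, 0.5), m, q, byrow = TRUE)
Xi <- Reduce(`+`, lapply(1:200, function(i) {
  X <- base_X + matrix(rnorm(m * q, sd = 0.1), m, q)
  X <- X / sqrt(sum(X^2))                     # scale so that trace(X^T X) = 1
  crossprod(X)                                # X^T X, a q x q matrix
})) / 200

P_mult <- project_multiplicative(Xi, m)
P_add  <- project_additive(Xi, m)
rank_of <- function(P) sum(eigen(P, symmetric = TRUE)$values > 1e-8)
```

Both outputs have unit trace and rank m, whereas the averaged matrix `Xi` has unit trace but typically full rank, which is why a projection is needed at all.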

Example 15.1 An example comparing MDS and MDS(α = 1) means to a family of extrinsic means and the intrinsic mean was given in Example 6.2 applied to the male gorilla data of Section 1.4.8. To obtain the MDS means in R we can use the following code:

> library(shapes)
> data(gorm.dat)
> A<-MDSshape(gorm.dat,alpha=1,projalpha=1/2)
> A
       [,1] [,2]
[1,] 0.5002587 0.0255867592
[2,] -0.4369590 -0.0676547676
[3,] -0.3239341 0.1445435828
[4,] -0.1997595 0.1625496138
[5,] 0.1127959 0.1400977827
[6,] 0.4304058 -0.0005215325
[7,] 0.1291547 -0.1888126195
[8,] -0.2119625 -0.2157888189
> B<-MDSshape(gorm.dat,alpha=1,projalpha=1)
> B
       [,1] [,2]
[1,] 0.5000616 0.0256419012
[2,] -0.4367869 -0.0678005703
[3,] -0.3238065 0.1448550885
[4,] -0.1996808 0.1628999242
[5,] 0.1127515 0.1403997073
[6,] 0.4302363 -0.0005226565
[7,] 0.1291038 -0.1892195293
[8,] -0.2118790 -0.2162538651
> riemdist(A,B)
[1] 0.0009210365

This code computes the MDS mean shape followed by the MDS(α = 1) mean shape for the male gorilla data of Section 1.4.8. The two mean shapes are very similar: the Riemannian shape distance between them is just 0.0009.

It is not clear which definition of mean reflection shape is to be preferred. Bhattacharya (2008) points out that (15.4) is a Fréchet mean with respect to a particular metric $d_1$ and indicates a preference for that reason, but it is not clear why a projection involving an additive adjustment should be preferred to a projection involving a multiplicative adjustment or something else.

Some inferential procedures for MDS shape were given by Dryden et al. (2008a), including central limit theorems, and Preston and Wood (2010, 2011) employed the MDS definition of the mean (15.3) to develop bootstrap approaches for one- and two-sample problems. Numerical investigations suggest that inference based on the null asymptotic distributions is rarely likely to be reliable, but that using the bootstrap typically offers a substantial improvement in performance. For settings in which n is only moderately large, there is often a need to regularize the test statistic used within the bootstrap. Writing $\hat{\lambda}_1 \ge \cdots \ge \hat{\lambda}_d$ for the eigenvalues to be regularized and $\tilde{\lambda}_i$ for their adjusted values, Preston and Wood (2010) considered three different adjustments: (i) leaving the first p eigenvalues unchanged, $\tilde{\lambda}_i = \hat{\lambda}_i$ for i = 1, …, p, and replacing the rest with their mean, that is, $\tilde{\lambda}_i = (d - p)^{-1} \sum_{j=p+1}^{d} \hat{\lambda}_j$ for i = p + 1, …, d; (ii) leaving the first p eigenvalues unchanged and replacing the rest with the (p + 1)th eigenvalue, that is, $\tilde{\lambda}_i = \hat{\lambda}_{p+1}$ for i = p + 1, …, d; and (iii) adding the (p + 1)th eigenvalue to each eigenvalue, that is, $\tilde{\lambda}_i = \hat{\lambda}_i + \hat{\lambda}_{p+1}$ for i = 1, …, d. Preston and Wood (2010) carried out simulation studies which demonstrated that all three regularizations worked well, but (i) was the best in their examples.
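The three adjustments can be written down directly from their verbal description. The sketch below is an illustration in base R under the assumption that the eigenvalues are supplied in descending order. `regularize_eigenvalues()` and its method labels are hypothetical names, and the eigenvalues are made up.

```r
# Three eigenvalue regularizations for eigenvalues sorted in descending order.
# regularize_eigenvalues() and its method labels are hypothetical names.
regularize_eigenvalues <- function(lambda, p,
                                   method = c("tail_mean", "tail_first", "shift")) {
  method <- match.arg(method)
  d <- length(lambda)
  stopifnot(p >= 1, p < d)
  tail_idx <- (p + 1):d
  switch(method,
    tail_mean  = { lambda[tail_idx] <- mean(lambda[tail_idx]); lambda },  # adjustment (i)
    tail_first = { lambda[tail_idx] <- lambda[p + 1]; lambda },           # adjustment (ii)
    shift      = lambda + lambda[p + 1]                                   # adjustment (iii)
  )
}

lambda  <- c(5, 3, 1, 0.4, 0.1)   # made-up eigenvalues, d = 5
reg_i   <- regularize_eigenvalues(lambda, p = 2, method = "tail_mean")
reg_ii  <- regularize_eigenvalues(lambda, p = 2, method = "tail_first")
reg_iii <- regularize_eigenvalues(lambda, p = 2, method = "shift")
```

All three adjustments bound the smallest eigenvalues away from zero, which is what stabilizes a statistic that involves an inverse.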

If variations are small, then the MDS mean reflection shape will be very similar to that of the full Procrustes mean shape or the Bookstein mean shape. The reason for the similarity of the approaches is that under a perturbation model pre-scaled distances are approximately linear transformations of errors at landmarks. From Example 6.2 we see that the MDS means are less resistant to outliers than the Procrustes and intrinsic means.

15.4 Euclidean distance matrix analysis for size-and-shape analysis

15.4.1 Mean shape

The classical MDS approach leads to a biased estimate of mean shape or size-and-shape under normal errors. In order to correct for this bias the method of Euclidean distance matrix analysis has been proposed for size-and-shape analysis. An overall summary of this area of work was given by Lele and Richtsmeier (2001).

The method of Euclidean distance matrix analysis (EDMA) (Lele 1991, 1993) also involves the classical MDS solution from an estimated k × k matrix of all inter-landmark distances, but a correction is made for bias under normal models. Modelling assumptions are made in EDMA whereas there were no modelling assumptions in MDS. Euclidean distance matrix analysis is intended for size-and-shape analysis, whereas MDS is appropriate for size-and-shape or shape analysis.

Let F(X) be the form distance matrix, which is the k × k matrix of all pairs of inter-landmark distances in the configuration X. An estimate of the population form distance matrix F(μ) can be obtained using normal models. Stoyan (1990) and Lele (1991, 1993) use inter-landmark distances and then use MDS to estimate reflection size-and-shape. Their method involves estimating population distances using a method of moments under a normality assumption, as in the following.

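We shall concentrate on the m = 2 dimensional case here, although calculations for m = 3 dimensions can also be derived (Lele and Richtsmeier 2001). Let $(x_j, y_j)$ be the coordinates of the jth landmark in two dimensions, distributed independently as:

\[ x_j \sim N(\mu_j, \sigma^2/2), \quad y_j \sim N(\nu_j, \sigma^2/2), \quad j = 1, \ldots, k. \tag{15.5} \]

Then

\[ D_{rs}^2 = (x_r - x_s)^2 + (y_r - y_s)^2 \sim \sigma^2 \chi_2^2(\delta_{rs}^2 / \sigma^2), \tag{15.6} \]

where r = 1, …, k, s = 1, …, k, r ≠ s, $\chi_2^2(\cdot)$ denotes a non-central chi-squared distribution with two degrees of freedom and the given non-centrality parameter, and $\delta_{rs}^2 = (\mu_r - \mu_s)^2 + (\nu_r - \nu_s)^2$. We have

\[ E(D_{rs}^2) = 2\sigma^2 + \delta_{rs}^2, \quad \mathrm{var}(D_{rs}^2) = 4\sigma^4 + 4\delta_{rs}^2 \sigma^2, \tag{15.7} \]

so that

\[ \delta_{rs}^4 = \{E(D_{rs}^2)\}^2 - \mathrm{var}(D_{rs}^2). \tag{15.8} \]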

Important point: Consider a random sample of squared distances between the two landmarks labelled r and s, for n objects, given by $d_1^2, \ldots, d_n^2$. From Equation (15.7) we see that the sample average of the squared distances between landmarks r and s is a biased estimate of $\delta_{rs}^2$, and the bias is $2\sigma^2$.

We can remove the bias by substituting the sample moments into Equation (15.8) to obtain a moment estimate of the population squared Euclidean distance $\delta_{rs}^2$,

\[ \hat{\delta}_{rs}^2 = \{ (\overline{d^2})^2 - \mathrm{var}(d^2) \}^{1/2}, \]

where $\overline{d^2} = n^{-1} \sum_{i=1}^n d_i^2$ and $\mathrm{var}(d^2) = n^{-1} \sum_{i=1}^n (d_i^2 - \overline{d^2})^2$. Anisotropy can also be accommodated (Lele 1993). The estimate of mean reflection size-and-shape

\[ \mathrm{MDS}_m(\hat{\Delta}), \quad \hat{\Delta} = (\hat{\delta}_{rs}^2), \]

is consistent for mean reflection size-and-shape μ under model (15.5) with isotropic multivariate normal errors (Lele, 1993), but is not robust since var(d2) involves fourth moments. Thus, EDMA is closely related to MDS, but the additional modelling assumptions allow any bias to be removed. As with any size-and-shape study, the objects have to be recorded to the same scale. Any rescaling will affect the distributional assumptions and the bias corrections will no longer be appropriate. Stoyan and Stoyan (1994) also discuss similar distance-based methods in some detail, concentrating on the triangle case.
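The bias and its correction can be checked by simulation. The following base R sketch is illustrative only: it assumes each coordinate has variance $\sigma^2/2$ (the scaling under which the bias of the averaged squared distance is $2\sigma^2$) and uses a divisor of n in the sample variance. The helper names, the square mean configuration and the value of σ are made up.

```r
# Method-of-moments estimate of squared inter-landmark distances (m = 2),
# followed by classical MDS. Helper names are hypothetical.
classical_mds <- function(D, m) {
  k <- nrow(D)
  C <- diag(k) - matrix(1 / k, k, k)
  e <- eigen(-0.5 * C %*% D %*% C, symmetric = TRUE)
  e$vectors[, 1:m, drop = FALSE] %*% diag(sqrt(pmax(e$values[1:m], 0)), m, m)
}
edma_sq_dist <- function(configs) {
  D_arr   <- simplify2array(lapply(configs, function(X) as.matrix(dist(X))^2))  # k x k x n
  mean_d2 <- apply(D_arr, c(1, 2), mean)
  var_d2  <- apply(D_arr, c(1, 2), function(v) mean((v - mean(v))^2))  # divisor n
  sqrt(pmax(mean_d2^2 - var_d2, 0))     # moment estimate, truncated at zero
}

set.seed(3)
mu <- matrix(c(0, 0,  1, 0,  1, 1,  0, 1), ncol = 2, byrow = TRUE)   # made-up mean
sigma <- 0.3
# Assumption: each coordinate has variance sigma^2 / 2.
configs <- lapply(1:5000, function(i) mu + matrix(rnorm(8, sd = sigma / sqrt(2)), 4, 2))

delta2_true  <- as.matrix(dist(mu))^2
delta2_naive <- Reduce(`+`, lapply(configs, function(X) as.matrix(dist(X))^2)) / length(configs)
delta2_edma  <- edma_sq_dist(configs)
mean_hat     <- classical_mds(delta2_edma, 2)   # estimated mean size-and-shape

off <- upper.tri(delta2_true)
bias_naive <- mean(delta2_naive[off] - delta2_true[off])  # expected near 2 * sigma^2
bias_edma  <- mean(delta2_edma[off]  - delta2_true[off])  # expected near zero
```

The truncation at zero inside `edma_sq_dist()` is a practical safeguard added for this sketch. With small samples and short distances the quantity under the square root can be negative.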

If variations are small, then the EDMA mean reflection shape will be very similar to that of MDS, which in turn is very similar to the full Procrustes mean shape or the Bookstein mean shape. For further details of EDMA methods see Lele and Richtsmeier (2001); Heo and Small (2006) also provide a detailed review.

Recent work on estimating Euclidean distance matrices with applications to protein size-and-shape estimation includes that by Zhang et al. (2016) who use a regularized kernel-based method for consistent estimation.

15.4.2 Tests for shape difference

We consider two tests for a difference in mean shape or mean size-and-shape between two independent groups, using distance-based methods.

15.4.2.1 EDMA-I

Lele and Richtsmeier (1991) consider the EDMA-I test for difference in reflection size-and-shape between means in two groups. In order to examine differences in reflection size-and-shape, consider the form ratio distance matrix

\[ D_{ij}(X, Y) = F_{ij}(X) / F_{ij}(Y). \tag{15.9} \]

In order to test for mean reflection size-and-shape differences, Lele and Richtsmeier (1991) advise the use of the test statistic

\[ T = \frac{\max_{i \ne j} D_{ij}(\hat{\mu}_1, \hat{\mu}_2)}{\min_{i \ne j} D_{ij}(\hat{\mu}_1, \hat{\mu}_2)}, \tag{15.10} \]

where $\hat{\mu}_1$ and $\hat{\mu}_2$ are suitable estimators of mean size-and-shape in the two groups (such as the EDMA estimator). Bootstrap procedures are used to estimate the null distribution of the test statistic.
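A minimal base R sketch of such a statistic follows, assuming it is the ratio of the largest to the smallest entry of the form ratio distance matrix computed from two estimated mean configurations. `edma1_statistic()` is a hypothetical helper name and the configurations are made up. The bootstrap step is not shown.

```r
# Ratio of the largest to the smallest inter-landmark distance ratio.
# edma1_statistic() is a hypothetical helper name.
edma1_statistic <- function(mean1, mean2) {
  ratio <- as.vector(dist(mean1)) / as.vector(dist(mean2))  # distance ratios for pairs i < j
  max(ratio) / min(ratio)
}

square    <- matrix(c(0, 0,  1, 0,  1, 1,  0, 1), ncol = 2, byrow = TRUE)
bigger    <- 3 * square                      # proportional distances
stretched <- square %*% diag(c(2, 1))        # x-axis stretched by a factor of 2

T_proportional <- edma1_statistic(square, bigger)
T_stretched    <- edma1_statistic(square, stretched)
```

The statistic equals 1 when the two form distance matrices are proportional and grows as the ratios become more unequal. In the stretched example the ratios range from 0.5 to 1, giving a value of 2.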

Example 15.2 Taylor (1996) used EDMA to investigate the size-and-shape of oxygen masks produced for the US Air Force. A set of 3D coordinates between landmarks were taken on 30 male and 30 female subjects – 16 anatomical landmarks on the face and 14 landmarks on the mask, using a laser scanner. An assessment was made as to how the mask fitted, and the fit was rated as a Pass or Fail. Tests using EDMA were carried out to assess whether or not there were reflection size-and-shape differences between the Passes and Fails. There were no significant differences in facial reflection size-and-shape, but there were significant differences in how the mask was placed on the face in the Passes and Fails. Thus, the use of EDMA provided mask placement specifications for a good fit.

15.4.2.2 EDMA-II

Lele and Cole (1995) propose an alternative, more powerful test, EDMA-II, which is appropriate for testing for different mean reflection shapes or reflection size-and-shapes in two groups. First, an estimate of the average form distance matrix is obtained for each group. Each entry of an average form distance matrix is then scaled by an overall size measure for the group. The ‘shape difference matrix’ is then defined by the arithmetic difference of the two scaled average form distance matrices. The test statistic is given by the value of that entry in the shape difference matrix which is farthest from zero. Using the normal model and estimated mean size-and-shape and estimated covariance matrices, a parametric Monte Carlo confidence interval for the test statistic is obtained. The null hypothesis would be rejected at the appropriate level if the interval does not contain zero.
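The statistic described above can be sketched in base R as follows. The text does not specify the size measure, so the mean inter-landmark distance is used here purely as an assumption. `edma2_statistic()` is a hypothetical helper name, and the Monte Carlo confidence interval is not shown.

```r
# Signed entry of the shape difference matrix that is farthest from zero.
# edma2_statistic() is a hypothetical helper; the default size measure
# (mean inter-landmark distance) is an assumption, not taken from the text.
edma2_statistic <- function(mean1, mean2, size = function(d) mean(d)) {
  d1 <- as.vector(dist(mean1))
  d2 <- as.vector(dist(mean2))
  shape_diff <- d1 / size(d1) - d2 / size(d2)   # entries for pairs i < j
  shape_diff[which.max(abs(shape_diff))]
}

square    <- matrix(c(0, 0,  1, 0,  1, 1,  0, 1), ncol = 2, byrow = TRUE)
bigger    <- 3 * square                      # same shape, different size
stretched <- square %*% diag(c(2, 1))        # different shape

Z_same_shape <- edma2_statistic(square, bigger)
Z_diff_shape <- edma2_statistic(square, stretched)
```

Because each group is divided by its own size measure, a pure change of scale gives a statistic of zero, while a change in shape moves at least one scaled distance away from its counterpart.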

Lele and Cole (1995) consider a power study comparing EDMA-II and Hotelling’s T2 test using Bookstein coordinates. The relative power of the two tests depends on the particular mean shape difference in the alternative hypothesis. EDMA-II can be more powerful than Hotelling’s T2 for some alternatives, whereas Hotelling’s T2 can be more powerful for other alternatives. Kent (1995, MORPHMET electronic discussion list) gives insight into this phenomenon. Hotelling’s T2 test can be expressed as a union-intersection test where the test statistic is:

\[ T^2 = (n_1 + n_2 - 2) \sup_{a} \frac{\sum_{i=1}^{2} n_i \{ a^T (\bar{u}_i - \bar{u}) \}^2}{a^T (n_1 S_1 + n_2 S_2) a}, \tag{15.11} \]

where $\bar{u}_i$ and $S_i$ are the sample means and covariance matrices (with divisor $n_i$) of the Bookstein coordinates for the ith group of size $n_i$, and $\bar{u}$ is the pooled mean. For small variability a particular difference of scaled Euclidean distances is approximately a linear function of Bookstein coordinates. Hence, EDMA-II involves examining a similar statistic to that of Equation (15.11) except that the maximum is carried out over k(k − 1)/2 fixed directions. If the mean difference in the alternative lies along one of the fixed directions, then EDMA-II will have slightly higher power than Hotelling’s $T^2$. However, if the alternative difference lies away from the fixed directions, then Hotelling’s $T^2$ will have higher power. Other power studies are given by Rohlf (2000).

15.5 Log-distances and multivariate analysis

An alternative approach to shape analysis based on inter-landmark distances has been proposed by Mardia et al. (1996b) and Rao and Suryawanshi (1996), using logarithms of distances. Estimates of mean reflection size-and-shape and reflection shape are obtained using average log-distances and MDS.

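Let G(X) be the form log-distance matrix, which is the k × k matrix of all pairs of inter-landmark log-distances in the configuration X. The shape log-distance matrix is:

\[ (G^*(X))_{h_1 h_2} = (G(X))_{h_1 h_2} - \frac{2}{k(k-1)} \sum_{r < s} (G(X))_{rs}, \quad h_1 \ne h_2, \]

that is, each log-distance minus the average log-distance of the configuration.

If $d_i(h_1, h_2)$ is the distance between landmarks $h_1$ and $h_2$ for the ith object $X_i$, i = 1, …, n, then the average form log-distance matrix is:

\[ \bar{G} = \frac{1}{n} \sum_{i=1}^{n} G(X_i), \quad (\bar{G})_{h_1 h_2} = \frac{1}{n} \sum_{i=1}^{n} \log d_i(h_1, h_2). \]

An average form matrix can be obtained by exponentiating, and then MDS can be used to obtain an estimate of the mean reflection size-and-shape

\[ \mathrm{MDS}_m(\exp(2\bar{G})), \]

where $\exp(2\bar{G})$ denotes the matrix of squared average-form distances $\exp\{2(\bar{G})_{h_1 h_2}\}$, with zero diagonal.

An average shape log-distance matrix is given by $\bar{G}^* = n^{-1} \sum_{i=1}^{n} G^*(X_i)$ and an average shape can be obtained by exponentiating, and then using MDS, that is the estimate of the mean reflection shape is:

\[ \mathrm{MDS}_m(\exp(2\bar{G}^*)). \]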

Any arbitrary scalings of individual objects will appear as an overall constant added to each inter-landmark log-distance, and so the same estimate of reflection mean shape will be obtained regardless of the arbitrary scaling of objects.

If variations are small, then the estimate of mean reflection shape using the logarithm of distances will be very similar to using MDS directly without taking logs, which in turn will be very similar to the Procrustes and Bookstein means.
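The invariance to arbitrary rescaling of individual objects can be seen in a short base R sketch. It assumes the shape log-distances are obtained by subtracting each object's average log-distance, and that the exponentiated averages are squared before classical MDS is applied. The helper names and the simulated data are hypothetical.

```r
# Mean reflection shape from averaged shape log-distances. Helper names are hypothetical.
classical_mds <- function(D, m) {
  k <- nrow(D)
  C <- diag(k) - matrix(1 / k, k, k)
  e <- eigen(-0.5 * C %*% D %*% C, symmetric = TRUE)
  e$vectors[, 1:m, drop = FALSE] %*% diag(sqrt(pmax(e$values[1:m], 0)), m, m)
}
log_distance_mean_shape <- function(configs, m) {
  logd <- sapply(configs, function(X) {
    g <- log(as.vector(dist(X)))     # log-distances for pairs i < j
    g - mean(g)                      # subtract the object's average log-distance
  })
  avg <- rowMeans(logd)              # average shape log-distance for each pair
  k <- nrow(configs[[1]])
  D <- matrix(0, k, k)
  D[lower.tri(D)] <- exp(2 * avg)    # exponentiate and square; same ordering as dist()
  D <- D + t(D)
  classical_mds(D, m)
}

set.seed(4)
mu <- matrix(c(0, 0,  2, 0,  3, 2,  0, 1), ncol = 2, byrow = TRUE)    # made-up mean
configs  <- lapply(1:20, function(i) mu + matrix(rnorm(8, sd = 0.05), 4, 2))
rescaled <- lapply(configs, function(X) runif(1, 0.5, 2) * X)        # arbitrary scalings

shape_a <- log_distance_mean_shape(configs, 2)
shape_b <- log_distance_mean_shape(rescaled, 2)
scale_effect <- max(abs(as.vector(dist(shape_a)) - as.vector(dist(shape_b))))
```

Rescaling an object adds the same constant to all of its log-distances, and that constant is removed when the object's average log-distance is subtracted, so the two estimates agree up to rounding error.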

Rao and Suryawanshi (1996) suggested the adaptation of multivariate analysis techniques for inference. The analysis of size and shape through functions of distances is closely related to the traditional multivariate morphometric approach, briefly described in Section 2.1. Consider multivariate procedures on the q = k(k − 1)/2-vector of inter-landmark log-distances

\[ y = (\log d(1, 2), \log d(1, 3), \ldots, \log d(k-1, k))^T. \]

In particular, one example of Rao and Suryawanshi (1996)’s procedures is a two-sample test for mean size and shape difference. If $\bar{y}_1$ and $\bar{y}_2$ are the sample means of this inter-landmark log-distance vector in two independent samples, with pooled sample covariance matrix S, then the Mahalanobis distance can be split into two parts:

\[ D^2 = (\bar{y}_1 - \bar{y}_2)^T S^{-1} (\bar{y}_1 - \bar{y}_2) = D^2_{sh} + D^2_{si}, \]

\[ D^2_{sh} = (\bar{y}_1 - \bar{y}_2)^T H^T (H S H^T)^{-1} H (\bar{y}_1 - \bar{y}_2), \quad D^2_{si} = D^2 - D^2_{sh}, \]

where H is a (q − 1) × q matrix of rank q − 1 such that $H 1_q = 0$ (e.g. the Helmert submatrix). The expression $D^2_{sh}$ reflects the difference in reflection shape and $D^2_{si}$ reflects the difference in size. The objects will need to be recorded to the same scale for $D^2_{si}$ to be meaningful.
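The split can be verified numerically. The base R sketch below assumes the shape part is the Mahalanobis distance of the H-transformed mean difference and the size part is the remainder, which has the closed form $\{1_q^T S^{-1} (\bar{y}_1 - \bar{y}_2)\}^2 / (1_q^T S^{-1} 1_q)$. The log-distance data are simulated and purely illustrative, and the rows of `t(contr.helmert(q))` are used as an unnormalized stand-in for the Helmert submatrix.

```r
set.seed(5)
q <- 6; n1 <- 30; n2 <- 30                   # q = k(k - 1)/2 with k = 4
# Made-up log-distance vectors: group 2 is larger overall and differs in one distance.
Y1 <- matrix(rnorm(n1 * q, sd = 0.1), n1, q)
Y2 <- matrix(rnorm(n2 * q, sd = 0.1), n2, q) + 0.2
Y2[, 1] <- Y2[, 1] + 0.1

delta <- colMeans(Y1) - colMeans(Y2)                              # difference of sample means
S <- ((n1 - 1) * cov(Y1) + (n2 - 1) * cov(Y2)) / (n1 + n2 - 2)   # pooled covariance

H <- t(contr.helmert(q))                     # (q - 1) x q, every row sums to zero
ones <- rep(1, q)

D2_total <- drop(t(delta) %*% solve(S, delta))
Hd <- H %*% delta
D2_shape <- drop(t(Hd) %*% solve(H %*% S %*% t(H), Hd))
D2_size  <- drop((t(ones) %*% solve(S, delta))^2 / (t(ones) %*% solve(S, ones)))
split_gap <- abs(D2_total - (D2_shape + D2_size))   # should be at rounding-error level
```

The shape part does not depend on which full-rank H with $H 1_q = 0$ is chosen, so the unnormalized contrast matrix gives the same value as an orthonormal Helmert submatrix would.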

15.6 Euclidean shape tensor analysis

Cooper et al. (1995) and Goodall (1995) have introduced Euclidean shape tensor analysis (ESTA) which is a generalization of distance-based methods. One works with the size-and-shape of subsets of points (say 3 or 4 points) rather than just lengths between pairs of points. An average size-and-shape for each subset is obtained by performing Procrustes analysis, and then an overall average size-and-shape for the k point configuration is obtained by a metrical scaling method. Some related distributional results are also given by Dryden et al. (1997), where statistics based on subsets of triangles in nearly regular spatial patterns have been investigated. Dryden et al. (1997, 1999) and Faghihi et al. (1999) developed procedures for detecting abnormalities in muscle fibres using the shapes and sizes of subsets of Delaunay triangles.

15.7 Distance methods versus geometrical methods

There has been a great deal of discussion about the advantages and disadvantages of distance-based methods over geometrical shape methods, such as Procrustes analysis and edge registration. If one is interested in a few specific length measurements on an organism, then it would seem most suitable to use distance-based methods in such applications. However, if one is interested in the complete geometrical configuration, then one must consider the merits of the various methods, always bearing in mind that for small variations distance-based methods and geometrical shape methods give similar results. Distance, area, volume, weight and energy are examples of extensive measurements which are dependent on scale. These are in contrast to intensive measurements, such as angles and proportions. Bookstein (2015a)'s study of self-similarity is appropriate for intensive rather than extensive measurements, as one is interested in phenomena at every scale.

An advantage of distance-based methods is that they can be applied to distances that do not require the location of landmarks, for example the maximum and minimum widths of an organism. Also dealing with missing landmarks is straightforward. Finally the mean size-and-shape can be estimated consistently under normal models, with appropriate adjustments.

A disadvantage with distance-based methods is that they are invariant under reflections (which may not be desirable for certain applications). Also a form difference matrix is difficult to interpret, and visualization of shape variability and shape differences is difficult.

Important point: However, we would like to emphasize that if variations are small, then registration methods and distance-based methods can give similar conclusions about shape (Kent 1994) when the coordinates are approximately linearly related, although there may be practical differences in efficiency.

An important question is: when are variations small? The answer will depend on the curvature of the space, and also on the number of landmarks k and number of dimensions m. If all the data are away from the singularities of the shape space and if all the data lie within dF = 0.2 of an average shape, then the tangent space approximations have seemed fine in many examples and so this provides a useful rule-of-thumb.
