12
Deformations for size and shape change

12.1 Deformations

12.1.1 Introduction

In biology and other disciplines one is frequently interested in describing the difference in shape or size-and-shape between two or more objects. A measure of distance, such as the Riemannian distance, provides a numerical summary of the difference in shape or size-and-shape, but this global measure does not indicate where the objects differ locally or the manner of the difference. The biological goal in many studies is to depict or describe the morphological changes between the objects under study.

First of all we consider the case when only two objects are available and we wish to describe the size and shape difference between them. Sometimes there will be a natural ordering of the two objects, for example a juvenile growing into an adult, and sometimes not, for example a male and a female of a species. In order to describe the difference in size and shape we compute the transformation of the space in which the first object lies into the space of the second object. The transformation will give us information about the local and global shape differences.

Definition 12.1 By global differences we mean large scale changes (with small amounts of bending per unit Procrustes distance), such as an overall affine transformation. Local differences are on a smaller scale (with larger amounts of bending per unit Procrustes distance), for example highlighting changes in a few nearby landmarks. Global differences are smooth changes between the figures, whereas local changes are the remainder of the components of a deformation and are less smooth.

The main method for describing deformations will be the thin-plate spline transformation. Following Bookstein (1989, 1991) we shall decompose a thin-plate spline deformation (defined below) into a global affine transformation and a set of local deformations which highlight changes at progressively smaller scales. This decomposition is in close analogy with a Fourier series decomposition, with the constant term in the Fourier series being the global parameter and the coefficients of the trigonometric terms being local parameters at successively smaller scales.

12.1.2 Definition and desirable properties

We consider the case where we have two objects in m ≥ 2 dimensions.

Definition 12.2 Consider two configuration matrices of k landmarks in ℝ^m, T = (t_1, …, t_k)^T and Y = (y_1, …, y_k)^T, both k × m matrices, and we wish to deform T into Y, where t_j, y_j ∈ ℝ^m. We use the notation that the m-vector t_j is written as t_j = (t_j[1], …, t_j[m])^T. A deformation is a mapping from ℝ^m to ℝ^m defined by the transformation

y = Φ(t) = (Φ_1(t), …, Φ_m(t))^T,   t ∈ ℝ^m.

Here T is the source and Y is the target. The multivariate function Φ(t) should have certain desirable properties. In particular, we would want as many of the following properties to hold as possible for the deformation:

  1. continuous;
  2. smooth (which implies 1. of course);
  3. bijective (one to one and onto);
  4. not prone to gross distortions (e.g. not folding, which will be guaranteed if 3. holds);
  5. equivariant under relative location, scale and rotation of the objects, that is the transformation is not affected by a common translation, rotation and scaling of the figures;
  6. an interpolant, that is yj = Φ(tj) for j = 1, …, k.

If the interpolation property is not satisfied, then we call the deformation a smoother. Note that the deformation is from the whole space ℝ^m to ℝ^m, rather than just from one set of landmarks to another or from one outline to another.

Applications of deforming images will be considered in Section 17.4.1.

12.1.3 D’Arcy Thompson’s transformation grids

D’Arcy Thompson (1917) considered deformations from one species to another in order to explain size and shape differences. A regular square grid pattern was drawn on one object and the grid was deformed to lie on the second object, with corresponding biological parts located in the corresponding grid blocks. In Figure 2.1, Figure 12.1, Figure 12.2 and Figure 12.3 we see some examples. These grids are known as Cartesian transformation grids. The transformation grids enable a biologist to describe the size and shape change between the two species, albeit in a rather subjective way. D’Arcy Thompson’s figures were drawn by hand and were not always very accurate, for example in Figure 2.1 the correspondence in the lower and upper fins at (4.5, b) and (4.5, c) is not good (Bookstein, 1994b). There have been many attempts since 1917 to recreate these figures more objectively, as will be discussed in Section 12.4.1.


Figure 12.1 An initial square grid placed on a human skull. Source: Thompson 1917. Reproduced with permission of Cambridge University Press.


Figure 12.2 Cartesian transformation grids from the human skull in Figure 12.1 to a chimpanzee (a) and a baboon (b). Source: Thompson 1917. Reproduced with permission of Cambridge University Press.


Figure 12.3 Cartesian transformation grids from one species of fish to another. The transformation is an affine deformation. Source: Thompson 1917. Reproduced with permission of Cambridge University Press.

12.2 Affine transformations

12.2.1 Exact match

The simplest possible size and shape change is that of an affine transformation, as seen in Figure 12.3. In this case the square grid placed on the first fish is deformed uniformly and affinely into a parallelogram grid on the second fish.

Definition 12.3 The affine transformation from a point t ∈ ℝ^m to a point y ∈ ℝ^m is:

(12.1)   y = At + c,

where A is m × m, c is m × 1 and A is non-singular. A linear transformation is the affine transformation of Equation (12.1) but with c = 0.

Consider selecting a subset of m + 1 landmarks on each figure, and denote the subsets by Tsub and Ysub, both (m + 1) × m matrices. Writing B = [c, A]^T then, since we match the m + 1 points exactly, the unknown parameters B are found from the solution to:

(12.2)   Ysub = [1_{m+1}, Tsub] B.
By selecting m + 1 landmarks we obtain a unique solution in general, because there are (m + 1)m unknown parameters in B and (m + 1)m constraints in Equation (12.2). Write X = [1_{m+1}, Tsub] for the ‘design matrix’. Solving Equation (12.2) for B we have:

B̂ = (X^T X)^{−1} X^T Ysub,

assuming (X^T X)^{−1} exists, which will be true if X is of full rank m + 1. This solution is the same as the least squares estimator when we wish to estimate a transformation from k = m + 1 points, and here the fit is exact and the residuals are all zero.
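
As a concrete illustration, the following minimal sketch in base R computes the exact affine match of m + 1 = 3 landmarks in m = 2 dimensions using B̂ = (X^T X)^{−1} X^T Ysub; the triangle coordinates are invented, and this is not code from the shapes package.

Tsub <- rbind(c(0, 0), c(1, 0), c(0, 1))          # source landmarks (m + 1 = 3 points in 2D)
Ysub <- rbind(c(1, 1), c(3, 1), c(1, 4))          # target landmarks (invented)
X <- cbind(1, Tsub)                               # 'design matrix' [1, Tsub]
B <- solve(t(X) %*% X, t(X) %*% Ysub)             # B = [c, A]^T, here (m + 1) x m = 3 x 2
c_hat <- B[1, ]                                   # estimated translation vector c
A_hat <- t(B[2:3, ])                              # estimated 2 x 2 matrix A
max(abs(X %*% B - Ysub))                          # residuals are zero: the match is exact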

12.2.2 Least squares matching: Two objects

For k > m + 1 the affine match is not exact in general and so we estimate the affine transformation by minimizing some criterion. Write X_D = [1_k, T] for the k × (m + 1) design matrix. A general regression model is given later in Equation (18.1). Taking h(T, B) = X_D B, where B is an (m + 1) × m matrix, gives an affine match of T to Y. We exclude configurations that do not have full rank, so rank(X_D) = m + 1 and rank(Y) = m. Equation (18.1) becomes:

Y = X_D B + E = 1_k c^T + T A^T + E,

where A is an m × m matrix and c an m-vector for translation.

The general shape space (see Section 18.4) with affine invariance has been studied in detail by Ambartzumian (1982, 1990). Some distributional results for offset normal models were given by Goodall and Mardia (1993). Also Bhavnagri (1995b) and Sparr (1992) have considered the affine shape space which, if configurations of less than full rank are ignored, is the Grassmann manifold of m-planes in ℝ^{k − 1} (e.g. Harris, 1992; Patrangenaru and Mardia, 2003). A suitable distance in the affine shape space between T and Y is given by:

(12.3) numbered Display Equation

The post-multiplication of Y by (Y^T Y)^{−1/2} prevents degenerate solutions and ensures that d_A(Y, T) = d_A(T, Y).

It is sometimes convenient to use vector notation: writing y = vec(Y), β = vec(B), and ε = vec(E), the linear model is

y = X_d β + ε,

where ε is taken to be a zero mean random vector and X_d = I_m ⊗ X_D is a km × m(m + 1) block diagonal design matrix. The match is obtained by estimating β with β̂, where

β̂ = arg inf_β ε^T ε,

where vec^{−1}(ε) = E is the reverse of vectorizing (see Definition 4.5).

Using least squares matching, the solution is given explicitly and is unique provided X_D^T X_D is of full rank (m + 1). The solution is obtained by differentiating

(y − X_d β)^T (y − X_d β)

with respect to β (a discussion of matrix differentiation is given e.g. in Mardia et al. 1979, Appendix A.9). The resulting normal equations are:

X_d^T X_d β = X_d^T y,

and hence the solution is:

β̂ = (X_d^T X_d)^{−1} X_d^T y,

where (X_d^T X_d)^{−1} is assumed to exist. The matrix representation of the solution is:

B̂ = (X_D^T X_D)^{−1} X_D^T Y.

The second derivative matrix is 2X_d^T X_d, which is strictly positive definite, and hence the solution is a minimum.

The residual vector is given by:

r = y − X_d β̂ = (I_{km} − Ĥ) y,

where Ĥ = X_d (X_d^T X_d)^{−1} X_d^T is the usual ‘hat’ matrix. Note that ĤĤ = Ĥ and so Ĥ is idempotent. Also, (I_{km} − Ĥ) is idempotent. The minimized value of the objective function is

y^T (I_{km} − Ĥ) y.
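
A minimal sketch of the least squares affine match in base R, using invented configurations (k = 6 landmarks in two dimensions) and the matrix form B̂ = (X_D^T X_D)^{−1} X_D^T Y rather than the vectorized version:

set.seed(1)
k <- 6
Tmat <- matrix(rnorm(2 * k), k, 2)                        # source configuration (k x 2)
Ymat <- Tmat %*% matrix(c(1.1, 0.2, -0.1, 0.9), 2, 2) +   # roughly affine target ...
        rep(1, k) %o% c(2, -1) +                          # ... plus a translation ...
        0.05 * matrix(rnorm(2 * k), k, 2)                 # ... plus a little noise
XD <- cbind(1, Tmat)                                      # k x (m + 1) design matrix
Bhat <- solve(t(XD) %*% XD, t(XD) %*% Ymat)               # rows of Bhat: c^T then A^T
resid <- Ymat - XD %*% Bhat                               # residual matrix
sum(resid^2)                                              # minimized objective (residual sum of squares)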

It is important to note that matching from T to Y is not the same as matching from Y to T in general. This property is generally true for most registration methods. One way to introduce symmetry into the problem is by obtaining an average object ‘half-way’ between T and Y, and then match T and Y separately into the average object.

The affine case where the residuals are zero (exact matches) was considered earlier in more detail in Section 12.2.1.

Example 12.1 Horgan et al. (1992) consider affine matches of two electrophoretic gels (gel A and gel B) described in Section 1.4.14. In Figure 1.25 we saw two configurations of k = 10 landmarks located on each gel, which are taken from different proteins from malaria parasites. The landmarks are called invariant points in this application because they are common to all proteins in this study. The idea is to match the two configurations with an affine transformation. By matching the gels to each other or to a reference figure one can identify the malaria parasite by using the positions of the other (variant) spots in the image. In Figure 12.4 the fitted gel A image has been registered onto the gel B image. New grey levels x_{Nij} have been calculated from the grey levels of the fitted gel A image and of the gel B image (x_{Bij}). The figures have been thresholded to four levels for display purposes, indicating which spots were in gel A, gel B or both, and which region was not dark in either gel. The affine fitted gel A image matches quite well with the gel B image, because there are many dark grey common spots in both images. The parameters of the fit are:

numbered Display Equation

and so there is a shift in location to the left and upwards ( − 36.08, 66.64) and approximately a 10% vertical shrinkage and a 3% horizontal shrinkage.


Figure 12.4 The superimposed gel A onto gel B using an affine match, using the 10 invariant spots chosen by an expert. The four grey levels indicate the correspondence of dark and light pixels in the registered images. The key used is: black, dark pixel in gel A and a light pixel in gel B; dark grey, matched dark pixels in gels A and B; light grey, matched light pixels in gels A and B; and white, dark pixel in gel B and a light pixel in gel A.

Mardia et al. (2012) have addressed a specific problem for matching gels under affine transformation in which each configuration contains a subset of points (‘markers’) whose labels correspond with high probability, with the remaining points having arbitrary labels (‘non-markers’).

Gordon (1995) provides a matching procedure based on different affine transformations for subsets of points formed by dividing the plane into regions, with an application to face matching.

12.2.3 Least squares matching: Multiple objects

Consider a random sample of configurations T_1, …, T_n from a population with mean μ. The design matrices are X_{Di} = [1_k, T_i], i = 1, …, n. If least squares is chosen for both the estimation of the mean and the point configuration matching, then we wish to minimize

Σ_{i=1}^{n} || vec(μ) − X_{di} β_i ||^2,

where β_i = vec(B_i) and X_{di} = I_m ⊗ X_{Di}, i = 1, …, n. It is clear that, given μ,

β̂_i = (X_{di}^T X_{di})^{−1} X_{di}^T vec(μ),   i = 1, …, n.

Hence,

Σ_{i=1}^{n} || vec(μ) − X_{di} β̂_i ||^2 = Σ_{i=1}^{n} vec(μ)^T (I_{km} − H_i) vec(μ),

where Hi is the ‘hat’ matrix for Xdi, that is

H_i = X_{di} (X_{di}^T X_{di})^{−1} X_{di}^T.

Therefore,

μ̂ = arg inf_μ Σ_{i=1}^{n} vec(μ)^T (I_{km} − H_i) vec(μ),

subject to the constraints μ^T μ = I_m. Since X_{di} = I_m ⊗ X_{Di}, each hat matrix can be written as H_i = I_m ⊗ H_{Di}, where H_{Di} = X_{Di}(X_{Di}^T X_{Di})^{−1} X_{Di}^T. Let G be the k × k matrix such that Σ_{i=1}^{n} (I_{km} − H_i) = I_m ⊗ G, that is G = Σ_{i=1}^{n} (I_k − H_{Di}). Therefore the solution μ̂ is given by the m eigenvectors of G with smallest eigenvalues. This is a well-known result and can be simply seen by repeatedly applying the standard result that the minimum value of

x^T A_H x / (x^T x)

is given by the smallest eigenvalue of A_H [e.g. Mardia et al. 1979, Equation (A.9.11)].
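
The following sketch (base R, simulated configurations) implements the eigenvector solution described above: sum the matrices I_k − H_Di over the sample and take the m eigenvectors with smallest eigenvalues as μ̂. The simulated data and the name G are purely illustrative.

set.seed(2)
k <- 8; m <- 2; n <- 5
Tlist <- lapply(1:n, function(i) matrix(rnorm(k * m), k, m))   # random sample of configurations
G <- matrix(0, k, k)
for (i in 1:n) {
  XDi <- cbind(1, Tlist[[i]])                       # design matrix [1_k, T_i]
  HDi <- XDi %*% solve(t(XDi) %*% XDi, t(XDi))      # hat matrix for the ith configuration
  G <- G + diag(k) - HDi
}
eig <- eigen(G, symmetric = TRUE)
mu_hat <- eig$vectors[, (k - m + 1):k]              # m eigenvectors with smallest eigenvalues
round(crossprod(mu_hat), 10)                        # check the constraint mu^T mu = I_m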

Even with a general choice of s(E), if φ(E) = E^T E in Equation (18.2), then to obtain the new estimates from the current ones we have:

numbered Display Equation

which can lead to a simple implementation.

The above generalized affine matching result was termed the linear spectral theorem by Goodall (1991, p. 335), and was also presented by Rohlf and Slice (1990) and Hastie and Kishon (1991), and for planar shape analysis using complex notation by Kent (1994).

It is clear that affine matching and statistical inference can be tackled (with care) by existing regression and multivariate analysis procedures.

12.2.4 The triangle case: Bookstein’s hyperbolic shape space

In two dimensions the most general transformation between two triangles is an affine transformation and we study this case in more detail.

Consider two (different) triangles in a plane T = (t_1, t_2, t_3)^T and Y = (y_1, y_2, y_3)^T, both (3 × 2) matrices, and the ratio of baseline lengths is denoted as γ = ||y_2 − y_1||/||t_2 − t_1||, and Bookstein’s shape coordinates for each triangle are (θ_T, φ_T)^T and (θ_Y, φ_Y)^T. Let us exclude all collinear triplets of points (flat triangles) and all shape changes from a triangle with an apex above the baseline to a triangle with an apex below the baseline (we are excluding reflections). Consider the change in shape from triangle T to triangle Y. First of all the triangles are translated, rotated and rescaled to T* = (t*_1, t*_2, t*_3)^T and Y* = (y*_1, y*_2, y*_3)^T so that they have common baselines from (−1/2, 0)^T to (1/2, 0)^T. The transformation between the shapes of the two triangles is described by the affine transformation

(12.5)   y*_j = A t*_j + c,   j = 1, 2, 3,

where A is a 2 × 2 matrix, c is a 2 × 1 vector and

T* = ((−1/2, 0)^T, (1/2, 0)^T, (θ_T, φ_T)^T)^T,   Y* = ((−1/2, 0)^T, (1/2, 0)^T, (θ_Y, φ_Y)^T)^T.

Solving the six equations in Equation (12.5) we find that

(12.6)   A = [ 1   (θ_Y − θ_T)/φ_T
               0   φ_Y/φ_T        ],   c = 0.

If we inscribe a circle of unit diameter in triangle T, this is linearly transformed into an ellipse in triangle Y, and the major and minor axes of the ellipse have lengths δ1 ≥ δ2 > 0 and are called the principal strains. Calculation of δ1 and δ2 is very simple using a singular value decomposition of A

A = U Λ V^T = δ_1 u_1 v_1^T + δ_2 u_2 v_2^T,   Λ = diag(δ_1, δ_2),

where the columns u1 and u2 (both 2 × 1) of U are the eigenvectors of AAT and the columns v1 and v2 of V are the eigenvectors of ATA. Hence, δ1 and δ2 are the square roots of the eigenvalues of both AAT and ATA. The principal axes in triangle Y are u1 and u2, corresponding to axes v1 and v2 in T. Also,

A v_j = δ_j u_j,   j = 1, 2,

and so we can see that the principal strains have lengths δ1 ≥ δ2 > 0 after the deformation where they were of unit length before.

The directions of the principal axes and the ratio δ_1/δ_2 are invariant under similarity transformations of the original coordinates and are suitable shape change descriptors (provided there is no reflection).

The quantity δ_1/δ_2 is the ratio of the greatest stretch to the greatest compression. Taking the distance between two triangle shapes as log(δ_1/δ_2) we see that this shape change space for triangles is different from the shape space with the Riemannian metric. The distance log(δ_1/δ_2) is approximately δ/h, where δ is the Euclidean distance between the Bookstein coordinates and h is the V^B coordinate of the first (or second) triangle, see Figure 12.5.
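
A short base R sketch of the triangle calculations: two triangles with invented apex coordinates are registered to a common baseline, the exact affine map is solved, and the singular values of A give the principal strains and the distance log(δ_1/δ_2).

Tstar <- rbind(c(-0.5, 0), c(0.5, 0), c(0.1, 0.8))   # source triangle: baseline plus apex (theta_T, phi_T)
Ystar <- rbind(c(-0.5, 0), c(0.5, 0), c(0.3, 0.5))   # target triangle: baseline plus apex (theta_Y, phi_Y)
X <- cbind(1, Tstar)
B <- solve(t(X) %*% X, t(X) %*% Ystar)               # exact affine fit of the three points
A <- t(B[2:3, ])                                     # 2 x 2 matrix A of the affine map
delta <- svd(A)$d                                    # principal strains delta_1 >= delta_2
log(delta[1] / delta[2])                             # Bookstein's hyperbolic shape distance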


Figure 12.5 Bookstein’s hyperbolic shape space considers the distance between two shapes of triangles A and B as approximately δ/h for small shape differences.

In fact Bookstein’s shape space for triangles with distance log(δ_1/δ_2) is a space of constant negative curvature and is known in geometry as the hyperbolic space or the Poincaré half plane. Further details have been given by Small (1988), Bookstein (1991, Appendix A2), Small and Lewis (1995) and Kendall (1998). Kendall (1998) gives some diffusion results for triangles in this space. The hyperbolic space is somewhat restrictive due to its applicability only to triangles when we have 2D data. Comparisons of other configurations of more than three points could proceed by defining a triangulation of the points and working separately on each triangle (i.e. considering a union of hyperbolic spaces). Such triangulations and finite elements will be considered in Section 12.4.2. Small and Lewis (1995) describe the hyperbolic geometry for comparing the shapes of m + 1 points in m ≥ 2 dimensions, with tetrahedra in m = 3 dimensions being the most useful generalization.

12.3 Pairs of thin-plate splines

In practice when there are more than m + 1 landmarks in m dimensions an affine transformation in general will not fit exactly and so a non-affine shape deformation is required. Various approaches have been suggested, although not all the desirable properties listed in Section 12.1.2 are always satisfied.

We shall mainly concentrate on the important m = 2 dimensional case, with deformations given by the bivariate function

Φ(t) = (Φ_1(t), Φ_2(t))^T,   t = (t[1], t[2])^T ∈ ℝ^2.

Bookstein (1989) has developed a highly successful approach for deformations using a pair of thin-plate splines for the functions Φ1(t) and Φ2(t). The thin-plate spline is a sensible choice since it minimizes the bending required to take the first form into the second, and it does not suffer from problems with very large bending towards the periphery.

12.3.1 Thin-plate splines

The thin-plate spline is a natural interpolating function for data in two dimensions and plays a similar rôle in m = 2 dimensions to the natural cubic spline for interpolation in the 1D case. The natural cubic spline in one dimension is the unique interpolant g(x) which minimizes the roughness penalty

∫ {g″(x)}^2 dx,

subject to interpolation at the knots (landmarks). However, there are important differences in our application to shape analysis. The cubic spline need not be monotone and is not usually required to be, whereas the thin-plate spline is often desired to be bijective. A monotone cubic spline is analogous to a bijective thin-plate spline.

We shall see that the thin-plate spline is a natural interpolant in two dimensions because it minimizes the amount of bending in transforming between two configurations, which can also be considered a roughness penalty. For an introduction to natural cubic splines and thin-plate splines see Wahba (1990) and Green and Silverman (1994).

We concentrate on the important m = 2 dimensional case, the theory of which was developed by Duchon (1976) and Meinguet (1979). Consider the (2 × 1) landmarks t_j, j = 1, …, k, on the first figure mapped exactly into y_j, j = 1, …, k, on the second figure, that is there are 2k interpolation constraints,

(12.7)   y_j = Φ(t_j),   j = 1, …, k,

and we write Φ(t_j) = (Φ_1(t_j), Φ_2(t_j))^T, j = 1, …, k, for the 2D deformation. Let

T = (t_1, …, t_k)^T,   Y = (y_1, …, y_k)^T,

so that T and Y are both (k × 2) matrices.

Definition 12.4 A pair of thin-plate splines (PTPS) is given by the bivariate function

(12.8)   Φ(t) = c + A t + W^T s(t),

where t is (2 × 1), s(t) = (σ(t − t_1), …, σ(t − t_k))^T is (k × 1) and

(12.9)   σ(h) = ||h||^2 log(||h||)  if ||h|| > 0,  and σ(h) = 0  if ||h|| = 0.

The 2k + 6 parameters of the mapping are c (2 × 1), A (2 × 2) and W (k × 2). There are 2k interpolation constraints in Equation (12.7), and we introduce six more constraints in order for the bending energy in Equation (12.14) to be defined:

(12.10)   1_k^T W = 0,   T^T W = 0.

The pair of thin-plate splines which satisfy the constraints of Equation (12.10) are called natural thin-plate splines. Equation (12.7) and Equation (12.10) can be re-written in matrix form

(12.11)   [ S       1_k   T ] [ W   ]   [ Y ]
          [ 1_k^T   0     0 ] [ c^T ] = [ 0 ]
          [ T^T     0     0 ] [ A^T ]   [ 0 ],

where (S)_{ij} = σ(t_i − t_j) and 1_k is the k-vector of ones. The matrix

Γ = [ S       1_k   T ]
    [ 1_k^T   0     0 ]
    [ T^T     0     0 ]

is non-singular provided S is non-singular. Hence,

[ W   ]            [ Y ]
[ c^T ]  = Γ^{−1}  [ 0 ]  = Γ^{−1} Y_0,
[ A^T ]            [ 0 ]

say. Writing the partition of Γ^{−1} as

Γ^{−1} = [ Γ^{11}   Γ^{12} ]
         [ Γ^{21}   Γ^{22} ],

where Γ^{11} is k × k, it follows that

(12.12)   W = Γ^{11} Y,   [c, A]^T = Γ^{21} Y,

giving the parameter values for the mapping. If S^{−1} exists, then we have:

(12.13)   Γ^{11} = S^{−1} − S^{−1} Q (Q^T S^{−1} Q)^{−1} Q^T S^{−1},   Γ^{21} = (Q^T S^{−1} Q)^{−1} Q^T S^{−1},
where Q = [1_k, T], using for example Rao (1973, p. 39).

Using Equation (12.12) and Equation (12.13) we see that ĉ and Â are generalized least squares estimators, and

[ĉ, Â]^T = (Q^T S^{−1} Q)^{−1} Q^T S^{−1} Y,   Ŵ = S^{−1} (Y − Q [ĉ, Â]^T).

Mardia et al. (1991) gave the expressions for the case when S is singular.

Definition 12.5 The k × k matrix B_e is called the bending energy matrix, where

(12.14)   B_e = Γ^{11}.

There are three constraints on the bending energy matrix

B_e 1_k = 0,   B_e T = 0,

and so the rank of the bending energy matrix is k − 3.
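
The construction above can be sketched in a few lines of base R: build S and Γ for an invented source configuration, solve the linear system for W, c and A, and read off the bending energy matrix as the upper-left k × k block of Γ^{−1}. This is an illustrative sketch, not the implementation in the shapes package.

sigma <- function(h) { r <- sqrt(sum(h^2)); if (r > 0) r^2 * log(r) else 0 }
Tmat <- rbind(c(0, 1), c(1, 0), c(0, -1), c(-1, 0))          # source landmarks (k x 2)
Ymat <- rbind(c(0, 0.8), c(1, 0), c(0, -1.2), c(-1, 0))      # target landmarks (invented)
k <- nrow(Tmat)
S <- matrix(0, k, k)
for (i in 1:k) for (j in 1:k) S[i, j] <- sigma(Tmat[i, ] - Tmat[j, ])
Q <- cbind(1, Tmat)                                          # [1_k, T], k x 3
Gamma <- rbind(cbind(S, Q), cbind(t(Q), matrix(0, 3, 3)))    # (k + 3) x (k + 3)
coefs <- solve(Gamma, rbind(Ymat, matrix(0, 3, 2)))          # stacked [W; c^T; A^T]
W <- coefs[1:k, ]                                            # non-affine coefficients (k x 2)
c_hat <- coefs[k + 1, ]                                      # translation
A_hat <- t(coefs[(k + 2):(k + 3), ])                         # 2 x 2 affine part
Be <- solve(Gamma)[1:k, 1:k]                                 # bending energy matrix
qr(Be)$rank                                                  # rank k - 3 (here 1)
Phi <- function(t) c_hat + A_hat %*% t +
  t(W) %*% apply(Tmat, 1, function(tj) sigma(t - tj))        # the PTPS Phi(t) = c + At + W^T s(t)
Phi(Tmat[1, ])                                               # interpolation: returns the first row of Ymat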

It can be proved that the transformation of Equation (12.8) minimizes the total bending energy of all possible interpolating functions mapping from T to Y, where the total bending energy is given by:

(12.15)   J(Φ) = Σ_{j=1}^{2} ∫∫_{ℝ^2} { (∂²Φ_j/∂t[1]²)² + 2(∂²Φ_j/∂t[1]∂t[2])² + (∂²Φ_j/∂t[2]²)² } dt[1] dt[2].
A simple proof is given by Kent and Mardia (1994a). The minimized total bending energy is given by:

For the deformation grid we want to have as little bending as possible to provide a simple interpretation, both locally near data points and further away from the data. Hence the minimum bending property of the PTPS is highly suitable for many applications. The thin-plate spline has also proved popular because of the simple analytical solution. The deformation can be generalized into m dimensions when a multiple of m thin-plate splines (MTPS) will be required, and this is studied in Section 12.5.

12.3.2 Transformation grids

Following from the original ideas of Thompson (1917) we can produce similar transformation grids, using a pair of thin-plate splines for the deformation from configuration matrix T to Y. A regular square grid is drawn over the first figure and, at each junction t_i where two grid lines meet, the corresponding position in the second figure is calculated using the PTPS transformation y_i = Φ(t_i), i = 1, …, n_g, where n_g is the number of junctions or crossing points on the grid. The junction points are joined with lines in the same order as in the first figure, to give a deformed grid over the second figure. The resulting interpolant produces transformation grids that ‘bend’ as little as possible. We can think of each square in the deformation as being deformed into a quadrilateral (with four shape parameters). The PTPS minimizes the local variation of these small quadrilaterals with respect to their neighbours.

Example 12.2 Consider describing the square to kite transformation which was considered by Bookstein (1989) and Mardia and Goodall (1993). Given k = 4 points in m = 2 dimensions the matrices T and Y are given by:

numbered Display Equation

We have here

numbered Display Equation

where a = σ(√2) = 0.6931 and b = σ(2) = 2.7726. In this case, the bending energy matrix is:

numbered Display Equation

It is found that

numbered Display Equation

and so the PTPS is given by Φ(t) = (Φ1(t), Φ2(t))T, where

Note that in Equation (12.17) there is no change in the t[1] direction. The affine part of the deformation is the identity transformation.

We consider Thompson-like grids for this example, displayed in Figure 12.6. A regular square grid is placed on the first figure and deformed into the curved grid on the kite figure. We see that the topmost and bottommost points are moved downwards with respect to the other two points. If the regular grid is drawn on the first figure at a different orientation, then the deformed grid does appear to be different, even though the transformation is the same. This effect is seen in Figure 12.6, where both figures have been rotated clockwise by 45° in the second row.


Figure 12.6 Transformation grids for the square (left column) to kite (right column). In the second row the same figures as in the first row have been rotated by 45° and the deformed grid does look different, even though the transformation is the same. Source: Adapted from Bookstein 1989.


Figure 12.7 A thin-plate spline transformation grid between the control mean shape estimate and the schizophrenia mean shape estimate, from the schizophrenia study. The square grid is placed on the estimated mean control shape (a) and the curved grid is pictured on the estimated mean shape of the schizophrenic patients (b), magnified three times, with arrows drawn from the control mean (red) to patient mean (green).

Example 12.3 In Figure 12.7a we see a square grid drawn on the estimate of mean shape for the Control group in the schizophrenia study of Section 1.4.5. Here there are ng = 40 × 39 = 1560 junctions and there are k = 13 landmarks. In Figure 12.7b we see the schizophrenia mean shape estimate and the grid of new points obtained from the PTPS transformation. It is quite clear that there is a shape change in the centre of the brain, around landmarks 1, 6 and 13.

12.3.3 Thin-plate splines in R

Transformation grids can be drawn using the shapes package in R with the command tpsgrid. So, for example we can produce the grid of Figure 12.7 using the commands:

library(shapes)
data(schizophrenia)
control<-procGPA(schizophrenia$x[,,1:14])$mshape
patient<-procGPA(schizophrenia$x[,,15:28])$mshape
patient<-procOPA(control,patient)$Bhat   #align patient mean to control
tpsgrid(control,patient,opt=2,ngrid=40,mag=3)

Also, we can retain the scale by considering the transformation between the size-and-shape means, by using the option scale=FALSE in procGPA. Note that the options here are opt=2 which plots the grids on both figures (whereas opt=1 just plots the deformed grid on the second figure), ngrid=40 gives the dimension of the grid, and mag=3 exaggerates the deformation by a magnification factor of 3 here.

Example 12.4 Consider again the schizophrenia data from Section 1.4.5 but now the transformation from the patient mean shape to the control mean shape is displayed in Figure 12.8. It is clear that a feature of the grid is that a derivative is approximately zero at the point of major shape change. This feature is known as a crease, see Bookstein (2000, 2014, p. 444). Crease analysis is described in detail by Bookstein (2000) and the method involves finding a magnification, orientation and scaling for a more precise location of the crease. Creases have been hypothesized to be important stable features during evolution (Bookstein, 2002). In Figure 12.8(b) the crease is visually obvious, and a specific test for group differences at the crease is suggested (Bookstein, 2000).


Figure 12.8 The thin-plate spline transformation grid from the patient mean (a) to the control mean (b), with arrows drawn from the patient mean (red) to control mean (green). The shape change has been magnified three times.


Figure 12.9 A series of grids showing the shape changes in the skull of some sooty mangabey monkeys (read across the rows from left to right): (first row) age stage 1, age stages 1 to 2, 1 to 3; (second row) age stages 1 to 4, 1 to 5.

Example 12.5 Consider describing the growth of the sooty mangabey skull from the dataset in Section 1.4.12, and in particular in relation to 12 landmarks in the midline section of the skull. Five juveniles are taken at increasing dental age (an approximate age of the animal based on the progress in growth of the teeth). A PTPS is used to deform a square grid from age stages 1 to 2, from age stages 1 to 3, from age stages 1 to 4 and from age stages 1 to 5 (Figure 12.9). The series of grids is a useful tool for describing perceived growth. In particular there is a clear extension to the lower left of the face as the age of the monkeys increases. In this case the monkeys are all different and can be assumed to be independent, within the class of sooty mangabeys. In the example the PTPS transformation is useful for describing size-and-shape change.

Definition 12.6 Transforming from one object to another is called warping, particularly when the transformation is from one image to another image.

Further properties

The (single) thin-plate spline can be viewed as a particular case of kriging – a technique for spatial prediction developed in geostatistics; for example, see Cressie (1993) and Kent and Mardia (1994a). Discussion of this link is found in Section 12.5.

The thin-plate spline consists of an affine and a non-affine part. Only the non-affine part contributes to the bending energy and hence two configurations which are affine transformations of each other have zero total bending energy. If k = 3 points are available, then there is only the affine part.

The thin-plate spline is so named because it solves the mechanical problem of determining the minimum energy shape of an infinitely thin stiff metal plate under certain constraints. The PTPS is usually bijective (one to one) for practical situations where the shape change is not far from affine, but it is not guaranteed to be bijective. There is no mathematical restriction to prevent folding, and if the shape change is highly non-affine, then folding may occur.

A further desirable property is that the PTPS is equivariant under location, rotation and scale (but not a general affine deformation ). In addition, as t becomes farther away from the first object the thin-plate spline deformation becomes approximately affine.

Various modifications could be made to the deformation. In particular, one may wish to consider the domain to be finite, D say, rather than . In this case the thin-plate spline can be modified and details can be found in Stone (1988) and Green and Silverman (1994). The finite domain interpolant has important differences from the infinite domain thin-plate spline. In particular, there are usually differences around the edges, with reduced variability in these regions.

Also, instead of second derivatives in the energy integral in Equation (12.15) one could consider rth (r > 2) derivatives, leading to a smoother interpolant (see Section 12.5.3). Such a spline will have 2r − 3 continuous derivatives everywhere but its (2r − 2)th derivative has a singularity at each data point. Details of higher order splines can be found, for example, in Duchon (1976), Meinguet (1979), Green and Silverman (1994), Kent and Mardia (1994a) and Mardia et al. (1996e). For further details of thin-plate splines refer to Wahba (1990), Green and Silverman (1994), Kent and Mardia (1994a) and the references therein.

Smoothing splines

Rather than interpolating landmarks exactly one could consider a pair of smoothing thin-plate splines. The pair of smoothing thin-plate splines consists of the bivariate function Φ(t) which minimizes

Σ_{j=1}^{k} || y_j − Φ(t_j) ||^2 + λ J(Φ),

where λ > 0 is a smoothing parameter, and J(Φ) is defined in Equation (12.15). The term λJ(Φ) can be thought of as a roughness penalty. Using Duchon (1976) (for a single thin-plate spline) we see that the solution has exactly the same form as in Equation (12.8) except that the unknown coefficients are obtained from the matrix equation:

[ S + λI_k   1_k   T ] [ W   ]   [ Y ]
[ 1_k^T      0     0 ] [ c^T ] = [ 0 ]
[ T^T        0     0 ] [ A^T ]   [ 0 ].
Hence, dealing with smoothing thin-plate splines involves an almost identical algorithm to that required for interpolating splines, except that S is replaced by S + λI_k. The smoothing parameter controls how smooth the spline is and the value of λ must be pre-specified. If λ is very large, then the resulting spline resembles an affine function (which is completely smooth), and if λ is very small the smoothing spline is close to the interpolating spline. The smoothing spline is not equivariant under scaling of the configurations.

An alternative to pre-specifying λ is to use cross-validation to estimate λ. Consider the 2D observation y_j corresponding to t_j as a new observation by omitting it from the dataset. Denote the smoothing thin-plate spline fitted from the rest of the data points as Φ_j(t, λ). We can judge how well the smoother predicts the deleted observation by examining the magnitude of

|| y_j − Φ_j(t_j, λ) ||.

The cross-validation score is an overall measure of the prediction squared error

CV(λ) = k^{−1} Σ_{j=1}^{k} || y_j − Φ_j(t_j, λ) ||^2,

and so an automatic choice for λ can be obtained by minimizing the cross-validation score CV(λ) over λ. The minimization is often not straightforward, and a simple grid search is often the best approach (Green and Silverman, 1994, p. 30).
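
A base R sketch of this cross-validation, assuming a small helper fit_tps() that fits the pair of smoothing splines by replacing S with S + λI_k in the system above. The helper, the invented data and the grid of λ values are purely illustrative (this is not the shapes package code).

sigma <- function(h) { r <- sqrt(sum(h^2)); if (r > 0) r^2 * log(r) else 0 }
fit_tps <- function(Tmat, Ymat, lambda) {             # smoothing PTPS: S replaced by S + lambda I_k
  k <- nrow(Tmat)
  S <- matrix(0, k, k)
  for (i in 1:k) for (j in 1:k) S[i, j] <- sigma(Tmat[i, ] - Tmat[j, ])
  Q <- cbind(1, Tmat)
  Gamma <- rbind(cbind(S + lambda * diag(k), Q), cbind(t(Q), matrix(0, 3, 3)))
  coefs <- solve(Gamma, rbind(Ymat, matrix(0, 3, 2)))
  W <- coefs[1:k, ]; cc <- coefs[k + 1, ]; A <- t(coefs[(k + 2):(k + 3), ])
  function(t) as.vector(cc + A %*% t +
    t(W) %*% apply(Tmat, 1, function(tj) sigma(t - tj)))
}
cv_score <- function(Tmat, Ymat, lambda) {            # leave-one-out prediction squared error
  sum(sapply(1:nrow(Tmat), function(j) {
    Phi_j <- fit_tps(Tmat[-j, ], Ymat[-j, ], lambda)  # spline fitted without landmark j
    sum((Ymat[j, ] - Phi_j(Tmat[j, ]))^2)
  }))
}
set.seed(4)
Tmat <- matrix(runif(20, -1, 1), 10, 2)               # invented source landmarks
Ymat <- Tmat + 0.1 * matrix(rnorm(20), 10, 2)         # noisy target landmarks
lambdas <- 10^seq(-3, 1, length.out = 9)              # simple grid search over lambda
cv <- sapply(lambdas, function(l) cv_score(Tmat, Ymat, l))
lambdas[which.min(cv)]                                # value of lambda minimizing CV(lambda)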

12.3.4 Principal and partial warp decompositions

Bookstein (1989, 1991) introduced principal and partial warps to help explain the components of a thin-plate spline deformation. The principal and partial warps are useful for decomposing the thin-plate spline transformations into a series of large scale and small scale components.

Definition 12.7 Consider the PTPS transformation from ℝ^2 to ℝ^2, which interpolates the k points of T to those of Y (both k × 2 matrices). An eigen-decomposition of the k × k bending energy matrix B_e of Equation (12.14) has non-zero eigenvalues λ_1 ≤ λ_2 ≤ … ≤ λ_{k−3} with corresponding eigenvectors γ_1, γ_2, …, γ_{k−3}. The eigenvectors γ_1, γ_2, …, γ_{k−3} are called the principal warp eigenvectors and the eigenvalues are called the bending energies. The functions,

(12.19)   P_j(t) = γ_j^T s(t),   j = 1, …, k − 3,

are the principal warps, where s(t) = (σ(t − t_1), …, σ(t − t_k))^T.

Here we have labelled the eigenvalues and eigenvectors in this order (with λ1 as the smallest eigenvalue corresponding to the first principal warp) to follow Bookstein’s labelling of the order of the warps (Bookstein, 1996b). The principal warps do not depend on the second figure Y. The principal warps will be used to construct an orthogonal basis for re-expressing the thin-plate spline transformations. The principal warp deformations are univariate functions of 2D t, and so could be displayed as surfaces above the plane or as contour maps. Alternatively one could plot the transformation grids from t to y = t + (c1Pj(t), c2Pj(t))T for each j, for particular values of c1 and c2. Note that the principal warp eigenvectors are orthonormal.
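
For illustration, the principal warps can be computed directly from an eigen-decomposition of B_e; the following base R sketch uses an invented configuration of k = 5 landmarks and evaluates P_j(t) = γ_j^T s(t) at an arbitrary point.

sigma <- function(h) { r <- sqrt(sum(h^2)); if (r > 0) r^2 * log(r) else 0 }
Tmat <- rbind(c(0, 1), c(1, 0), c(0, -1), c(-1, 0), c(0.4, 0.3))   # k = 5 source landmarks
k <- nrow(Tmat)
S <- matrix(0, k, k)
for (i in 1:k) for (j in 1:k) S[i, j] <- sigma(Tmat[i, ] - Tmat[j, ])
Q <- cbind(1, Tmat)
Gamma <- rbind(cbind(S, Q), cbind(t(Q), matrix(0, 3, 3)))
Be <- solve(Gamma)[1:k, 1:k]                        # bending energy matrix of Tmat
e <- eigen(Be, symmetric = TRUE)                    # eigenvalues in decreasing order
gamma <- e$vectors[, (k - 3):1]                     # principal warp eigenvectors, gamma_1 first
lambda <- e$values[(k - 3):1]                       # bending energies, smallest first
P <- function(t, j) sum(apply(Tmat, 1, function(tj) sigma(t - tj)) * gamma[, j])
P(c(0.2, 0.2), 1)                                   # first principal warp evaluated at (0.2, 0.2)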

Definition 12.8 The partial warps are defined as the set of k − 3 bivariate functions Rj(t), j = 1, …, k − 3, where

(12.20) numbered Display Equation

The jth partial warp scores for Y (from T) are defined as:

(12.21) numbered Display Equation

and so there are two scores for each partial warp.

Bookstein (2015a) also considers deflated partial warp scores, each multiplied by the square root of λ_j^{−1} for 2D and multiplied by λ_j^{−1} for 3D. Since

numbered Display Equation

we see that the non-affine part of the PTPS transformation can be decomposed into the sum of the partial warps. The jth partial warp corresponds largely to the movement of the landmarks which are the most highly weighted in the jth principal warp. The jth partial warp scores indicate the contribution of the jth principal warp to the deformation from the source T to the target Y, in each of the Cartesian axes.

The partial warps are bivariate functions and can be viewed in a plot as deformations of a square grid placed over T. In particular if t is a point in the plane of the original configuration, then we obtain the grid for the jth partial warp by deforming to y = t + Rj(t). Hence, interpretation of the partial warps can lead to an increased understanding of the deformation. It is also often helpful to display the affine transformation as a deformed grid, giving a complete decomposition of the thin-plate spline deformation.

Let the figure T be scaled to unit size, centred with coordinates (μ_1, ν_1), …, (μ_k, ν_k), and rotated so that Σ_{j=1}^{k} μ_j ν_j = 0. Let α = Σ_j μ_j^2, β = Σ_j ν_j^2 with α + β = 1, and denote

Note that u1 and u2 are orthogonal to the block diagonal matrix diag(Be, Be) (and to T). The affine component is often called the zeroth-order warp. The affine scores (zeroth partial warp scores) are given by:

numbered Display Equation

Note that

numbered Display Equation

and u_1 and u_2 together form an orthonormal basis for the affine part of the tangent space.

The partial warps represent non-affine aspects of the deformation ordered according to the amount of bending that is required to move points in the first figure. In practice this results in the partial warps representing the deformation at various scales – the larger values of the eigenvalues (i.e. larger j) represent small scale local deformation and the smaller eigenvalues (smaller j) represent large scale global deformation. Hence, the first principal and partial warps will be associated with an overall large scale bending of the figure and the last principal and partial warps will usually be associated with the landmarks that are closest together, at the smallest scale.

An analogy is with a Fourier series representation of a function. An orthonormal trigonometric basis is chosen which represents features at a variety of successively finer scales, for example cos x, sin x, cos 2x, sin 2x, cos 3x, sin 3x, ….

We take the inverse bending energy matrix as the Moore–Penrose inverse, given in Definition 9.1:

(12.24)   B_e^− = Σ_{j=1}^{k−3} λ_j^{−1} γ_j γ_j^T.
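
In code, the Moore–Penrose inverse of Equation (12.24) amounts to inverting only the non-zero eigenvalues; a generic base R sketch (with an arbitrary rank-deficient positive semi-definite example matrix) is:

mp_inverse <- function(B, tol = 1e-8) {                 # Moore-Penrose inverse via eigendecomposition
  e <- eigen(B, symmetric = TRUE)
  pos <- e$values > tol * max(e$values)                 # keep only the non-zero eigenvalues
  e$vectors[, pos, drop = FALSE] %*%
    diag(1 / e$values[pos], sum(pos)) %*% t(e$vectors[, pos, drop = FALSE])
}
set.seed(5)
G <- matrix(rnorm(20), 5, 4)
B <- rbind(cbind(crossprod(G), 0), 0)                   # 5 x 5 symmetric psd matrix of rank 4
Bminus <- mp_inverse(B)
max(abs(B %*% Bminus %*% B - B))                        # Moore-Penrose property B B^- B = B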

A summary of the different types of warps is given in Table 12.1. The principal warps are determined by one configuration only, the partial warps by a transformation between two configurations, and the relative warps by a sample of n configurations (described in Section 12.3.6).

Table 12.1 Types of warps and notation.

Type Notation Subscripts Equation
Principal warp Pj(t) j = 1, …, k − 3 (12.19)
Partial warp Rj(t) j = 1, …, k − 3 (12.20)
Partial warp score (pj1, pj2) j = 1, …, k − 3 (12.21)
Relative warp fj j = 1, …, 2k − 6 (12.25)
Relative warp score aij i = 1, …, n; j = 1, …, 2k − 6 (12.25)

Figure 12.10 The pair of thin-plate splines deformation from a non-symmetric T configuration of k = 5 landmarks (a) deformed into a second figure (b), and the principal warps for the non-symmetric T example drawn as surfaces above the Cartesian plane: (c) the first principal warp; and (d) the second principal warp. Source: Adapted from Bookstein 1989.

Example 12.6 Bookstein (1991, Section 7.5.2) considers a non-symmetric T configuration (k = 5 points) deformed into a second figure (Figure 12.10a,b). It is found that the two eigenvectors of the bending energy matrix are:

numbered Display Equation

with corresponding eigenvalues λ_1 = 0.296 and λ_2 = 0.568. Note that the eigenvalues of Bookstein (1991) differ by a factor of 2 because he uses σ(h) = ||h||^2 log(||h||^2) = 2||h||^2 log(||h||) for the covariance function. We can inspect the loadings of each eigenvector to see which coefficients are large, and hence interpret the effect. Alternatively we can plot the principal warps as surfaces and in Figure 12.10c,d we see the two principal warps. The first principal warp corresponds to a relatively large feature – the difference in landmarks 4 and 5 relative to 1, 2 and 3. The second (and last) principal warp reflects the relatively small feature differences in the displacements of the two nearest points, landmarks 4 and 5. In Figure 12.11 we see the deformation grids for the partial warps and the affine part. The first (larger scale change) involves landmarks 4 and 5 moving down relative to points 1, 2 and 3. The second partial warp is largely composed of the relative movement of landmarks 4 and 5 together (smaller scale change). Also in Figure 12.11 we see the affine component, which is a shear of the left region upwards and the right region downwards.


Figure 12.11 (a) The initial grid on the non-symmetric T, (b) the first partial warp, (c) the second partial warp, and (d) the affine component. Source: Adapted from Bookstein 1989.


Figure 12.12 A thin-plate spline transformation grid between a female and a male gorilla skull midline. (a) The square grid is placed on the estimated mean female skull and (b) the curved grid is pictured on the estimated mean male skull. The deformation is magnified three times here, for ease of interpretation.


Figure 12.13 The five principal warps for the pooled mean shape of the gorillas. The top row shows the first three principal warps (first on the left to third on the right) and the bottom row shows the last two principal warps (fourth on the left, fifth on the right).


Figure 12.14 The five principal warps for the pooled mean shape of the gorillas. The top row shows the first three principal warps (first on the left to third on the right) and the bottom row shows the last two principal warps (fourth on the left, fifth on the right). The principal warps are pictured by deforming the full Procrustes mean shape for the pooled dataset of all 59 gorillas to figures which have a partial warp score of 0.15 along each warp.

Example 12.7 Consider describing the shape change from an estimate of the mean size-and-shape female gorilla skull midline to an estimate of the mean size-and-shape of the male gorilla skull midline, for the data described in Section 1.4.8. There are k = 8 landmarks in the plane. The thin-plate spline deformation is given in Figure 12.12. Although interpretation is far from straightforward, a possible explanation is that the main size-and-shape difference is a greater protrusion in the male face compared with the female face – there is less size-and-shape difference at the back of the skull.

There are five principal warps from the pooled Procrustes mean, shown as surfaces in Figure 12.13. It is clear that the first few principal warps are smoother and spread over a larger scale than the less smooth, smaller scale, last few principal warps. The principal warps are also displayed as deformation grids in Figure 12.14. We can see the larger scale deformation of the first principal warp which includes the face (left half) and braincase (right half) moving downwards relative to the middle of the skull. The face is also stretched outwards. The second principal warp includes the relative shrinking in height of the braincase, and the bottom left three landmarks moving away from the rest of the skull. The principal warps are concerned with successively smaller scale deformations, and at the smaller scales the fourth principal warp involves the movement in ba and o and the final principal warp includes a contrast between left-most landmarks pr and na and the rest of the skull. The landmark labels were given in Figure 1.18. If using either of the individual male or female group means rather than the pooled mean, then the plots appear virtually identical to the eye.

The five partial warp deformation grids for the female mean shape to the male mean shape are shown in Figure 12.15 together with the affine grid (all magnified three times). We see that the face region is pulled up and outwards in partial warp 1, and the face is taller in height in partial warp 2. There is very little contribution in partial warp 3. In partial warp 4 landmarks ba and o become closer together and in partial warp 5 landmarks pr and na become closer together.

The affine and partial warp scores for the gorilla skull data are displayed in Figure 12.16. For each partial warp there are two scores (one for each Cartesian axis x, y). For each warp we plot the pairs of scores for the y-axis against the x-axis, labelling the males and females with ‘m’ and ‘f’, respectively. We can see that there are fairly good separations of the males and females in the affine scores and the partial warp scores, except for partial warp 3. The percentage of pooled variability explained by the affine component is 26.1%, and for the partial warps 1 to 5 the percentages explained are 32.7, 17.8, 12.3, 6.5 and 4.4%, respectively. Note that the variance of the partial warps drops with warp number, as in Bookstein (2015a). The shape variability is not spread evenly over the warps. So, the variability in the scores is not spherical – there is much more shape variability in the first few partial warp scores and the smaller scale warps summarize only a small fraction of the shape variability.

Note that the principal warps and partial warp scores can be obtained in R using procGPA with the option alpha=1. In particular, the output is in $principalwarps, $partialwarpscores and the percentage contributions are in $partialwarp.percent. For example, for the pooled gorilla data:

ans<-procGPA( abind(gorf.dat,gorm.dat), alpha=1)
> ans$partialwarps.percent
[1] 32.719875 17.787477 12.304224 6.506736 4.443423

The affine components, scores and percentage contributions are obtained using option affine=TRUE and are given in $pcar, $scores, $percent respectively. For the pooled gorilla data we have:

ans<-procGPA( abind(gorf.dat,gorm.dat), alpha=1, affine=TRUE)
> ans$percent
[1] 17.529125 8.565122

12.3.5 PCA with non-Euclidean metrics

Principal component analysis of a random sample of shapes can be carried out with respect to the bending energy matrix or its generalized inverse, emphasizing large or small scale variations.

Definition 12.9 Define the pseudo-metric space as the set of real p-vectors with pseudo-metric given by:

d_A(x_1, x_2) = { (x_1 − x_2)^T A^− (x_1 − x_2) }^{1/2},

where x_1 and x_2 are p-vectors, and A^− is a generalized inverse (or inverse if it exists) of the positive semi-definite matrix A.

The Moore–Penrose inverse is a suitable choice of generalized inverse (see Definition 9.1). If A is the population covariance matrix of x1 and x2, then dA(x1, x2) is the Mahalanobis distance. The norm of a vector x in the metric space is

||x||_A = (x^T A^− x)^{1/2}.

We could carry out statistical inference in the metric space rather than in the usual Euclidean (A = I) space (Mardia 1977, 1995; Bookstein 1995, 1996b; Kent, 1995, personal communication). A simple way to proceed is to transform from x to y = (A^−)^{1/2} x in Euclidean space. For example, consider PCA of n centred p-vectors x_1, …, x_n in the metric space. Transforming to y_i = (A^−)^{1/2} x_i, the PC loadings are the eigenvectors of

S_y = (A^−)^{1/2} S_x (A^−)^{1/2}.

Denote the eigenvectors (p-vectors) of S_y as γ_{yj}, j = 1, …, p (assuming p ≤ n − 1), with corresponding eigenvalues λ_{yj}, j = 1, …, p. The PC scores for the jth PC on the ith individual are:

γ_{yj}^T y_i = γ_{yj}^T (A^−)^{1/2} x_i,   i = 1, …, n.

So, the (unnormalized) PC loadings on the original data are (A^−)^{1/2} γ_{yj}, where the γ_{yj} are the eigenvectors of (A^−)^{1/2} S_x (A^−)^{1/2} and S_x = n^{−1} Σ_{i=1}^{n} x_i x_i^T, using standard linear algebra (see Mardia et al. 1979, Appendix). The first few PCs in the metric space (with loadings (A^−)^{1/2} γ_{yj}) can be useful for interpretation, emphasizing a different aspect of the sample variability than the usual PCA in Euclidean space. If our analysis is carried out in the pseudo-metric space, then we say that our analysis has been carried out with respect to A.
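
A small base R sketch of PCA with respect to a metric A, using simulated data and an arbitrary positive definite A: transform to y_i = (A^−)^{1/2} x_i, carry out ordinary PCA, and map the loadings back to the original space.

set.seed(6)
n <- 50; p <- 4
X <- scale(matrix(rnorm(n * p), n, p), scale = FALSE)     # rows are centred p-vectors x_i
A <- crossprod(matrix(rnorm(p * p), p, p))                # an arbitrary positive definite metric
eA <- eigen(A, symmetric = TRUE)
Ainv_half <- eA$vectors %*% diag(1 / sqrt(eA$values)) %*% t(eA$vectors)   # (A^-)^{1/2}
Y <- X %*% Ainv_half                                      # transformed data y_i = (A^-)^{1/2} x_i
ey <- eigen(crossprod(Y) / n, symmetric = TRUE)           # PCA of S_y
scores <- Y %*% ey$vectors                                # PC scores in the metric space
loadings_x <- Ainv_half %*% ey$vectors                    # (unnormalized) loadings on the original data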

For example, in canonical variate analysis in multivariate analysis observations from several groups are available. An eigendecomposition of the between group covariance matrix Sbetween is carried out with respect to Swithin, the within group covariance matrix (e.g. see Mardia et al. 1979, p. 338). Another form of weighted PCA was carried out by Polly et al. (2013) who consider PCA weighted by phylogenetic distance.

Potentially very useful metrics for shape analysis are the bending energy and inverse bending energy metrics in the tangent space. All metrics here depend on quadratic forms, and the procedures use Mahalanobis distances. A very different approach could be obtained by using, say, the L1 norm

||x||_1 = Σ_{i=1}^{p} |x_i|,

instead of the Euclidean L2 norm

||x||_2 = ( Σ_{i=1}^{p} x_i^2 )^{1/2}.

12.3.6 Relative warps

If a random sample of shapes is available, then one may wish to examine the structure of the within group variability in the tangent space to shape space. This problem was approached in Section 7.7 using PCA with respect to the Euclidean metric, but an alternative is the method of relative warps. Relative warps are PCs with respect to the bending energy or inverse bending energy metrics in the shape tangent space. Bookstein (2015a) also considered deflated relative intrinsic warps, which involve deflating the partial warp scores.

Consider a random sample of n shapes represented by Procrustes tangent coordinates v_1, …, v_n (each a (2k − 2)-vector), where the pole μ is chosen to be an average pre-shape such as from the full Procrustes mean, see Section 8.3. The sample covariance matrix in the tangent plane is denoted by S_v and the sample covariance matrix of the centred tangent coordinates x_i = (I_2 ⊗ H^T) v_i, i = 1, …, n, is denoted by S_c (2k × 2k). In our examples we have used the covariance matrix of the Procrustes fit coordinates of Section 7.7.4. The bending energy matrix B_e is calculated for the average shape and then the tensor product is taken to give B_2 = I_2 ⊗ B_e, which is a 2k × 2k matrix of rank 2k − 6. We write B_2^− for a generalized inverse of B_2 (e.g. the Moore–Penrose generalized inverse of Definition 9.1).

We consider PCA in the tangent space with respect to a power of the bending energy matrix, in particular with respect to B_e^α.

Definition 12.10 Let the non-zero eigenvalues of (B_2^−)^{α/2} S_c (B_2^−)^{α/2} be l_1, …, l_{2k−6} with corresponding eigenvectors f_1, …, f_{2k−6}, and

(B_2^−)^{α/2} = Σ_{j=1}^{2k−6} λ_j^{−α/2} γ_j γ_j^T,

with λ_1, …, λ_{2k−6} the eigenvalues of B_2 with corresponding eigenvectors γ_1, …, γ_{2k−6}. The eigenvectors f_1, …, f_{2k−6} are called the relative warps. The relative warp scores are:

(12.25)   a_{ij} = f_j^T (B_2^−)^{α/2} x_i,   i = 1, …, n;  j = 1, …, 2k − 6.

Important point: The relative warps and the relative warp scores are useful tools for describing the non-affine shape variation in a dataset. In particular the effect of the jth relative warp can be viewed by plotting

numbered Display Equation

for various values of c, where

numbered Display Equation

The procedure for PCA with respect to the bending energy requires α = +1 and emphasizes large scale variability. Principal component analysis with respect to the inverse bending energy requires α = −1 and emphasizes small scale variability. If α = 0, then we take B_2^0 = I_{2k}, the 2k × 2k identity matrix, and the procedure is exactly the same as PCA of the Procrustes tangent coordinates; see Section 7.7. Bookstein (1996b) has called the α = 0 case PCA with respect to the Procrustes metric. Bookstein (2014) argues that isotropic covariance structure is unrealistic for most applications, and that covariance matrices based on bending energy are often preferable.

Affine variation can also be considered. Let the Procrustes average shape be scaled to unit size and rotated with the principal axes horizontal and vertical, and let u1 and u2 be constructed from the mean shape as in Equation (12.22) and Equation (12.23). The vectors u1 and u2 are orthogonal to the bending energy matrix. Write U = (u1, u2) for the 2k × 2 matrix. In order to view the affine contribution to variability we plot

numbered Display Equation

for various values of c.

Example 12.8 We consider a relative warp analysis of the male and female gorilla data, continuing from Example 12.7. We saw that the affine component explained 26.1% of the total pooled variability. We calculate B_e using the overall full Procrustes mean from the pooled data. Carrying out a relative warp analysis with respect to B_e (α = 1), the ratios of the eigenvalues of the first two relative warps to the sum of the eigenvalues over all the relative warps are 63.4% and 19.6%, respectively. A plot of the first two relative warp scores is given in Figure 12.17a, and there is good separation between the males and females. The effect of the first two relative warps is demonstrated in Figure 12.18 by placing a grid on the pooled mean shape and deforming to a shape along each of the relative warps. We see that the relative warps match quite closely to the first two principal warps (seen in Figure 12.13), emphasizing the large scale variability in the data.

Carrying out a relative warp analysis with respect to the inverse bending energy (α = −1), the ratios of the eigenvalues of the first two relative warps to the sum of the eigenvalues over all the relative warps are 34.8% and 17.4%, respectively. A plot of the first two relative warp scores is given in Figure 12.17b, and again there is good separation between the males and females. The effect of the first two relative warps is demonstrated in Figure 12.19 and we see again that the first relative warp matches quite closely to the first principal warp, although some small scale movement of landmarks na and pr and movement of landmarks ba and o is emphasized. The second relative warp has some similarities with the second principal warp although there is a different emphasis in the braincase region (l is pushed farther leftwards).

Finally we carry out a relative warp analysis with respect to the identity matrix (α = 0), which is the same as PCA of the full Procrustes tangent coordinates of the pooled data. We have the percentages of shape variability explained by the first two relative warps (PCs) as 45.9 and 16.9%, respectively. A plot of the first two relative warp scores is given in Figure 12.17c, and again there is good separation between the males and females. The effect of the first two relative warps is demonstrated in Figure 12.20 and we see again that the relative warps match quite closely to the first and second principal warps.


Figure 12.15 The affine component and the partial warps for the deformation from the female to the male gorilla (all magnified three times). The top row shows the affine component and the first two partial warps (affine on the left, first in the middle and second on the right) and the bottom row shows the last three partial warps (third on the left, fourth in the middle and fifth on the right).


Figure 12.16 The affine scores and the partial warp scores for female (f) and male (m) gorilla skulls. The top row shows the affine scores and the first two partial warp scores (affine on the left, first in the middle and second on the right) and the bottom row shows the last three partial warp scores (third on the left, fourth in the middle and fifth on the right).


Figure 12.17 A plot of the first two relative warp scores with respect to (a) bending energy (α = 1), (b) inverse bending energy (α = − 1) and (c) Procrustes metric (α = 0) for the female (f) and male (m) gorilla skulls.


Figure 12.18 Deformation grids for the first two relative warps from the gorilla data with respect to bending energy (α = 1), emphasizing large scale variation. A square grid is drawn onto the pooled mean shape (not shown) and then deformed (by PTPS) to a shape at +6 standard deviations along the first relative warp (a) and −6 standard deviations along the second relative warp (b).


Figure 12.19 Deformation grids for the first two relative warps with respect to the inverse bending energy (α = − 1), emphasizing the small scale variation. A square grid is drawn onto the pooled mean shape (not shown) and then deformed (by PTPS) to shapes at 6 standard deviations along the relative warps [first in (a), second in (b)].

In R we can carry out relative warp analysis using the command procGPA with option alpha=1 for relative warp scores with respect to bending energy, with option alpha=-1 with respect to inverse bending energy and with option alpha=0 for the Procrustes tangent space PCA:

s1<-procGPA( apes$x[,,1:59] , alpha=1)$scores
s2<-procGPA( apes$x[,,1:59] , alpha=-1)$scores
s3<-procGPA( apes$x[,,1:59] , alpha=0)$scores
par(mfrow=c(1,3))
#relative warps with respect to Be
plot(s1[,1],s1[,2],type="n",xlab="RW1",ylab="RW2")
text(s1[,1],s1[,2],c(rep("f",times=30),rep("m",times=29)))
title(sub="(a)")
#relative warps with respect to inverse Be
plot(s2[,1],s2[,2],type="n",xlab="RW1",ylab="RW2")
text(s2[,1],s2[,2],c(rep("f",times=30),rep("m",times=29)))
title(sub="(b)")
#relative warps with respect to the Procrustes metric (alpha=0)
plot(-s3[,1],-s3[,2],type="n",xlab="RW1",ylab="RW2")
text(-s3[,1],-s3[,2],c(rep("f",times=30),rep("m",times=29)))
title(sub="(c)")

Figure 12.20 Deformation grids for the first two relative warp scores with respect to the identity matrix (α = 0), for the gorilla data. A square grid is drawn onto the pooled mean shape (not shown) and then deformed (by PTPS) to a shape at +6 standard deviations along the first relative warp (a) and −6 standard deviations along the second relative warp (b).

Important point: We may ask which of the choices of α leads to the most informative analysis. The answer depends on the particular application: if large scale effects are the most important, then α = 1 is a good choice; if small scale effects are the most important, then α = −1 is a good choice; otherwise, α = 0 is an appropriate choice.

In the example of the gorilla data, large scale variation seems more important so it could be argued that α = 1 and α = 0 are more useful than α = −1.
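
As a quick check on how the three analyses summarize the variability, we might compare the percentages of shape variability explained under each choice of α. A minimal sketch is given below; it assumes that the procGPA output stores these percentages in a component named percent, as in recent versions of the shapes package.

library(shapes)   # provides procGPA and the apes dataset
# Compare the percentage of variability explained by the leading
# relative warps for alpha = 1, -1 and 0 (gorilla subset of apes$x).
for (a in c(1, -1, 0)) {
  out <- procGPA(apes$x[,,1:59], alpha = a)
  cat("alpha =", a, ":", round(out$percent[1:3], 1), "%\n")
}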

12.4 Alternative approaches and history

12.4.1 Early transformation grids

Grids similar to the Cartesian transformation grids of D’Arcy Thompson were popular with Renaissance artists in the 16th century. For example, grids were drawn on a figure such as a skull and then deformed into a new skull by painting corresponding parts in the appropriate part of the deformed grid. A famous example is in the painting ‘The Ambassadors’ by Hans Holbein the Younger (1533, National Gallery, London). Such a deformation is known as ‘anamorphosis’, and in ‘The Ambassadors’ a distorted skull lies diagonally across the bottom of the picture. Other examples include drawings of grids on different objects in order to compare them; particularly good examples were given by Dürer (1528) (Figure 12.21) for comparing human forms (Bookstein, 1996b).


Figure 12.21 Early transformation grids of human profiles. Source: Bookstein 1996b. Reproduced with permission from Springer Science+Business Media.

Medawar (1944) approached the problem mathematically after D’Arcy Thompson’s original ideas. Using the motivation of modelling the shape of humans, Medawar (1944) modelled the dataset known as ‘Stratz’s plane figures’, which has been commented on by Jackson (1915) (Figure 12.22). The data are the frontal elevations of the male human being at six stages through life. Medawar used quadratic functions for the deformation function Φ2(t), which depends only on height, with coefficients depending on the age of the individual. The deformation function Φ1(t) is not modelled explicitly. Here t = (t[1], t[2])T is again a point on the first figure.


Figure 12.22 Early transformation grids modelling six stages through life. Source: Medawar 1944.

Medawar (1945) also considered explicit analytic modelling of both Φ1(t) and Φ2(t) in transforming an outline of a distorted spleen culture.

Sneath (1967) fitted cubic functions for Φ1(t) and Φ2(t) after initially centring, rotating and rescaling the two figures by a procedure which is identical to ordinary Procrustes analysis (see Section 7.2). One of the problems with this technique is that it allows for considerable bending away from the centre of the forms and results in gross distortions of the grid towards the periphery. See Figure 12.23 for an example, where undesirable folding of the grid can be seen.


Figure 12.23 Transformation grids for four skulls. Source: Sneath 1967. Reproduced with permission from John Wiley & Sons.


Figure 12.24 Transformation grids for a pair of skulls. Source: Cheverud & Richtsmeier 1986. Reproduced with permission of Oxford University Press.

Cheverud and Richtsmeier (1986) also obtain transformation grids but use the finite element scaling method. Their work is in three dimensions but includes sections of grids in two dimensions. Again the bending of the grids is very large in localized regions; see Figure 12.24 for an example.

12.4.2 Finite element analysis

If two figures are available (e.g. the average shape in each of two groups), then another possible method of shape change description is to consider a finite element description of the size and shape difference. Each figure is represented by a collection of elements or building blocks, for example triangles and quadrilaterals in two dimensions, or tetrahedra and cuboids in three dimensions. A separate transformation is considered in each element and the approach usually leads to interpolants which are continuous but not smooth across element boundaries. The essence of the finite element approach is that a shape difference can be described in terms of the directions and magnitudes of the principal strains (see Section 12.2.4 for triangles in two dimensions) in the transformation of elements in one form to another. Homogeneous finite element methods (e.g. Bookstein, 1978; Moss et al. 1987) involve the assumption that shape differences (strains) are uniformly distributed throughout each element. Elements may contain different numbers of landmarks but the simplest possible are triangles in 2D figures, with vertices at homologous landmarks in the two forms. In this case shape differences are necessarily affine between two triangles, as described in Section 12.2.4. Other types of finite element methods for shape analysis use non-homogeneous finite elements. Note that with all finite element analyses interpretation of shape differences will depend on element design (Zienkiewicz, 1971). Finite element scaling analysis (e.g. Lew and Lewis, 1977) is non-homogeneous and has been applied in studies of craniofacial growth and sexual dimorphism in three dimensions (Cheverud et al. 1983; Richtsmeier, 1986; Cheverud and Richtsmeier, 1986; Richtsmeier, 1989). An alternative is to use the design of elements that arise from a Delaunay triangulation (O’Higgins and Dryden, 1993; Kent and Mardia, 1994b; Okabe et al. 2000).


Figure 12.25 One finite element description for the shape change from the average female (shown) to the average male gorilla. The crosses represent the principal axes of the deformation in each triangle – the magnitudes of the largest and least amount of stretching are given, together with the directions of the strains. Source: O'Higgins & Dryden 1993. Reproduced with permission from Elsevier.

Recent work has provided a bridge between geometric morphometrics and finite element analysis, via the use of large scale relative warps (Bookstein, 2013a).

Example 12.9 Consider the size-and-shape difference in the midline between female and male gorilla crania, described in Section 1.4.8. First of all the average female and the average male are obtained. We saw in Section 9.4 that the mean shape difference is statistically significant.

In Figure 12.25 we see the average female drawn with finite elements indicating how the triangles change size and shape into the average male form. The crosses on the diagram are the principal strains (see Section 12.2.4) indicating how the unit length axes of a circle change into the major and minor axes of the deformed circle (ellipse).

There is not much shape difference in the back of the skull (b, ba, l), where the sum of strains is smaller. However, there is more of a change in the face region (pr, na, st, n, ba), where the front of the face is more protruded in the male than in the female. The PTPS deformation for the same example is given in Figure 12.12.
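
To make the homogeneous finite element description concrete, the following minimal sketch (not the book's own code) computes the principal strains for a single triangular element: the linear part of the affine map taking the source triangle onto the target is recovered from the edge vectors, and its singular values and right singular vectors give the magnitudes and source-space directions of greatest and least stretching, as in Section 12.2.4. The two triangles are hypothetical.

# Hypothetical source and target triangles (vertices as rows).
source.tri <- matrix(c(0, 0,  1, 0,  0, 1), ncol = 2, byrow = TRUE)
target.tri <- matrix(c(0, 0,  1.2, 0.1,  -0.1, 0.8), ncol = 2, byrow = TRUE)
U <- t(source.tri[2:3, ] - rbind(source.tri[1, ], source.tri[1, ]))  # source edge vectors as columns
V <- t(target.tri[2:3, ] - rbind(target.tri[1, ], target.tri[1, ]))  # target edge vectors as columns
A <- V %*% solve(U)     # linear part of the affine map taking source onto target
sv <- svd(A)
sv$d                    # principal strains: largest and least amount of stretching
sv$v                    # corresponding (orthogonal) principal directions in the source triangle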

12.4.3 Biorthogonal grids

Biorthogonal grids were introduced by Bookstein (1978) and give numerical information as to the directions and magnitudes of shape change. These grids give a map of infinitesimal strains at each point in the interior of the first figure, as it is deformed into the second. The biorthogonal grid shows the maximum and minimum amounts of the size-and-shape difference and can be viewed as the first derivative of the interpolation. The choice of interpolant will be crucial to our perception of shape change. At each point in the interior of the figure there are two differentials which are perpendicular before and after the transformation (principal strains). The curves following the directions of the differentials form a grid whose intersections are at 90° in both figures, and these grids are called biorthogonal grids. Applications of the biorthogonal grid have been made to a wide variety of situations (e.g. Bookstein, 1978, 1986), including describing the shape difference in the faces of children with and without fetal alcohol syndrome (Bookstein and Sampson, 1990). Sampson et al. (1991) have also used the biorthogonal grid but in a different setting in environmental statistics. They use the biorthogonal grid as an explanatory tool in the description of spatial covariance structures and their routine for displaying the grids uses the thickness of the lines to indicate the strength of the principal strains.
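
The following small numerical sketch (purely illustrative, with an arbitrary smooth map standing in for the interpolant) shows the infinitesimal strains that underlie a biorthogonal grid: at a point t the Jacobian matrix of the deformation is approximated by finite differences, and its singular value decomposition gives the principal strains together with orthogonal directions in the source that remain orthogonal after the transformation.

# Illustrative smooth deformation of the plane (not from the text).
Phi <- function(t) c(t[1] + 0.2 * t[2]^2, t[2] + 0.1 * t[1] * t[2])
local.strains <- function(Phi, t, eps = 1e-6) {
  J <- matrix(0, 2, 2)
  for (j in 1:2) {                          # finite-difference approximation to the Jacobian
    e <- c(0, 0); e[j] <- eps
    J[, j] <- (Phi(t + e) - Phi(t - e)) / (2 * eps)
  }
  sv <- svd(J)
  list(strains = sv$d, directions = sv$v)   # principal strains and source directions
}
local.strains(Phi, c(0.5, 0.5))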

12.5 Kriging

12.5.1 Universal kriging

To obtain deformations we could consider kriging, a commonly used method of prediction in spatial statistics. We initially consider prediction from a univariate random process at an m-dimensional site t, given k univariate observations y1, …, yk taken at sites t1, …, tk. Kriging involves the construction of the unbiased linear predictor which minimizes the mean square prediction error

E[{Y(t) − Ŷ(t)}2], where Ŷ(t) denotes the linear predictor.

Let g(t) = (g1(t), …, gp(t))T be a vector of known functions. Consider the general linear model

Y(t) = βTg(t) + ε(t),

where β = (β1, …, βp)T and ε(t) is a random field. If ε(t) is a zero mean stationary random field, with positive definite covariance function

cov{ε(t), ε(t + h)} = σ(h),

with σ(0) < ∞, then the prediction is called universal kriging. If βTg(t) = μ is constant, then the prediction is called ordinary kriging. We consider universal kriging and more complete details can be found, for example, in Cressie (1993, p. 151). We follow the treatment of Mardia and Marshall (1984) and Mardia and Little (1994).

The term βTg(t) is called the drift or underlying trend. Writing Ysite = (Y(t1), …, Y(tk))T as the random vector of the process at the k sites, with the observed values y1, …, yk as a realization of Ysite, the regression predictor is the conditional mean of Y(t) given Ysite, namely

where (S)ij = σ(ti − tj), s(t) = (σ(t − t1), …, σ(t − tk))T and the jth row of D is g(tj)T. Under the assumption of a Gaussian random field Equation (12.26) is well known (e.g. Mardia et al. 1979, p. 63). The universal kriging predictor (Mardia and Marshall, 1984) is equivalent to replacing β in Equation (12.27) with the generalized least squares estimator

numbered Display Equation

Definition 12.11 The universal kriging predictor at site t is:

numbered Display Equation

where

numbered Display Equation

There is an alternative representation where the predictor is regarded as a (non-linear) function of t, that is

numbered Display Equation

where

In matrix form we can write Equation (12.28) as:

(12.29) numbered Display Equation

and these equations are known as the dual-kriging equations. When written in this alternative form it is quite straightforward to see the link between splines and kriging predictors. We make the full link in the discussion of generalized kriging in Section 12.5.3. There is also a close connection with radial basis functions (Arad et al. 1994; Mardia et al. 1996e). Radial basis functions in the numerical analysis literature (Powell, 1987) are appropriate to use as isotropic covariance functions.
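
The following minimal sketch in R illustrates universal kriging prediction at a single new site, written directly from the standard formulae (a generalized least squares estimate of the drift coefficients followed by kriging of the residuals). The exponential covariance function, the affine drift g(t) = (1, t[1], t[2])T, the sites and the data are all illustrative assumptions rather than quantities taken from the text.

sigma.fn <- function(h) exp(-sqrt(sum(h^2)))          # an isotropic covariance function (illustrative)
g.fn <- function(t) c(1, t)                           # affine drift g(t) = (1, t[1], t[2])^T
uk.predict <- function(t, sites, y, sigma.fn, g.fn) {
  k <- nrow(sites)
  S <- matrix(0, k, k)                                 # (S)ij = sigma(t_i - t_j)
  for (i in 1:k) for (j in 1:k) S[i, j] <- sigma.fn(sites[i, ] - sites[j, ])
  D <- t(apply(sites, 1, g.fn))                        # jth row of D is g(t_j)^T
  s <- apply(sites, 1, function(tj) sigma.fn(t - tj))  # s(t)
  Sinv <- solve(S)
  beta <- solve(t(D) %*% Sinv %*% D, t(D) %*% Sinv %*% y)   # generalized least squares estimate
  as.numeric(g.fn(t) %*% beta + s %*% Sinv %*% (y - D %*% beta))
}
set.seed(1)
sites <- matrix(runif(20), ncol = 2)                   # 10 sites in the plane
y <- sin(2 * pi * sites[, 1]) + rnorm(10, sd = 0.05)   # noisy observations at the sites
uk.predict(c(0.5, 0.5), sites, y, sigma.fn, g.fn)      # universal kriging prediction at a new site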

12.5.2 Deformations

In our applications we are interested in deformations of m-dimensional space into itself. Hence we are interested in predicting the multivariate process Y(t) = (Y(t)[1], …, Y(t)[m])T using a collection of m independent kriging predictors. So, if the k observations in the m dimensions at the sites T = (t1, …, tk)T are written as a k × m matrix Yobs, then it follows that the universal kriging predictors can be written as:

where C is m × p and W is k × m, satisfying

Therefore, kriging can be used for deformations for a given positive definite covariance function σ(t). The same deformations using Equation (12.30) are given in the generalized kriging framework which follows, where the class of functions for σ(t) is widened. The kriging deformations were first introduced in Mardia et al. (1991).
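
A minimal sketch of such a kriging deformation in two dimensions is given below, using the dual form: local coefficients and drift coefficients solve a single spline-type linear system with an affine drift, and each coordinate of the deformation is predicted separately. The kernel σ(h) = −||h|| and the landmark configurations are illustrative choices, and the code is a sketch rather than the construction used in the text.

# Kriging deformation of the plane interpolating source landmarks Tmat to target landmarks Ymat.
krig.deform <- function(Tmat, Ymat, sigma.fn = function(r) -r) {
  k <- nrow(Tmat)
  S <- sigma.fn(as.matrix(dist(Tmat)))          # k x k matrix with entries sigma(t_i - t_j)
  D <- cbind(1, Tmat)                           # affine drift: jth row is (1, t_j^T)
  p <- ncol(D)
  L <- rbind(cbind(S, D), cbind(t(D), matrix(0, p, p)))
  rhs <- rbind(Ymat, matrix(0, p, ncol(Ymat)))
  sol <- solve(L, rhs)                          # solve for the local and drift coefficients
  W <- sol[1:k, , drop = FALSE]
  C <- sol[(k + 1):(k + p), , drop = FALSE]
  function(t) {                                 # the fitted deformation Phi(t)
    s <- sigma.fn(sqrt(colSums((t(Tmat) - t)^2)))
    as.numeric(s %*% W + c(1, t) %*% C)
  }
}
Tmat <- matrix(c(0, 0, 1, 0, 1, 1, 0, 1, 0.5, 0.5), ncol = 2, byrow = TRUE)   # illustrative source
Ymat <- Tmat + matrix(c(0, 0, 0.1, 0, 0.15, 0.1, 0, 0.1, 0.1, 0.05),
                      ncol = 2, byrow = TRUE)                                 # illustrative target
Phi <- krig.deform(Tmat, Ymat)
Phi(Tmat[3, ])    # interpolation: reproduces the third target landmark
Ymat[3, ]

Replacing the kernel −||h|| by ||h||2 log ||h|| in this sketch gives the thin-plate spline deformation, which arises as intrinsic kriging in Section 12.5.3.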

12.5.3 Intrinsic kriging

In Section 12.5.1 the positive definite covariance function must exist. However, the technique of intrinsic or generalized kriging involves prediction from the model Y(t) = βTg(t) + ε(t), where ε(t) is an intrinsic random field, which we now define. Intrinsic processes were first developed by Matheron (1973).

Definition 12.12 Let G be a space of known functions. Given sites t1, …, tk, we call

a1Y(t1) + a2Y(t2) + ⋯ + akY(tk)

an increment with respect to G if

a1g(t1) + a2g(t2) + ⋯ + akg(tk) = 0 for all g ∈ G.

An intrinsic random field is a stochastic process Y(t) which has every increment ∑i aiY(ti) distributed with zero mean and variance

var{∑i aiY(ti)} = ∑i∑j aiaj σ*(ti, tj).

The function σ*(ti, tj) is called a conditional positive definite (c.p.d.) covariance function with respect to G.

In time series analysis taking differences annihilates a mean trend. Likewise taking increments annihilates trends or drifts expressed as functions in G. So an intrinsic random field corresponds to an equivalence class of stochastic processes which will have different drifts expressed as functions in G.

The linear predictor that minimizes the mean square prediction error at site t is called the intrinsic or generalized kriging predictor, and is also given by Equation (12.30), but with σ(t) a c.p.d. covariance function and a basis for G given by g(t) = (g1(t), …, gp(t))T. Technical details can be found in Mardia et al. (1996e).

Consider the important class of intrinsic random processes with c.p.d. covariance function σ*(t, t + h) = σα(h) indexed by α > 0,

(12.32) numbered Display Equation

These c.p.d. covariance functions are of interest because they are self-similar, that is σα(ch) ≡ c2ασα(h) for c > 0, where ‘≡’ means equal up to an even polynomial in h of degree 2[α], where [ · ] is the ‘integer part’ function. Thus, σα(h) and σα(ch) yield the same predictions. Let the space of functions Gr be the space of polynomials in h of degree ≤ r. It is well known that σα(h) defines a c.p.d. function with respect to Gr provided r ≥ [α] (e.g. Kent and Mardia, 1994a). The thin-plate spline for m = 2 dimensions is exactly the same as the intrinsic kriging predictor using the self-similar random field with α = 1, σ(0) = 0 and c.p.d. covariance function σ1(h) taken with respect to G1, the linear functions. Full details are given in Kent and Mardia (1994a). For m = 3 dimensions we will use α = 1/2 and σ1/2(h) = −||h|| taken with respect to G1.
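
A small numerical illustration (not a proof) of the conditional positive definiteness of σ1/2(h) = −||h|| is sketched below: for weight vectors a forming increments with respect to the affine functions (so that DTa = 0, where D holds the affine basis evaluated at the sites), the quadratic form ∑i∑j aiaj σ1/2(ti − tj), which plays the role of a variance, should be nonnegative. The random sites and weights are arbitrary.

set.seed(2)
k <- 12
sites <- matrix(rnorm(3 * k), ncol = 3)                 # k random sites in three dimensions
S <- -as.matrix(dist(sites))                            # entries sigma_{1/2}(t_i - t_j) = -||t_i - t_j||
D <- cbind(1, sites)                                    # affine basis of G_1 evaluated at the sites
P <- diag(k) - D %*% solve(t(D) %*% D) %*% t(D)         # projector onto increment weights (D^T a = 0)
qf <- replicate(1000, {
  a <- P %*% rnorm(k)                                   # a random increment
  as.numeric(t(a) %*% S %*% a)                          # the variance-type quadratic form
})
min(qf)                                                 # nonnegative, up to rounding error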

Bookstein (2015a) explores self-similarity in the analysis of morphometric data, including an application to the rat growth data of Section 1.4.11.

The link between kriging and splines has been noted by many authors and discussed in some detail initially by Kimeldorf and Wahba (1971) (see also Matheron, 1981, p. 180; Cressie, 1993; Kent and Mardia, 1994a). However, the aims of the two approaches are quite different: for kriging the aim is to provide a linear predictor of Y(t) using the available observations y1, …, yk, whereas for splines the aim is to predict Y(t) with a smooth function of the site location t.

Comparing Equation (12.31) and Equation (12.18) we see that the intrinsic kriging predictor is the same as a smoothing spline with a c.p.d. covariance function equal to σ(t), with smoothing parameter

numbered Display Equation

If λ = 0, then the kriging predictor is an interpolant (i.e. it reproduces the observations exactly at the sites t1, …, tk) and this corresponds to an interpolating spline.

Example 12.10 Consider describing the main aspect of shape variability in the macaque data of Section 1.4.3, with k = 24 landmarks. We consider a kriging transformation from the full Procrustes mean male macaque to an icon which is three standard deviations away from the mean along the first PC. The transformation will demonstrate which parts of the skull are varying the most. The data are m = 3 dimensional and the c.p.d. covariance function is taken as σ1/2(h) = −||h||. A regular cuboid grid is evaluated on the mean male and then deformed using the kriging transformation of the mean male to the icon. In Figure 12.26 we see four different views of the grids on the deformed male at three standard deviations along the first PC. The 3D representation is best viewed with an interactive spinning plot. However, we see from the static plots that the main shape variability in this PC is in the top of the braincase moving relative to the bottom of the braincase.


Figure 12.26 The deformed cuboid grid from the full Procrustes mean male to an icon which is three standard deviations along the first PC of shape variability. We see four different views of the icon in 3D space. In particular, view (c) is the top view of the skull and view (d) is the side view. We can see that there is more deformation in the grid at the back and top of the skull, showing the region of greatest shape variability. The deformed grid has been calculated using intrinsic kriging.

12.5.4 Kriging with derivative constraints

Bookstein and Green (1993) introduced the term ‘edgel’ to mean the first derivative information at a landmark. They extended the interpolating thin-plate spline for 2D deformations to include derivative constraints. As well as matching landmarks from one figure to the second, derivative information such as the tangent directions to the outline at a landmark were included. This approach allows higher order information to be included in the deformation. Mardia and Little (1994) and Mardia et al. (1996e) consider kriging with derivative constraints in a more general framework. The method can be extended to kriging with up to pth-order derivative constraints, provided the 2pth-order derivatives of the c.p.d. covariance function exist. Details are given in Mardia et al. (1996e). In particular, one may be interested in a deformation where there are landmark, tangent direction and curvature constraints. Note that away from the objects some undesirable bending or folding can occur. Mardia et al. (2004) have given a model-based approach for an edgel at a triangle shape whereas Mardia (2013a) has given a characterization for two edgels at two landmarks in three dimensions.

12.5.5 Smoothed matching

In some applications the data may be recorded imprecisely, and there may be quite a large amount of noise. For example, Hastie and Kishon (1991) considered affine matching of handwritten signatures where there was a great deal of added measurement noise. It can be desirable to impose some smoothing into the analysis, in order to obtain smooth estimates of shape and shape variability.

A suitable approach to the problem is to introduce a roughness penalty into the objective function, in the manner of Rice and Silverman (1991) and Green and Silverman (1994). Consider the regression equation

numbered Display Equation

For matching two configurations one could minimize

numbered Display Equation

where λ is a smoothing parameter, s2( · ) is an objective function for measuring shape discrepancy and J( · ) ≥ 0 is a roughness penalty. For example, in two dimensions consider smooth shape matching of the complex k-vector configurations T to Y by minimizing

numbered Display Equation

where β is a scale, θ is a rotation and the remaining parameter is a translation.

If we consider m-dimensional affine matching from T to Y (the k × m matrices of coordinates from two configurations), then we would minimize

numbered Display Equation

where B is an (m + 1) × m matrix.
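
As a small concrete illustration, the unpenalized case λ = 0 reduces to ordinary least squares: assuming the discrepancy term is the least squares criterion based on the design matrix [1k T], with B an (m + 1) × m matrix of affine parameters, the fit is simply a multivariate regression of Y on [1k T]. The sketch below uses simulated configurations and is not the book's own code.

affine.match <- function(Tmat, Ymat) {
  X <- cbind(1, Tmat)                              # k x (m+1) design matrix [1_k  T]
  B <- solve(t(X) %*% X, t(X) %*% Ymat)            # (m+1) x m least squares affine parameters
  list(B = B, fitted = X %*% B)
}
set.seed(3)
Tmat <- matrix(rnorm(16), ncol = 2)                # 8 source landmarks in two dimensions
Ymat <- Tmat %*% matrix(c(1.1, 0.2, -0.1, 0.9), 2, 2) + 0.05 * matrix(rnorm(16), ncol = 2)
fit <- affine.match(Tmat, Ymat)
fit$B                                              # estimated affine transformation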

One possible penalty function is the quadratic penalty J(E) = trace(ETΩE), where Ω is the k × k integrated square derivative matrix corresponding to a smoothing spline, as suggested by Hastie and Kishon (1991) for smoothed affine matching. The bending energy of Equation (12.16) is another possible choice, and the solution is the smoothing thin-plate spline of Section 12.3.3.

In the case of J(E) = trace(ETΩE) and s2(E) = trace(ETE) a very simple adaptation of the Procrustes algorithm provides a solution. In the affine least squares case the solution is simply given by the m eigenvectors with smallest eigenvalues of the matrix defined after Equation (12.4).

12.6 Diffeomorphic transformations

There are many other types of deformations that could be considered. A rather different approach using techniques of fluid mechanics was considered by Christensen et al. (1996). A smooth deformation is obtained by applying a vector field transformation to the starting figure T. Although large scale smoothness is guaranteed, large magnitude deformations on a local scale are not heavily penalized. The result is that dramatic diffeomorphic deformations can be achieved, for example from an image of a small wedge-shaped patch into a letter ‘C’. A diffeomorphism is a smooth transformation which is one to one and onto (bijective) and has a smooth inverse; such a mapping is also called a diffeomorphic transformation. The class of diffeomorphic transformations is very appealing, as the transformation is invertible.

Younes (2010) gives a comprehensive account of diffeomorphic transformations in shape analysis, including the use of Euler–Lagrange equations and flow-based matching. Although the treatment applies far more generally to curves and surfaces, the methodology is also appropriate for comparing point configurations as a special case. Grenander and Miller (2007) also provide a comprehensive treatment of the deformation approach to shape analysis.

Beg et al. (2005) introduce the large deformation diffeomorphic metric mapping (LDDMM) methodology, which is a method for computing diffeomorphisms, usually between medical images in the field of computational anatomy. In particular the estimated diffeomorphic mapping ϕ1 from image I0 to image I1 is the end point of the flow of a time-dependent velocity field vt specified by the ordinary differential equation (ODE) dϕt/dt = vt(ϕt). Here the time index t indicates a series of transformations, where ϕ0 is the identity transformation, and the end point ϕ1 is the transformation of interest. There are many such paths, and the LDDMM method involves obtaining the smoothest path according to some criterion. In particular the solution is obtained from a variational problem by minimizing over the space of velocity fields the expression

numbered Display Equation

where ||vt||V is a norm on a space of smooth velocity fields. Beg et al. (2005) derive the Euler–Lagrange equations and provide a gradient algorithm for numerical computation. Miller et al. (2002) make connections with Euler–Poincaré equations to describe geodesic motions on the space of diffeomorphism groups known as EPDiff (Mumford and Michor, 2013).

Miller et al. (2006) use the concept of geodesic shooting for computing the diffeomorphism. Geodesic shooting involves estimating an initial velocity field of the transformation, and then evolving the diffeomorphic transformation, which is determined by the initial velocities. A measure of error or goodness of fit is chosen to represent how well the diffeomorphism represents the transformation, and the initial velocities are then adjusted iteratively to optimize this measure. LDDMM is usually applied to medical images, although it can also be applied to landmark configurations (Joshi and Miller, 2000). Further related work is given by Younes et al. (2008b) and Bauer et al. (2014).

Arsigny et al. (2006) introduce an efficient computational framework for diffeomorphisms which defines an exponential mapping from the vector space of smooth stationary velocity fields to diffeomorphisms. The exponential exp (u) of a velocity field is given by the flow at time 1 of a stationary ODE. In this log-Euclidean framework the vector field is stationary, and one only needs the flow at a single time to compute the diffeomorphism. Hence the method is very efficient numerically. Vercauteren et al. (2009) describe the method of diffeomorphic demons, which is an efficient tool for image registration following from Arsigny et al. (2006). Ashburner (2007) gives an introduction to deformations in general, making the distinction clear between small and large deformation frameworks. Ashburner's (2007) DARTEL method has similarities with the log-Euclidean diffeomorphism methods. Other techniques of interest include Glaunès et al. (2004) for landmark matching via large deformation diffeomorphisms on the sphere; Durrleman et al. (2008, 2009) for comparing brain surfaces based on currents; and Sommer et al. (2013) who consider LDDMM and a kernel bundle variational formulation, including deformations at various scales and sparsity. Some practical comparisons of various deformation methods for tensor-based morphometry are given by Villalon et al. (2011). In particular they compare surface constrained deformations, diffeomorphic demons and fluid based registration. A general survey of deformable medical image registration methods is given by Sotiras et al. (2013).
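
The following toy sketch illustrates the idea behind the exponential of a stationary velocity field (it is not the log-Euclidean or DARTEL algorithm itself): the diffeomorphism exp(u) is the time-1 flow of the ODE dϕ/dt = u(ϕ), approximated here by simple Euler integration at a set of grid points. The Gaussian-shaped velocity field is an arbitrary illustrative choice.

# A smooth stationary velocity field on the plane (illustrative).
u.fn <- function(x) c(0.3, 0.1) * exp(-sum((x - c(0.5, 0.5))^2) / 0.1)
flow <- function(x0, u.fn, nsteps = 100) {         # Euler steps for the flow from t = 0 to t = 1
  x <- x0
  for (i in 1:nsteps) x <- x + u.fn(x) / nsteps
  x
}
grid.pts <- as.matrix(expand.grid(seq(0, 1, by = 0.25), seq(0, 1, by = 0.25)))
deformed <- t(apply(grid.pts, 1, flow, u.fn = u.fn))
head(deformed)                                     # images of the grid points under exp(u)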
