12
Deformations for size and shape change

12.1 Deformations

12.1.1 Introduction

In biology and other disciplines one is frequently interested in describing the difference in shape or size-and-shape between two or more objects. A measure of distance, such as the Riemannian distance, provides a numerical summary of the difference in shape or size-and-shape, but this global measure does not indicate where the objects differ locally or the manner of the difference. The biological goal in many studies is to depict or describe the morphological changes between the objects under study.

First of all we consider the case when only two objects are available and we wish to describe the size and shape difference between them. Sometimes there will be a natural ordering of the two objects, for example a juvenile growing into an adult, and sometimes not, for example a male and a female of a species. In order to describe the difference in size and shape we compute the transformation of the space in which the first object lies into the space of the second object. The transformation will give us information about the local and global shape differences.

Definition 12.1 By global differences we mean large scale changes (with small amounts of bending per unit Procrustes distance), such as an overall affine transformation. Local differences are on a smaller scale (with larger amounts of bending per unit Procrustes distance), for example highlighting changes in a few nearby landmarks. Global differences are smooth changes between the figures, whereas local changes are the remainder of the components of a deformation and are less smooth.

The main method for describing deformations will be the thin-plate spline transformation. Following Bookstein (1989, 1991) we shall decompose a thin-plate spline deformation (defined below) into a global affine transformation and a set of local deformations which highlight changes at progressively smaller scales. This decomposition is in close analogy with a Fourier series decomposition, with the constant term in the Fourier series being the global parameter and the coefficients of the trigonometric terms being local parameters at successively smaller scales.

12.1.2 Definition and desirable properties

We consider the case where we have two objects in m ≥ 2 dimensions.

Definition 12.2 Consider two configuration matrices of k landmarks in ℝ^m, T = (t_1, …, t_k)^T and Y = (y_1, …, y_k)^T, both k × m matrices, and we wish to deform T into Y, where t_j, y_j ∈ ℝ^m. We use the notation that the m-vector t_j is written as t_j = (t_j[1], …, t_j[m])^T. A deformation is a mapping from ℝ^m to ℝ^m defined by the transformation

y = Φ(t) = (Φ_1(t), …, Φ_m(t))^T,   t ∈ ℝ^m.

Here T is the source and Y is the target. The multivariate function Φ(t) should have certain desirable properties. In particular, we would want as many of the following properties to hold as possible for the deformation:

  1. continuous;
  2. smooth (which implies 1. of course);
  3. bijective (one to one and onto);
  4. not prone to gross distortions (e.g. not folding, which will be guaranteed if 3. holds);
  5. equivariant under relative location, scale and rotation of the objects, that is the transformation is not affected by a common translation, rotation and scaling of the figures;
  6. an interpolant, that is yj = Φ(tj) for j = 1, …, k.

If the interpolation property is not satisfied, then we call the deformation a smoother. Note that the deformation is from the whole space ℝ^m to ℝ^m, rather than just from one set of landmarks to another or from one outline to another.

Applications of deforming images will be considered in Section 17.4.1.

12.1.3 D’Arcy Thompson’s transformation grids

D’Arcy Thompson (1917) considered deformations from one species to another in order to explain size and shape differences. A regular square grid pattern was drawn on one object and the grid was deformed to lie on the second object, with corresponding biological parts located in the corresponding grid blocks. In Figure 2.1, Figure 12.1, Figure 12.2 and Figure 12.3 we see some examples. These grids are known as Cartesian transformation grids. The transformation grids enable a biologist to describe the size and shape change between the two species, albeit in a rather subjective way. D’Arcy Thompson’s figures were drawn by hand and were not always very accurate, for example in Figure 2.1 the correspondence in the lower and upper fins at (4.5, b) and (4.5, c) is not good (Bookstein, 1994b). There have been many attempts since 1917 to recreate these figures more objectively, as will be discussed in Section 12.4.1.


Figure 12.1 An initial square grid placed on a human skull. Source: Thompson 1917. Reproduced with permission of Cambridge University Press.


Figure 12.2 Cartesian transformation grids from the human skull in Figure 12.1 to a chimpanzee (a) and a baboon (b). Source: Thompson 1917. Reproduced with permission of Cambridge University Press.


Figure 12.3 Cartesian transformation grids from one species of fish to another. The transformation is an affine deformation. Source: Thompson 1917. Reproduced with permission of Cambridge University Press.

12.2 Affine transformations

12.2.1 Exact match

The simplest possible size and shape change is that of an affine transformation, as seen in Figure 12.3. In this case the square grid placed on the first fish is deformed uniformly and affinely into a parallelogram grid on the second fish.

Definition 12.3 The affine transformation from a point t ∈ ℝ^m to a point y ∈ ℝ^m is:

(12.1)   y = At + c,

where A is m × m, c is m × 1 and A is non-singular. A linear transformation is the affine transformation of Equation (12.1) but with c = 0.

Consider selecting a subset of m + 1 landmarks on each figure, and denote the subsets by Tsub and Ysub, both (m + 1) × m matrices. Writing B = [c, A]^T then, since we match the m + 1 points exactly, the unknown parameters B are found from the solution to:

(12.2)   Ysub = [1_{m+1}, Tsub] B.
By selecting m + 1 landmarks we obtain a unique solution in general, because there are (m + 1)m unknown parameters in B and (m + 1)m constraints in Equation (12.2). Write X = [1_{m+1}, Tsub] for the ‘design matrix’. Solving Equation (12.2) for B we have:

B̂ = (X^T X)^{−1} X^T Ysub,

assuming (X^T X)^{−1} exists, which will be true if X is of full rank m + 1. This solution is the same as the least squares estimator when we wish to estimate a transformation from k = m + 1 points, and here the fit is exact and the residuals are all zero.
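
As a concrete illustration, the following minimal sketch in base R computes the exact affine match of m + 1 = 3 landmarks in m = 2 dimensions using B̂ = (X^T X)^{−1} X^T Ysub; the triangle coordinates are invented, and this is not code from the shapes package.

Tsub <- rbind(c(0, 0), c(1, 0), c(0, 1))          # source landmarks (m + 1 = 3 points in 2D)
Ysub <- rbind(c(1, 1), c(3, 1), c(1, 4))          # target landmarks (invented)
X <- cbind(1, Tsub)                               # 'design matrix' [1, Tsub]
B <- solve(t(X) %*% X, t(X) %*% Ysub)             # B = [c, A]^T, here (m + 1) x m = 3 x 2
c_hat <- B[1, ]                                   # estimated translation vector c
A_hat <- t(B[2:3, ])                              # estimated 2 x 2 matrix A
max(abs(X %*% B - Ysub))                          # residuals are zero: the match is exact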

12.2.2 Least squares matching: Two objects

For k > m + 1 the affine match is not exact in general and so we estimate the affine transformation by minimizing some criterion. Write X_D = [1_k, T] for the k × (m + 1) design matrix. A general regression model is given later in Equation (18.1). Taking h(T, B) = X_D B, where B is an (m + 1) × m matrix, gives an affine match of T to Y. We exclude configurations that do not have full rank, so rank(X_D) = m + 1 and rank(Y) = m. Equation (18.1) becomes:

Y = X_D B + E = 1_k c^T + T A^T + E,

where A is an m × m matrix and c an m-vector for translation.

The general shape space (see Section 18.4) with affine invariance has been studied in detail by Ambartzumian (1982, 1990). Some distributional results for offset normal models were given by Goodall and Mardia (1993). Also Bhavnagri (1995b) and Sparr (1992) have considered the affine shape space which, if configurations of less than full rank are ignored, is the Grassmann manifold of m-planes in ℝ^{k − 1} (e.g. Harris, 1992; Patrangenaru and Mardia, 2003). A suitable distance in the affine shape space between T and Y is given by:

(12.3) numbered Display Equation

The post-multiplication of Y by (Y^T Y)^{−1/2} prevents degenerate solutions and ensures that d_A(Y, T) = d_A(T, Y).

It is sometimes convenient to use vector notation: writing y = vec(Y), β = vec(B), and ε = vec(E), the linear model is

y = X_d β + ε,

where ε is taken to be a zero mean random vector and X_d = I_m ⊗ X_D is a km × m(m + 1) block diagonal design matrix. The match is obtained by estimating β with β̂, where

β̂ = arg inf_β ε^T ε,

where vec^{−1}(ε) = E is the reverse of vectorizing (see Definition 4.5).

Using least squares matching, the solution is given explicitly and is unique provided X_D^T X_D is of full rank (m + 1). The solution is obtained by differentiating

(y − X_d β)^T (y − X_d β)

with respect to β (a discussion of matrix differentiation is given e.g. in Mardia et al. 1979, Appendix A.9). The resulting normal equations are:

X_d^T X_d β = X_d^T y,

and hence the solution is:

β̂ = (X_d^T X_d)^{−1} X_d^T y,

where (X_d^T X_d)^{−1} is assumed to exist. The matrix representation of the solution is:

B̂ = (X_D^T X_D)^{−1} X_D^T Y.

The second derivative matrix is 2X_d^T X_d, which is strictly positive definite, and hence the solution is a minimum.

The residual vector is given by:

r = y − X_d β̂ = (I_{km} − Ĥ) y,

where Ĥ = X_d (X_d^T X_d)^{−1} X_d^T is the usual ‘hat’ matrix. Note that ĤĤ = Ĥ and so Ĥ is idempotent. Also, (I_{km} − Ĥ) is idempotent. The minimized value of the objective function is

y^T (I_{km} − Ĥ) y.
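
A minimal sketch of the least squares affine match in base R, using invented configurations (k = 6 landmarks in two dimensions) and the matrix form B̂ = (X_D^T X_D)^{−1} X_D^T Y rather than the vectorized version:

set.seed(1)
k <- 6
Tmat <- matrix(rnorm(2 * k), k, 2)                        # source configuration (k x 2)
Ymat <- Tmat %*% matrix(c(1.1, 0.2, -0.1, 0.9), 2, 2) +   # roughly affine target ...
        rep(1, k) %o% c(2, -1) +                          # ... plus a translation ...
        0.05 * matrix(rnorm(2 * k), k, 2)                 # ... plus a little noise
XD <- cbind(1, Tmat)                                      # k x (m + 1) design matrix
Bhat <- solve(t(XD) %*% XD, t(XD) %*% Ymat)               # rows of Bhat: c^T then A^T
resid <- Ymat - XD %*% Bhat                               # residual matrix
sum(resid^2)                                              # minimized objective (residual sum of squares)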

It is important to note that matching from T to Y is not the same as matching from Y to T in general. This property is generally true for most registration methods. One way to introduce symmetry into the problem is by obtaining an average object ‘half-way’ between T and Y, and then match T and Y separately into the average object.

The affine case where the residuals are zero (exact matches) was considered earlier in more detail in Section 12.2.1.

Example 12.1 Horgan et al. (1992) consider affine matches of two electrophoretic gels (gel A and gel B) described in Section 1.4.14. In Figure 1.25 we saw two configurations of k = 10 landmarks located on each gel, which are taken from different proteins from malaria parasites. The landmarks are called invariant points in this application because they are common to all proteins in this study. The idea is to match the two configurations with an affine transformation. By matching the gels to each other or to a reference figure one can identify the malaria parasite by using the positions of the other (variant) spots in the image. In Figure 12.4 the fitted gel A image has been registered onto the gel B image. New grey levels x_{Nij} have been calculated from the grey levels of the fitted gel A image and of the gel B image (x_{Bij}). The figures have been thresholded to four levels for display purposes, indicating which spots were in gel A, gel B or both, and which region was not dark in either gel. The affine fitted gel A image matches quite well with the gel B image, because there are many dark grey common spots in both images. The parameters of the fit are:

numbered Display Equation

and so there is a shift in location to the left and upwards ( − 36.08, 66.64) and approximately a 10% vertical shrinkage and a 3% horizontal shrinkage.


Figure 12.4 The superimposed gel A onto gel B using an affine match, using the 10 invariant spots chosen by an expert. The four grey levels indicate the correspondence of dark and light pixels in the registered images. The key used is: black, dark pixel in gel A and a light pixel in gel B; dark grey, matched dark pixels in gels A and B; light grey, matched light pixels in gels A and B; and white, dark pixel in gel B and a light pixel in gel A.

Mardia et al. (2012) have addressed a specific problem for matching gels under affine transformation in which each configuration contains a subset of points (‘markers’) whose labels correspond with high probability, with the remaining points having arbitrary labels (‘non-markers’).

Gordon (1995) provides a matching procedure based on different affine transformations for subsets of points formed by dividing the plane into regions, with an application to face matching.

12.2.3 Least squares matching: Multiple objects

Consider a random sample of configurations T_1, …, T_n from a population with mean μ. The design matrices are X_{Di} = [1_k, T_i], i = 1, …, n. If least squares is chosen for both the estimation of the mean and the point configuration matching, then we wish to minimize

Σ_{i=1}^{n} || vec(μ) − X_{di} β_i ||^2,

where β_i = vec(B_i) and X_{di} = I_m ⊗ X_{Di}, i = 1, …, n. It is clear that, given μ,

β̂_i = (X_{di}^T X_{di})^{−1} X_{di}^T vec(μ),   i = 1, …, n.

Hence,

Σ_{i=1}^{n} || vec(μ) − X_{di} β̂_i ||^2 = Σ_{i=1}^{n} vec(μ)^T (I_{km} − H_i) vec(μ),

where Hi is the ‘hat’ matrix for Xdi, that is

H_i = X_{di} (X_{di}^T X_{di})^{−1} X_{di}^T.

Therefore,

μ̂ = arg inf_μ Σ_{i=1}^{n} vec(μ)^T (I_{km} − H_i) vec(μ),

subject to the constraints μ^T μ = I_m. Since X_{di} = I_m ⊗ X_{Di}, each hat matrix can be written as H_i = I_m ⊗ H_{Di}, where H_{Di} = X_{Di}(X_{Di}^T X_{Di})^{−1} X_{Di}^T. Let G be the k × k matrix such that Σ_{i=1}^{n} (I_{km} − H_i) = I_m ⊗ G, that is G = Σ_{i=1}^{n} (I_k − H_{Di}). Therefore the solution μ̂ is given by the m eigenvectors of G with smallest eigenvalues. This is a well-known result and can be simply seen by repeatedly applying the standard result that the minimum value of

x^T A_H x / (x^T x)

is given by the smallest eigenvalue of A_H [e.g. Mardia et al. 1979, Equation (A.9.11)].
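
The following sketch (base R, simulated configurations) implements the eigenvector solution described above: sum the matrices I_k − H_Di over the sample and take the m eigenvectors with smallest eigenvalues as μ̂. The simulated data and the name G are purely illustrative.

set.seed(2)
k <- 8; m <- 2; n <- 5
Tlist <- lapply(1:n, function(i) matrix(rnorm(k * m), k, m))   # random sample of configurations
G <- matrix(0, k, k)
for (i in 1:n) {
  XDi <- cbind(1, Tlist[[i]])                       # design matrix [1_k, T_i]
  HDi <- XDi %*% solve(t(XDi) %*% XDi, t(XDi))      # hat matrix for the ith configuration
  G <- G + diag(k) - HDi
}
eig <- eigen(G, symmetric = TRUE)
mu_hat <- eig$vectors[, (k - m + 1):k]              # m eigenvectors with smallest eigenvalues
round(crossprod(mu_hat), 10)                        # check the constraint mu^T mu = I_m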

Even with a general choice of s(E), if φ(E) = E^T E in Equation (18.2), then to obtain the new estimates from the current ones we have:

numbered Display Equation

which can lead to a simple implementation.

The above generalized affine matching result was termed the linear spectral theorem by Goodall (1991, p. 335), and was also presented by Rohlf and Slice (1990) and Hastie and Kishon (1991), and for planar shape analysis using complex notation by Kent (1994).

It is clear that affine matching and statistical inference can be tackled (with care) by existing regression and multivariate analysis procedures.

12.2.4 The triangle case: Bookstein’s hyperbolic shape space

In two dimensions the most general transformation between two triangles is an affine transformation and we study this case in more detail.

Consider two (different) triangles in a plane T = (t_1, t_2, t_3)^T and Y = (y_1, y_2, y_3)^T, both (3 × 2) matrices, and the ratio of baseline lengths is denoted as γ = ||y_2 − y_1||/||t_2 − t_1||, and Bookstein’s shape coordinates for each triangle are (θ_T, φ_T)^T and (θ_Y, φ_Y)^T. Let us exclude all collinear triplets of points (flat triangles) and all shape changes from a triangle with an apex above the baseline to a triangle with an apex below the baseline (we are excluding reflections). Consider the change in shape from triangle T to triangle Y. First of all the triangles are translated, rotated and rescaled to T* = (t*_1, t*_2, t*_3)^T and Y* = (y*_1, y*_2, y*_3)^T so that they have common baselines from (−1/2, 0)^T to (1/2, 0)^T. The transformation between the shapes of the two triangles is described by the affine transformation

(12.5)   y*_j = A t*_j + c,   j = 1, 2, 3,

where A is a 2 × 2 matrix, c is a 2 × 1 vector and

T* = ((−1/2, 0)^T, (1/2, 0)^T, (θ_T, φ_T)^T)^T,   Y* = ((−1/2, 0)^T, (1/2, 0)^T, (θ_Y, φ_Y)^T)^T.

Solving the six equations in Equation (12.5) we find that

(12.6)   A = [ 1   (θ_Y − θ_T)/φ_T
               0   φ_Y/φ_T        ],   c = 0.

If we inscribe a circle of unit diameter in triangle T, this is linearly transformed into an ellipse in triangle Y, and the major and minor axes of the ellipse have lengths δ1 ≥ δ2 > 0 and are called the principal strains. Calculation of δ1 and δ2 is very simple using a singular value decomposition of A

A = U Λ V^T = δ_1 u_1 v_1^T + δ_2 u_2 v_2^T,   Λ = diag(δ_1, δ_2),

where the columns u1 and u2 (both 2 × 1) of U are the eigenvectors of AAT and the columns v1 and v2 of V are the eigenvectors of ATA. Hence, δ1 and δ2 are the square roots of the eigenvalues of both AAT and ATA. The principal axes in triangle Y are u1 and u2, corresponding to axes v1 and v2 in T. Also,

A v_j = δ_j u_j,   j = 1, 2,

and so we can see that the principal strains have lengths δ1 ≥ δ2 > 0 after the deformation where they were of unit length before.

The directions of the principal axes and the ratio δ_1/δ_2 are invariant under similarity transformations of the original coordinates and are suitable shape change descriptors (provided there is no reflection).

The quantity δ_1/δ_2 is the ratio of the greatest stretch to the greatest compression. Taking the distance between two triangle shapes as log(δ_1/δ_2) we see that this shape change space for triangles is different from the shape space with the Riemannian metric. The distance log(δ_1/δ_2) is approximately δ/h, where δ is the Euclidean distance between the Bookstein coordinates and h is the V^B coordinate of the first (or second) triangle, see Figure 12.5.
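
A short base R sketch of the triangle calculations: two triangles with invented apex coordinates are registered to a common baseline, the exact affine map is solved, and the singular values of A give the principal strains and the distance log(δ_1/δ_2).

Tstar <- rbind(c(-0.5, 0), c(0.5, 0), c(0.1, 0.8))   # source triangle: baseline plus apex (theta_T, phi_T)
Ystar <- rbind(c(-0.5, 0), c(0.5, 0), c(0.3, 0.5))   # target triangle: baseline plus apex (theta_Y, phi_Y)
X <- cbind(1, Tstar)
B <- solve(t(X) %*% X, t(X) %*% Ystar)               # exact affine fit of the three points
A <- t(B[2:3, ])                                     # 2 x 2 matrix A of the affine map
delta <- svd(A)$d                                    # principal strains delta_1 >= delta_2
log(delta[1] / delta[2])                             # Bookstein's hyperbolic shape distance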


Figure 12.5 Bookstein’s hyperbolic shape space considers the distance between two shapes of triangles A and B as approximately δ/h for small shape differences.

In fact Bookstein’s shape space for triangles with distance log(δ_1/δ_2) is a space of constant negative curvature and is known in geometry as the hyperbolic space or the Poincaré half plane. Further details have been given by Small (1988), Bookstein (1991, Appendix A2), Small and Lewis (1995) and Kendall (1998). Kendall (1998) gives some diffusion results for triangles in this space. The hyperbolic space is somewhat restrictive due to its applicability only to triangles when we have 2D data. Comparisons of other configurations of more than three points could proceed by defining a triangulation of the points and working separately on each triangle (i.e. considering a union of hyperbolic spaces). Such triangulations and finite elements will be considered in Section 12.4.2. Small and Lewis (1995) describe the hyperbolic geometry for comparing the shapes of m + 1 points in m ≥ 2 dimensions, with tetrahedra in m = 3 dimensions being the most useful generalization.

12.3 Pairs of thin-plate splines

In practice when there are more than m + 1 landmarks in m dimensions an affine transformation in general will not fit exactly and so a non-affine shape deformation is required. Various approaches have been suggested, although not all the desirable properties listed in Section 12.1.2 are always satisfied.

We shall mainly concentrate on the important m = 2 dimensional case, with deformations given by the bivariate function

Φ(t) = (Φ_1(t), Φ_2(t))^T,   t = (t[1], t[2])^T ∈ ℝ^2.

Bookstein (1989) has developed a highly successful approach for deformations using a pair of thin-plate splines for the functions Φ1(t) and Φ2(t). The thin-plate spline is a sensible choice since it minimizes the bending required to take the first form into the second, and it does not suffer from problems with very large bending towards the periphery.

12.3.1 Thin-plate splines

The thin-plate spline is a natural interpolating function for data in two dimensions and plays a similar rôle in m = 2 dimensions to the natural cubic spline for interpolation in the 1D case. The natural cubic spline in one dimension is the unique interpolant g(x) which minimizes the roughness penalty

∫ {g″(x)}^2 dx,

subject to interpolation at the knots (landmarks). However, there are important differences in our application to shape analysis. The cubic spline need not be monotone and is not usually required to be, whereas the thin-plate spline is often desired to be bijective. A monotone cubic spline is analogous to a bijective thin-plate spline.

We shall see that the thin-plate spline is a natural interpolant in two dimensions because it minimizes the amount of bending in transforming between two configurations, which can also be considered a roughness penalty. For an introduction to natural cubic splines and thin-plate splines see Wahba (1990) and Green and Silverman (1994).

We concentrate on the important m = 2 dimensional case, the theory of which was developed by Duchon (1976) and Meinguet (1979). Consider the (2 × 1) landmarks t_j, j = 1, …, k, on the first figure mapped exactly into y_j, j = 1, …, k, on the second figure, that is there are 2k interpolation constraints,

(12.7)   y_j = Φ(t_j),   j = 1, …, k,

and we write Φ(t_j) = (Φ_1(t_j), Φ_2(t_j))^T, j = 1, …, k, for the 2D deformation. Let

T = (t_1, …, t_k)^T,   Y = (y_1, …, y_k)^T,

so that T and Y are both (k × 2) matrices.

Definition 12.4 A pair of thin-plate splines (PTPS) is given by the bivariate function

(12.8)   Φ(t) = c + A t + W^T s(t),

where t is (2 × 1), s(t) = (σ(t − t_1), …, σ(t − t_k))^T is (k × 1) and

(12.9)   σ(h) = ||h||^2 log(||h||)  if ||h|| > 0,  and σ(h) = 0  if ||h|| = 0.

The 2k + 6 parameters of the mapping are c (2 × 1), A (2 × 2) and W (k × 2). There are 2k interpolation constraints in Equation (12.7), and we introduce six more constraints in order for the bending energy in Equation (12.14) to be defined:

(12.10)   1_k^T W = 0,   T^T W = 0.

The pair of thin-plate splines which satisfy the constraints of Equation (12.10) are called natural thin-plate splines. Equation (12.7) and Equation (12.10) can be re-written in matrix form

(12.11)   [ S       1_k   T ] [ W   ]   [ Y ]
          [ 1_k^T   0     0 ] [ c^T ] = [ 0 ]
          [ T^T     0     0 ] [ A^T ]   [ 0 ],

where (S)_{ij} = σ(t_i − t_j) and 1_k is the k-vector of ones. The matrix

Γ = [ S       1_k   T ]
    [ 1_k^T   0     0 ]
    [ T^T     0     0 ]

is non-singular provided S is non-singular. Hence,

[ W   ]            [ Y ]
[ c^T ]  = Γ^{−1}  [ 0 ]  = Γ^{−1} Y_0,
[ A^T ]            [ 0 ]

say. Writing the partition of Γ^{−1} as

Γ^{−1} = [ Γ^{11}   Γ^{12} ]
         [ Γ^{21}   Γ^{22} ],

where Γ^{11} is k × k, it follows that

(12.12)   W = Γ^{11} Y,   [c, A]^T = Γ^{21} Y,

giving the parameter values for the mapping. If S^{−1} exists, then we have:

(12.13)   Γ^{11} = S^{−1} − S^{−1} Q (Q^T S^{−1} Q)^{−1} Q^T S^{−1},   Γ^{21} = (Q^T S^{−1} Q)^{−1} Q^T S^{−1},
where Q = [1_k, T], using for example Rao (1973, p. 39).

Using Equation (12.12) and Equation (12.13) we see that ĉ and Â are generalized least squares estimators, and

[ĉ, Â]^T = (Q^T S^{−1} Q)^{−1} Q^T S^{−1} Y,   Ŵ = S^{−1} (Y − Q [ĉ, Â]^T).

Mardia et al. (1991) gave the expressions for the case when S is singular.

Definition 12.5 The k × k matrix B_e is called the bending energy matrix, where

(12.14)   B_e = Γ^{11}.

There are three constraints on the bending energy matrix

B_e 1_k = 0,   B_e T = 0,

and so the rank of the bending energy matrix is k − 3.
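
The construction above can be sketched in a few lines of base R: build S and Γ for an invented source configuration, solve the linear system for W, c and A, and read off the bending energy matrix as the upper-left k × k block of Γ^{−1}. This is an illustrative sketch, not the implementation in the shapes package.

sigma <- function(h) { r <- sqrt(sum(h^2)); if (r > 0) r^2 * log(r) else 0 }
Tmat <- rbind(c(0, 1), c(1, 0), c(0, -1), c(-1, 0))          # source landmarks (k x 2)
Ymat <- rbind(c(0, 0.8), c(1, 0), c(0, -1.2), c(-1, 0))      # target landmarks (invented)
k <- nrow(Tmat)
S <- matrix(0, k, k)
for (i in 1:k) for (j in 1:k) S[i, j] <- sigma(Tmat[i, ] - Tmat[j, ])
Q <- cbind(1, Tmat)                                          # [1_k, T], k x 3
Gamma <- rbind(cbind(S, Q), cbind(t(Q), matrix(0, 3, 3)))    # (k + 3) x (k + 3)
coefs <- solve(Gamma, rbind(Ymat, matrix(0, 3, 2)))          # stacked [W; c^T; A^T]
W <- coefs[1:k, ]                                            # non-affine coefficients (k x 2)
c_hat <- coefs[k + 1, ]                                      # translation
A_hat <- t(coefs[(k + 2):(k + 3), ])                         # 2 x 2 affine part
Be <- solve(Gamma)[1:k, 1:k]                                 # bending energy matrix
qr(Be)$rank                                                  # rank k - 3 (here 1)
Phi <- function(t) c_hat + A_hat %*% t +
  t(W) %*% apply(Tmat, 1, function(tj) sigma(t - tj))        # the PTPS Phi(t) = c + At + W^T s(t)
Phi(Tmat[1, ])                                               # interpolation: returns the first row of Ymat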

It can be proved that the transformation of Equation (12.8) minimizes the total bending energy of all possible interpolating functions mapping from T to Y, where the total bending energy is given by:

(12.15)   J(Φ) = Σ_{j=1}^{2} ∫∫_{ℝ^2} { (∂²Φ_j/∂t[1]²)² + 2(∂²Φ_j/∂t[1]∂t[2])² + (∂²Φ_j/∂t[2]²)² } dt[1] dt[2].
A simple proof is given by Kent and Mardia (1994a). The minimized total bending energy is given by:

For the deformation grid we want to have as little bending as possible to provide a simple interpretation, both locally near data points and further away from the data. Hence the minimum bending property of the PTPS is highly suitable for many applications. The thin-plate spline has also proved popular because of the simple analytical solution. The deformation can be generalized into m dimensions when a multiple of m thin-plate splines (MTPS) will be required, and this is studied in Section 12.5.

12.3.2 Transformation grids

Following from the original ideas of Thompson (1917) we can produce similar transformation grids, using a pair of thin-plate splines for the deformation from configuration matrix T to Y. A regular square grid is drawn over the first figure and, at each junction t_i where two grid lines meet, the corresponding position in the second figure is calculated using the PTPS transformation y_i = Φ(t_i), i = 1, …, n_g, where n_g is the number of junctions or crossing points on the grid. The junction points are joined with lines in the same order as in the first figure, to give a deformed grid over the second figure. The resulting interpolant produces transformation grids that ‘bend’ as little as possible. We can think of each square in the deformation as being deformed into a quadrilateral (with four shape parameters). The PTPS minimizes the local variation of these small quadrilaterals with respect to their neighbours.

Example 12.2 Consider describing the square to kite transformation which was considered by Bookstein (1989) and Mardia and Goodall (1993). Given k = 4 points in m = 2 dimensions the matrices T and Y are given by:

numbered Display Equation

We have here

numbered Display Equation

where a = σ(√2) = 0.6931 and b = σ(2) = 2.7726. In this case, the bending energy matrix is:

numbered Display Equation

It is found that

numbered Display Equation

and so the PTPS is given by Φ(t) = (Φ1(t), Φ2(t))T, where

Note that in Equation (12.17) there is no change in the t[1] direction. The affine part of the deformation is the identity transformation.

We consider Thompson-like grids for this example, displayed in Figure 12.6. A regular square grid is placed on the first figure and deformed into the curved grid on the kite figure. We see that the topmost and bottommost points are moved downwards with respect to the other two points. If the regular grid is drawn on the first figure at a different orientation, then the deformed grid does appear to be different, even though the transformation is the same. This effect is seen in Figure 12.6, where both figures have been rotated clockwise by 45° in the second row.


Figure 12.6 Transformation grids for the square (left column) to kite (right column). In the second row the same figures as in the first row have been rotated by 45° and the deformed grid does look different, even though the transformation is the same. Source: Adapted from Bookstein 1989.


Figure 12.7 A thin-plate spline transformation grid between the control mean shape estimate and the schizophrenia mean shape estimate, from the schizophrenia study. The square grid is placed on the estimated mean control shape (a) and the curved grid is pictured on the estimated mean shape of the schizophrenic patients (b), magnified three times, with arrows drawn from the control mean (red) to patient mean (green).

Example 12.3 In Figure 12.7a we see a square grid drawn on the estimate of mean shape for the Control group in the schizophrenia study of Section 1.4.5. Here there are ng = 40 × 39 = 1560 junctions and there are k = 13 landmarks. In Figure 12.7b we see the schizophrenia mean shape estimate and the grid of new points obtained from the PTPS transformation. It is quite clear that there is a shape change in the centre of the brain, around landmarks 1, 6 and 13.

12.3.3 Thin-plate splines in R

Transformation grids can be drawn using the shapes package in R with the command tpsgrid. So, for example we can produce the grid of Figure 12.7 using the commands:

library(shapes)
data(schizophrenia)
control<-procGPA(schizophrenia$x[,,1:14])$mshape
patient<-procGPA(schizophrenia$x[,,15:28])$mshape
patient<-procOPA(control,patient)$Bhat   #align patient mean to control
tpsgrid(control,patient,opt=2,ngrid=40,mag=3)

Also, we can retain the scale by considering the transformation between the size-and-shape means, by using the option scale=FALSE in procGPA. Note that the options here are opt=2 which plots the grids on both figures (whereas opt=1 just plots the deformed grid on the second figure), ngrid=40 gives the dimension of the grid, and mag=3 exaggerates the deformation by a magnification factor of 3 here.

Example 12.4 Consider again the schizophrenia data from Section 1.4.5 but now the transformation from the patient mean shape to the control mean shape is displayed in Figure 12.8. It is clear that a feature of the grid is that a derivative is approximately zero at the point of major shape change. This feature is known as a crease, see Bookstein (2000, 2014, p. 444). Crease analysis is described in detail by Bookstein (2000) and the method involves finding a magnification, orientation and scaling for a more precise location of the crease. Creases have been hypothesized to be important stable features during evolution (Bookstein, 2002). In Figure 12.8(b) the crease is visually obvious, and a specific test for group differences at the crease is suggested (Bookstein, 2000).


Figure 12.8 The thin-plate spline transformation grid from the patient mean (a) to the control mean (b), with arrows drawn from the patient mean (red) to control mean (green). The shape change has been magnified three times.


Figure 12.9 A series of grids showing the shape changes in the skull of some sooty mangabey monkeys (read across the rows from left to right): (first row) age stage 1, age stages 1 to 2, 1 to 3; (second row) age stages 1 to 4, 1 to 5.

Example 12.5 Consider describing the growth of the sooty mangabey skull from the dataset in Section 1.4.12, and in particular in relation to 12 landmarks in the midline section of the skull. Five juveniles are taken at increasing dental age (an approximate age of the animal based on the progress in growth of the teeth). A PTPS is used to deform a square grid from age stages 1 to 2, from age stages 1 to 3, from age stages 1 to 4 and from age stages 1 to 5 (Figure 12.9). The series of grids is a useful tool for describing perceived growth. In particular there is a clear extension to the lower left of the face as the age of the monkeys increases. In this case the monkeys are all different and can be assumed to be independent, within the class of sooty mangabeys. In the example the PTPS transformation is useful for describing size-and-shape change.

Definition 12.6 Transforming from one object to another is called warping, particularly when the transformation is from one image to another image.

Further properties

The (single) thin-plate spline can be viewed as a particular case of kriging – a technique for spatial prediction developed in geostatistics; for example, see Cressie (1993) and Kent and Mardia (1994a). Discussion of this link is found in Section 12.5.

The thin-plate spline consists of an affine and a non-affine part. Only the non-affine part contributes to the bending energy and hence two configurations which are affine transformations of each other have zero total bending energy. If k = 3 points are available, then there is only the affine part.

The thin-plate spline is so named because it solves the mechanical problem of determining the minimum energy shape of an infinitely thin stiff metal plate under certain constraints. The PTPS is usually bijective (one to one) for practical situations where the shape change is not far from affine, but it is not guaranteed to be bijective. There is no mathematical restriction to prevent folding, and if the shape change is highly non-affine, then folding may occur.

A further desirable property is that the PTPS is equivariant under location, rotation and scale (but not a general affine deformation ). In addition, as t becomes farther away from the first object the thin-plate spline deformation becomes approximately affine.

Various modifications could be made to the deformation. In particular, one may wish to consider the domain to be finite, D say, rather than . In this case the thin-plate spline can be modified and details can be found in Stone (1988) and Green and Silverman (1994). The finite domain interpolant has important differences from the infinite domain thin-plate spline. In particular, there are usually differences around the edges, with reduced variability in these regions.

Also, instead of second derivatives in the energy integral in Equation (12.15) one could consider rth (r > 2) derivatives, leading to a smoother interpolant (see Section 12.5.3). Such a spline will have 2r − 3 continuous derivatives everywhere but its (2r − 2)th derivative has a singularity at each data point. Details of higher order splines can be found, for example, in Duchon (1976), Meinguet (1979), Green and Silverman (1994), Kent and Mardia (1994a) and Mardia et al. (1996e). For further details of thin-plate splines refer to Wahba (1990), Green and Silverman (1994), Kent and Mardia (1994a) and the references therein.

Smoothing splines

Rather than interpolating landmarks exactly one could consider a pair of smoothing thin-plate splines. The pair of smoothing thin-plate splines consists of the bivariate function Φ(t) which minimizes

Σ_{j=1}^{k} || y_j − Φ(t_j) ||^2 + λ J(Φ),

where λ > 0 is a smoothing parameter, and J(Φ) is defined in Equation (12.15). The term λJ(Φ) can be thought of as a roughness penalty. Using Duchon (1976) (for a single thin-plate spline) we see that the solution has exactly the same form as in Equation (12.8) except that the unknown coefficients are obtained from the matrix equation:

[ S + λI_k   1_k   T ] [ W   ]   [ Y ]
[ 1_k^T      0     0 ] [ c^T ] = [ 0 ]
[ T^T        0     0 ] [ A^T ]   [ 0 ].
Hence, dealing with smoothing thin-plate splines involves an almost identical algorithm to that required for interpolating splines, except that S is replaced by S + λI_k. The smoothing parameter controls how smooth the spline is and the value of λ must be pre-specified. If λ is very large, then the resulting spline resembles an affine function (which is completely smooth), and if λ is very small the smoothing spline is close to the interpolating spline. The smoothing spline is not equivariant under scaling of the configurations.

An alternative to pre-specifying λ is to use cross-validation to estimate λ. Consider the 2D observation y_j corresponding to t_j as a new observation by omitting it from the dataset. Denote the smoothing thin-plate spline fitted from the rest of the data points as Φ_j(t, λ). We can judge how well the smoother predicts the deleted observation by examining the magnitude of

|| y_j − Φ_j(t_j, λ) ||.

The cross-validation score is an overall measure of the prediction squared error

CV(λ) = k^{−1} Σ_{j=1}^{k} || y_j − Φ_j(t_j, λ) ||^2,

and so an automatic choice for λ can be obtained by minimizing the cross-validation score CV(λ) over λ. The minimization is often not straightforward, and a simple grid search is often the best approach (Green and Silverman, 1994, p. 30).
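
A base R sketch of this cross-validation, assuming a small helper fit_tps() that fits the pair of smoothing splines by replacing S with S + λI_k in the system above. The helper, the invented data and the grid of λ values are purely illustrative (this is not the shapes package code).

sigma <- function(h) { r <- sqrt(sum(h^2)); if (r > 0) r^2 * log(r) else 0 }
fit_tps <- function(Tmat, Ymat, lambda) {             # smoothing PTPS: S replaced by S + lambda I_k
  k <- nrow(Tmat)
  S <- matrix(0, k, k)
  for (i in 1:k) for (j in 1:k) S[i, j] <- sigma(Tmat[i, ] - Tmat[j, ])
  Q <- cbind(1, Tmat)
  Gamma <- rbind(cbind(S + lambda * diag(k), Q), cbind(t(Q), matrix(0, 3, 3)))
  coefs <- solve(Gamma, rbind(Ymat, matrix(0, 3, 2)))
  W <- coefs[1:k, ]; cc <- coefs[k + 1, ]; A <- t(coefs[(k + 2):(k + 3), ])
  function(t) as.vector(cc + A %*% t +
    t(W) %*% apply(Tmat, 1, function(tj) sigma(t - tj)))
}
cv_score <- function(Tmat, Ymat, lambda) {            # leave-one-out prediction squared error
  sum(sapply(1:nrow(Tmat), function(j) {
    Phi_j <- fit_tps(Tmat[-j, ], Ymat[-j, ], lambda)  # spline fitted without landmark j
    sum((Ymat[j, ] - Phi_j(Tmat[j, ]))^2)
  }))
}
set.seed(4)
Tmat <- matrix(runif(20, -1, 1), 10, 2)               # invented source landmarks
Ymat <- Tmat + 0.1 * matrix(rnorm(20), 10, 2)         # noisy target landmarks
lambdas <- 10^seq(-3, 1, length.out = 9)              # simple grid search over lambda
cv <- sapply(lambdas, function(l) cv_score(Tmat, Ymat, l))
lambdas[which.min(cv)]                                # value of lambda minimizing CV(lambda)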

12.3.4 Principal and partial warp decompositions

Bookstein (1989, 1991) introduced principal and partial warps to help explain the components of a thin-plate spline deformation. The principal and partial warps are useful for decomposing the thin-plate spline transformations into a series of large scale and small scale components.

Definition 12.7 Consider the PTPS transformation from ℝ^2 to ℝ^2, which interpolates the k points of T to those of Y (both k × 2 matrices). An eigen-decomposition of the k × k bending energy matrix B_e of Equation (12.14) has non-zero eigenvalues λ_1 ≤ λ_2 ≤ … ≤ λ_{k−3} with corresponding eigenvectors γ_1, γ_2, …, γ_{k−3}. The eigenvectors γ_1, γ_2, …, γ_{k−3} are called the principal warp eigenvectors and the eigenvalues are called the bending energies. The functions,

(12.19)   P_j(t) = γ_j^T s(t),   j = 1, …, k − 3,

are the principal warps, where s(t) = (σ(t − t_1), …, σ(t − t_k))^T.

Here we have labelled the eigenvalues and eigenvectors in this order (with λ1 as the smallest eigenvalue corresponding to the first principal warp) to follow Bookstein’s labelling of the order of the warps (Bookstein, 1996b). The principal warps do not depend on the second figure Y. The principal warps will be used to construct an orthogonal basis for re-expressing the thin-plate spline transformations. The principal warp deformations are univariate functions of 2D t, and so could be displayed as surfaces above the plane or as contour maps. Alternatively one could plot the transformation grids from t to y = t + (c1Pj(t), c2Pj(t))T for each j, for particular values of c1 and c2. Note that the principal warp eigenvectors are orthonormal.
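
For illustration, the principal warps can be computed directly from an eigen-decomposition of B_e; the following base R sketch uses an invented configuration of k = 5 landmarks and evaluates P_j(t) = γ_j^T s(t) at an arbitrary point.

sigma <- function(h) { r <- sqrt(sum(h^2)); if (r > 0) r^2 * log(r) else 0 }
Tmat <- rbind(c(0, 1), c(1, 0), c(0, -1), c(-1, 0), c(0.4, 0.3))   # k = 5 source landmarks
k <- nrow(Tmat)
S <- matrix(0, k, k)
for (i in 1:k) for (j in 1:k) S[i, j] <- sigma(Tmat[i, ] - Tmat[j, ])
Q <- cbind(1, Tmat)
Gamma <- rbind(cbind(S, Q), cbind(t(Q), matrix(0, 3, 3)))
Be <- solve(Gamma)[1:k, 1:k]                        # bending energy matrix of Tmat
e <- eigen(Be, symmetric = TRUE)                    # eigenvalues in decreasing order
gamma <- e$vectors[, (k - 3):1]                     # principal warp eigenvectors, gamma_1 first
lambda <- e$values[(k - 3):1]                       # bending energies, smallest first
P <- function(t, j) sum(apply(Tmat, 1, function(tj) sigma(t - tj)) * gamma[, j])
P(c(0.2, 0.2), 1)                                   # first principal warp evaluated at (0.2, 0.2)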

Definition 12.8 The partial warps are defined as the set of k − 3 bivariate functions Rj(t), j = 1, …, k − 3, where

(12.20) numbered Display Equation

The jth partial warp scores for Y (from T) are defined as:

(12.21) numbered Display Equation

and so there are two scores for each partial warp.

Bookstein (2015a) also considers deflated partial warp scores, each multiplied by the square root of λ_j^{−1} for 2D and multiplied by λ_j^{−1} for 3D. Since

numbered Display Equation

we see that the non-affine part of the PTPS transformation can be decomposed into the sum of the partial warps. The jth partial warp corresponds largely to the movement of the landmarks which are the most highly weighted in the jth principal warp. The jth partial warp scores indicate the contribution of the jth principal warp to the deformation from the source T to the target Y, in each of the Cartesian axes.

The partial warps are bivariate functions and can be viewed in a plot as deformations of a square grid placed over T. In particular if t is a point in the plane of the original configuration, then we obtain the grid for the jth partial warp by deforming to y = t + Rj(t). Hence, interpretation of the partial warps can lead to an increased understanding of the deformation. It is also often helpful to display the affine transformation as a deformed grid, giving a complete decomposition of the thin-plate spline deformation.

Let the figure T be scaled to unit size, centred with coordinates (μ_1, ν_1), …, (μ_k, ν_k), and rotated so that Σ_{j=1}^{k} μ_j ν_j = 0. Let α = Σ_j μ_j^2, β = Σ_j ν_j^2 with α + β = 1, and denote

Note that u1 and u2 are orthogonal to the block diagonal matrix diag(Be, Be) (and to T). The affine component is often called the zeroth-order warp. The affine scores (zeroth partial warp scores) are given by:

numbered Display Equation

Note that

numbered Display Equation

and u_1 and u_2 together form an orthonormal basis for the affine part of the tangent space.

The partial warps represent non-affine aspects of the deformation ordered according to the amount of bending that is required to move points in the first figure. In practice this results in the partial warps representing the deformation at various scales – the larger values of the eigenvalues (i.e. larger j) represent small scale local deformation and the smaller eigenvalues (smaller j) represent large scale global deformation. Hence, the first principal and partial warps will be associated with an overall large scale bending of the figure and the last principal and partial warps will usually be associated with the landmarks that are closest together, at the smallest scale.

An analogy is with a Fourier series representation of a function. An orthonormal trigonometric basis is chosen which represents features at a variety of successively finer scales, for example cos x, sin x, cos 2x, sin 2x, cos 3x, sin 3x, ….

We take the inverse bending energy matrix as the Moore–Penrose inverse, given in Definition 9.1:

(12.24)   B_e^− = Σ_{j=1}^{k−3} λ_j^{−1} γ_j γ_j^T.
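
In code, the Moore–Penrose inverse of Equation (12.24) amounts to inverting only the non-zero eigenvalues; a generic base R sketch (with an arbitrary rank-deficient positive semi-definite example matrix) is:

mp_inverse <- function(B, tol = 1e-8) {                 # Moore-Penrose inverse via eigendecomposition
  e <- eigen(B, symmetric = TRUE)
  pos <- e$values > tol * max(e$values)                 # keep only the non-zero eigenvalues
  e$vectors[, pos, drop = FALSE] %*%
    diag(1 / e$values[pos], sum(pos)) %*% t(e$vectors[, pos, drop = FALSE])
}
set.seed(5)
G <- matrix(rnorm(20), 5, 4)
B <- rbind(cbind(crossprod(G), 0), 0)                   # 5 x 5 symmetric psd matrix of rank 4
Bminus <- mp_inverse(B)
max(abs(B %*% Bminus %*% B - B))                        # Moore-Penrose property B B^- B = B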

A summary of the different types of warps is given in Table 12.1. The principal warps are determined by one configuration only, the partial warps by a transformation between two configurations, and the relative warps by a sample of n configurations (described in Section 12.3.6).

Table 12.1 Types of warps and notation.

Type Notation Subscripts Equation
Principal warp Pj(t) j = 1, …, k − 3 (12.19)
Partial warp Rj(t) j = 1, …, k − 3 (12.20)
Partial warp score (pj1, pj2) j = 1, …, k − 3 (12.21)
Relative warp fj j = 1, …, 2k − 6 (12.25)
Relative warp score aij i = 1, …, n; j = 1, …, 2k − 6 (12.25)

Figure 12.10 The pair of thin-plate splines deformation from a non-symmetric T configuration of k = 5 landmarks (a) deformed into a second figure (b), and the principal warps for the non-symmetric T example drawn as surfaces above the Cartesian plane: (c) the first principal warp; and (d) the second principal warp. Source: Adapted from Bookstein 1989.

Example 12.6 Bookstein (1991, Section 7.5.2) considers a non-symmetric T configuration (k = 5 points) deformed into a second figure (Figure 12.10a,b). It is found that the two eigenvectors of the bending energy matrix are:

numbered Display Equation

with corresponding eigenvalues λ_1 = 0.296 and λ_2 = 0.568. Note that the eigenvalues of Bookstein (1991) differ by a factor of 2 because he uses σ(h) = ||h||^2 log(||h||^2) = 2||h||^2 log(||h||) for the covariance function. We can inspect the loadings of each eigenvector to see which coefficients are large, and hence interpret the effect. Alternatively we can plot the principal warps as surfaces and in Figure 12.10c,d we see the two principal warps. The first principal warp corresponds to a relatively large feature – the difference in landmarks 4 and 5 relative to 1, 2 and 3. The second (and last) principal warp reflects the relatively small feature differences in the displacements of the two nearest points, landmarks 4 and 5. In Figure 12.11 we see the deformation grids for the partial warps and the affine part. The first (larger scale change) involves landmarks 4 and 5 moving down relative to points 1, 2 and 3. The second partial warp is largely composed of the relative movement of landmarks 4 and 5 together (smaller scale change). Also in Figure 12.11 we see the affine component, which is a shear of the left region upwards and the right region downwards.


Figure 12.11 (a) The initial grid on the non-symmetric T, (b) the first partial warp, (c) the second partial warp, and (d) the affine component. Source: Adapted from Bookstein 1989.


Figure 12.12 A thin-plate spline transformation grid between a female and a male gorilla skull midline. (a) The square grid is placed on the estimated mean female skull and (b) the curved grid is pictured on the estimated mean male skull. The deformation is magnified three times here, for ease of interpretation.


Figure 12.13 The five principal warps for the pooled mean shape of the gorillas. The top row shows the first three principal warps (first on the left to third on the right) and the bottom row shows the last two principal warps (fourth on the left, fifth on the right).


Figure 12.14 The five principal warps for the pooled mean shape of the gorillas. The top row shows the first three principal warps (first on the left to third on the right) and the bottom row shows the last two principal warps (fourth on the left, fifth on the right). The principal warps are pictured by deforming the full Procrustes mean shape for the pooled dataset of all 59 gorillas to figures which have a partial warp score of 0.15 along each warp.

Example 12.7 Consider describing the shape change from an estimate of the mean size-and-shape female gorilla skull midline to an estimate of the mean size-and-shape of the male gorilla skull midline, for the data described in Section 1.4.8. There are k = 8 landmarks in the plane. The thin-plate spline deformation is given in Figure 12.12. Although interpretation is far from straightforward, a possible explanation is that the main size-and-shape difference is a greater protrusion in the male face compared with the female face – there is less size-and-shape difference at the back of the skull.

There are five principal warps from the pooled Procrustes mean, shown as surfaces in Figure 12.13. It is clear that the first few principal warps are smoother and spread over a larger scale than the less smooth, smaller scale, last few principal warps. The principal warps are also displayed as deformation grids in Figure 12.14. We can see the larger scale deformation of the first principal warp which includes the face (left half) and braincase (right half) moving downwards relative to the middle of the skull. The face is also stretched outwards. The second principal warp includes the relative shrinking in height of the braincase, and the bottom left three landmarks moving away from the rest of the skull. The principal warps are concerned with successively smaller scale deformations, and at the smaller scales the fourth principal warp involves the movement in ba and o and the final principal warp includes a contrast between left-most landmarks pr and na and the rest of the skull. The landmark labels were given in Figure 1.18. If using either of the individual male or female group means rather than the pooled mean, then the plots appear virtually identical to the eye.

The five partial warp deformation grids for the female mean shape to the male mean shape are shown in Figure 12.15 together with the affine grid (all magnified three times). We see that the face region is pulled up and outwards in partial warp 1, and the face is taller in height in partial warp 2. There is very little contribution in partial warp 3. In partial warp 4 landmarks ba and o become closer together and in partial warp 5 landmarks pr and na become closer together.

The affine and partial warp scores for the gorilla skull data are displayed in Figure 12.16. For each partial warp there are two scores (one for each Cartesian axis x, y). For each warp we plot the pairs of scores for the y-axis against the x-axis, labelling the males and females with ‘m’ and ‘f’, respectively. We can see that there are fairly good separations of the males and females in the affine scores and the partial warp scores, except for partial warp 3. The percentage of pooled variability explained by the affine component is 26.1%, and for the partial warps 1 to 5 the percentages explained are 32.7, 17.8, 12.3, 6.5 and 4.4%, respectively. Note that the variance of the partial warps drops with warp number, as in Bookstein (2015a). The shape variability is not spread evenly over the warps. So, the variability in the scores is not spherical – there is much more shape variability in the first few partial warp scores and the smaller scale warps summarize only a small fraction of the shape variability.

Note that the principal warps and partial warp scores can be obtained in R using procGPA with the option alpha=1. In particular, the output is in $principalwarps, $partialwarpscores and the percentage contributions are in $partialwarp.percent. For example, for the pooled gorilla data:

ans<-procGPA( abind(gorf.dat,gorm.dat), alpha=1)
> ans$partialwarps.percent
[1] 32.719875 17.787477 12.304224 6.506736 4.443423

The affine components, scores and percentage contributions are obtained using option affine=TRUE and are given in $pcar, $scores, $percent respectively. For the pooled gorilla data we have:

ans<-procGPA( abind(gorf.dat,gorm.dat), alpha=1, affine=TRUE)
> ans$percent
[1] 17.529125 8.565122

12.3.5 PCA with non-Euclidean metrics

Principal component analysis of a random sample of shapes can be carried out with respect to the bending energy matrix or its generalized inverse, emphasizing large or small scale variations.

Definition 12.9 Define the pseudo-metric space as the set of real p-vectors with pseudo-metric given by:

d_A(x_1, x_2) = { (x_1 − x_2)^T A^− (x_1 − x_2) }^{1/2},

where x_1 and x_2 are p-vectors, and A^− is a generalized inverse (or inverse if it exists) of the positive semi-definite matrix A.

The Moore–Penrose inverse is a suitable choice of generalized inverse (see Definition 9.1). If A is the population covariance matrix of x1 and x2, then dA(x1, x2) is the Mahalanobis distance. The norm of a vector x in the metric space is

||x||_A = (x^T A^− x)^{1/2}.

We could carry out statistical inference in the metric space rather than in the usual Euclidean (A = I) space (Mardia 1977, 1995; Bookstein 1995, 1996b; Kent, 1995, personal communication). A simple way to proceed is to transform from x to y = (A^−)^{1/2} x in Euclidean space. For example, consider PCA of n centred p-vectors x_1, …, x_n in the metric space. Transforming to y_i = (A^−)^{1/2} x_i, the PC loadings are the eigenvectors of

S_y = (A^−)^{1/2} S_x (A^−)^{1/2}.

Denote the eigenvectors (p-vectors) of S_y as γ_{yj}, j = 1, …, p (assuming p ≤ n − 1), with corresponding eigenvalues λ_{yj}, j = 1, …, p. The PC scores for the jth PC on the ith individual are:

γ_{yj}^T y_i = γ_{yj}^T (A^−)^{1/2} x_i,   i = 1, …, n.

So, the (unnormalized) PC loadings on the original data are (A^−)^{1/2} γ_{yj}, where the γ_{yj} are the eigenvectors of (A^−)^{1/2} S_x (A^−)^{1/2} and S_x = n^{−1} Σ_{i=1}^{n} x_i x_i^T, using standard linear algebra (see Mardia et al. 1979, Appendix). The first few PCs in the metric space (with loadings (A^−)^{1/2} γ_{yj}) can be useful for interpretation, emphasizing a different aspect of the sample variability than the usual PCA in Euclidean space. If our analysis is carried out in the pseudo-metric space, then we say that our analysis has been carried out with respect to A.
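
A small base R sketch of PCA with respect to a metric A, using simulated data and an arbitrary positive definite A: transform to y_i = (A^−)^{1/2} x_i, carry out ordinary PCA, and map the loadings back to the original space.

set.seed(6)
n <- 50; p <- 4
X <- scale(matrix(rnorm(n * p), n, p), scale = FALSE)     # rows are centred p-vectors x_i
A <- crossprod(matrix(rnorm(p * p), p, p))                # an arbitrary positive definite metric
eA <- eigen(A, symmetric = TRUE)
Ainv_half <- eA$vectors %*% diag(1 / sqrt(eA$values)) %*% t(eA$vectors)   # (A^-)^{1/2}
Y <- X %*% Ainv_half                                      # transformed data y_i = (A^-)^{1/2} x_i
ey <- eigen(crossprod(Y) / n, symmetric = TRUE)           # PCA of S_y
scores <- Y %*% ey$vectors                                # PC scores in the metric space
loadings_x <- Ainv_half %*% ey$vectors                    # (unnormalized) loadings on the original data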

For example, in canonical variate analysis in multivariate analysis observations from several groups are available. An eigendecomposition of the between group covariance matrix Sbetween is carried out with respect to Swithin, the within group covariance matrix (e.g. see Mardia et al. 1979, p. 338). Another form of weighted PCA was carried out by Polly et al. (2013) who consider PCA weighted by phylogenetic distance.

Potentially very useful metrics for shape analysis are the bending energy and inverse bending energy metrics in the tangent space. All metrics here depend on quadratic forms, and the procedures use Mahalanobis distances. A very different approach could be obtained by using, say, the L1 norm

||x||_1 = Σ_{i=1}^{p} |x_i|,

instead of the Euclidean L2 norm

||x||_2 = ( Σ_{i=1}^{p} x_i^2 )^{1/2}.

12.3.6 Relative warps

If a random sample of shapes is available, then one may wish to examine the structure of the within group variability in the tangent space to shape space. This problem was approached in Section 7.7 using PCA with respect to the Euclidean metric, but an alternative is the method of relative warps. Relative warps are PCs with respect to the bending energy or inverse bending energy metrics in the shape tangent space. Bookstein (2015a) also considered deflated relative intrinsic warps, which involve deflating the partial warp scores.

Consider a random sample of n shapes represented by Procrustes tangent coordinates v_1, …, v_n (each a (2k − 2)-vector), where the pole μ is chosen to be an average pre-shape such as from the full Procrustes mean, see Section 8.3. The sample covariance matrix in the tangent plane is denoted by S_v and the sample covariance matrix of the centred tangent coordinates x_i = (I_2 ⊗ H^T) v_i, i = 1, …, n, is denoted by S_c (2k × 2k). In our examples we have used the covariance matrix of the Procrustes fit coordinates of Section 7.7.4. The bending energy matrix B_e is calculated for the average shape and then the tensor product is taken to give B_2 = I_2 ⊗ B_e, which is a 2k × 2k matrix of rank 2k − 6. We write B_2^− for a generalized inverse of B_2 (e.g. the Moore–Penrose generalized inverse of Definition 9.1).

We consider PCA in the tangent space with respect to a power of the bending energy matrix, in particular with respect to B_e^α.

Definition 12.10 Let the non-zero eigenvalues of (B_2^−)^{α/2} S_c (B_2^−)^{α/2} be l_1, …, l_{2k−6} with corresponding eigenvectors f_1, …, f_{2k−6}, and

(B_2^−)^{α/2} = Σ_{j=1}^{2k−6} λ_j^{−α/2} γ_j γ_j^T,

with λ_1, …, λ_{2k−6} the eigenvalues of B_2 with corresponding eigenvectors γ_1, …, γ_{2k−6}. The eigenvectors f_1, …, f_{2k−6} are called the relative warps. The relative warp scores are:

(12.25)   a_{ij} = f_j^T (B_2^−)^{α/2} x_i,   i = 1, …, n;  j = 1, …, 2k − 6.

Important point: The relative warps and the relative warp scores are useful tools for describing the non-affine shape variation in a dataset. In particular the effect of the jth relative warp can be viewed by plotting

numbered Display Equation

for various values of c, where

numbered Display Equation

The procedure for PCA with respect to the bending energy requires α = +1 and emphasizes large scale variability. Principal component analysis with respect to the inverse bending energy requires α = −1 and emphasizes small scale variability. If α = 0, then we take B_2^0 = I_{2k}, the 2k × 2k identity matrix, and the procedure is exactly the same as PCA of the Procrustes tangent coordinates; see Section 7.7. Bookstein (1996b) has called the α = 0 case PCA with respect to the Procrustes metric. Bookstein (2014) argues that isotropic covariance structure is unrealistic for most applications, and that covariance matrices based on bending energy are often preferable.

Affine variation can also be considered. Let the Procrustes average shape be scaled to unit size and rotated with the principal axes horizontal and vertical, and let u1 and u2 be constructed from the mean shape as in Equation (12.22) and Equation (12.23). The vectors u1 and u2 are orthogonal to the bending energy matrix. Write U = (u1, u2) for the 2k × 2 matrix. In order to view the affine contribution to variability we plot

numbered Display Equation

for various values of c.

Example 12.8 We consider a relative warp analysis of the male and female gorilla data, continuing from Example 12.7. We saw that the affine component explained 26.1% of the total pooled variability. We calculate B_e using the overall full Procrustes mean from the pooled data. Carrying out a relative warp analysis with respect to B_e (α = 1), the ratios of the eigenvalues of the first two relative warps to the sum of the eigenvalues over all the relative warps are 63.4% and 19.6%, respectively. A plot of the first two relative warp scores is given in Figure 12.17a, and there is good separation between the males and females. The effect of the first two relative warps is demonstrated in Figure 12.18 by placing a grid on the pooled mean shape and deforming to a shape along each of the relative warps. We see that the relative warps match quite closely to the first two principal warps (seen in Figure 12.13), emphasizing the large scale variability in the data.

Carrying out a relative warp analysis with respect to the inverse bending energy (α = −1), the ratios of the eigenvalues of the first two relative warps to the sum of the eigenvalues over all the relative warps are 34.8% and 17.4%, respectively. A plot of the first two relative warp scores is given in Figure 12.17b, and again there is good separation between the males and females. The effect of the first two relative warps is demonstrated in Figure 12.19 and we see again that the first relative warp matches quite closely to the first principal warp, although some small scale movement of landmarks na and pr and movement of landmarks ba and o is emphasized. The second relative warp has some similarities with the second principal warp although there is a different emphasis in the braincase region (l is pushed farther leftwards).

Finally we carry out a relative warp analysis with respect to the identity matrix (α = 0), which is the same as PCA of the full Procrustes tangent coordinates of the pooled data. We have the percentages of shape variability explained by the first two relative warps (PCs) as 45.9 and 16.9%, respectively. A plot of the first two relative warp scores is given in Figure 12.17c, and again there is good separation between the males and females. The effect of the first two relative warps is demonstrated in Figure 12.20 and we see again that the relative warps match quite closely to the first and second principal warps.


Figure 12.15 The affine component and the partial warps for the deformation from the female to the male gorilla (all magnified three times). The top row shows the affine component and the first two partial warps (affine on the left, first in the middle and second on the right) and the bottom row shows the last three partial warps (third on the left, fourth in the middle and fifth on the right).


Figure 12.16 The affine scores and the partial warp scores for female (f) and male (m) gorilla skulls. The top row shows the affine scores and the first two partial warp scores (affine on the left, first in the middle and second on the right) and the bottom row shows the last three partial warp scores (third on the left, fourth in the middle and fifth on the right).


Figure 12.17 A plot of the first two relative warp scores with respect to (a) bending energy (α = 1), (b) inverse bending energy (α = − 1) and (c) Procrustes metric (α = 0) for the female (f) and male (m) gorilla skulls.


Figure 12.18 Deformation grids for the first two relative warps from the gorilla data with respect to bending energy (α = 1), emphasizing large scale variation. A square grid is drawn onto the pooled mean shape (not shown) and then deformed (by PTPS) to a shape at +6 standard deviations along the first relative warp (a) and −6 standard deviations along the second relative warp (b).


Figure 12.19 Deformation grids for the first two relative warps with respect to the inverse bending energy (α = − 1), emphasizing the small scale variation. A square grid is drawn onto the pooled mean shape (not shown) and then deformed (by PTPS) to shapes at 6 standard deviations along the relative warps [first in (a), second in (b)].

In R we can carry out relative warp analysis using the command procGPA with option alpha=1 for relative warp scores with respect to bending energy, with option alpha=-1 with respect to inverse bending energy and with option alpha=0 for the Procrustes tangent space PCA:

s1<-procGPA( apes$x[,,1:59] , alpha=1)$scores
s2<-procGPA( apes$x[,,1:59] , alpha=-1)$scores
s3<-procGPA( apes$x[,,1:59] , alpha=0)$scores
par(mfrow=c(1,3))
#relative warps with respect to Be
plot(s1[,1],s1[,2],type="n",xlab="RW1",ylab="RW2")
text(s1[,1],s1[,2],c(rep("f",times=30),rep("m",times=29)))
title(sub="(a)")
#relative warps with respect to inverse Be
plot(s2[,1],s2[,2],type="n",xlab="RW1",ylab="RW2")
text(s2[,1],s2[,2],c(rep("f",times=30),rep("m",times=29)))
title(sub="(b)")
#relative warps with respect to the Procrustes metric (alpha=0)
plot(-s3[,1],-s3[,2],type="n",xlab="RW1",ylab="RW2")
text(-s3[,1],-s3[,2],c(rep("f",times=30),rep("m",times=29)))
title(sub="(c)")

Figure 12.20 Deformation grids for the first two relative warp scores with respect to the identity matrix (α = 0), for the gorilla data. A square grid is drawn onto the pooled mean shape (not shown) and then deformed (by PTPS) to a shape at +6 standard deviations along the first relative warp (a) and −6 standard deviations along the second relative warp (b).

Important point: We may ask which of the choices of α leads to the most informative analysis. The answer depends on the particular application: if large scale effects are the most important, then α = 1 is a good choice; if small scale effects are the most important, then α = −1 is a good choice; otherwise, α = 0 is an appropriate choice.

In the example of the gorilla data, large scale variation seems more important so it could be argued that α = 1 and α = 0 are more useful than α = −1.
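
As a quick check on how the three analyses summarize the variability, we might compare the percentages of shape variability explained under each choice of α. A minimal sketch is given below; it assumes that the procGPA output stores these percentages in a component named percent, as in recent versions of the shapes package.

library(shapes)   # provides procGPA and the apes dataset
# Compare the percentage of variability explained by the leading
# relative warps for alpha = 1, -1 and 0 (gorilla subset of apes$x).
for (a in c(1, -1, 0)) {
  out <- procGPA(apes$x[,,1:59], alpha = a)
  cat("alpha =", a, ":", round(out$percent[1:3], 1), "%\n")
}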

12.4 Alternative approaches and history

12.4.1 Early transformation grids

Grids similar to the Cartesian transformation grids of D’Arcy Thompson were popular with Renaissance artists in the 16th century. For example, grids were drawn on a figure such as a skull and then deformed into a new skull by painting corresponding parts in the appropriate part of the deformed grid. A famous example is in the painting ‘The Ambassadors’ by Hans Holbein the Younger (1533, National Gallery, London). Such a deformation is known as ‘anamorphosis’, and in ‘The Ambassadors’ a distorted skull lies diagonally across the bottom of the picture. Other examples include drawings of grids on different objects in order to compare them; particularly good examples were given by Dürer (1528) (Figure 12.21) for comparing human forms (Bookstein, 1996b).


Figure 12.21 Early transformation grids of human profiles. Source: Bookstein 1996b. Reproduced with permission from Springer Science+Business Media.

Medawar (1944) approached the problem mathematically after D’Arcy Thompson’s original ideas. Using the motivation of modelling the shape of humans, Medawar (1944) modelled the dataset known as ‘Stratz’s plane figures’, which has been commented on by Jackson (1915) (Figure 12.22). The data are the frontal elevations of the male human being at six stages through life. Medawar used quadratic functions for the deformation function Φ2(t), which depends only on height, with coefficients depending on the age of the individual. The deformation function Φ1(t) is not modelled explicitly. Here t = (t[1], t[2])T is again a point on the first figure.


Figure 12.22 Early transformation grids modelling six stages through life. Source: Medawar 1944.

Medawar (1945) also considered explicit analytic modelling of both Φ1(t) and Φ2(t) in transforming an outline of a distorted spleen culture.

Sneath (1967) fitted cubic functions for Φ1(t) and Φ2(t) after initially centring, rotating and rescaling the two figures by a procedure which is identical to ordinary Procrustes analysis (see Section 7.2). One of the problems with this technique is that it allows for considerable bending away from the centre of the forms and results in gross distortions of the grid towards the periphery. See Figure 12.23 for an example, where undesirable folding of the grid can be seen.


Figure 12.23 Transformation grids for four skulls. Source: Sneath 1967. Reproduced with permission from John Wiley & Sons.


Figure 12.24 Transformation grids for a pair of skulls. Source: Cheverud & Richtsmeier 1986. Reproduced with permission of Oxford University Press.

Cheverud and Richtsmeier (1986) also obtain transformation grids but use the finite element scaling method. Their work is in three dimensions but includes sections of grids in two dimensions. Again the bending of the grids is very large in localized regions; see Figure 12.24 for an example.

12.4.2 Finite element analysis

If two figures are available (e.g. the average shape in each of two groups), then another possible method of shape change description is to consider a finite element description of the size and shape difference. Each figure is represented by a collection of elements or building blocks, for example triangles and quadrilaterals in two dimensions, or tetrahedra and cuboids in three dimensions. A separate transformation is considered in each element and the approach usually leads to interpolants which are continuous but not smooth across element boundaries. The essence of the finite element approach is that a shape difference can be described in terms of the directions and magnitudes of the principal strains (see Section 12.2.4 for triangles in two dimensions) in the transformation of elements in one form to another. Homogeneous finite element methods (e.g. Bookstein, 1978; Moss et al. 1987) involve the assumption that shape differences (strains) are uniformly distributed throughout each element. Elements may contain different numbers of landmarks but the simplest possible are triangles in 2D figures, with vertices at homologous landmarks in the two forms. In this case shape differences are necessarily affine between two triangles, as described in Section 12.2.4. Other types of finite element methods for shape analysis use non-homogeneous finite elements. Note that with all finite element analyses interpretation of shape differences will depend on element design (Zienkiewicz, 1971). Finite element scaling analysis (e.g. Lew and Lewis, 1977) is non-homogeneous and has been applied in studies of craniofacial growth and sexual dimorphism in three dimensions (Cheverud et al. 1983; Richtsmeier, 1986; Cheverud and Richtsmeier, 1986; Richtsmeier, 1989). An alternative is to use the design of elements that arise from a Delaunay triangulation (O’Higgins and Dryden, 1993; Kent and Mardia, 1994b; Okabe et al. 2000).


Figure 12.25 One finite element description for the shape change from the average female (shown) to the average male gorilla. The crosses represent the principal axes of the deformation in each triangle – the magnitudes of the largest and least amount of stretching are given, together with the directions of the strains. Source: O'Higgins & Dryden 1993. Reproduced with permission from Elsevier.

Recent work has provided a bridge between geometric morphometrics and finite element analysis, via the use of large scale relative warps (Bookstein, 2013a).

Example 12.9 Consider the size-and-shape difference in the midline between female and male gorilla crania, described in Section 1.4.8. First of all the average female and the average male are obtained. We saw in Section 9.4 that the mean shape difference is statistically significant.

In Figure 12.25 we see the average female drawn with finite elements indicating how the triangles change size and shape into the average male form. The crosses on the diagram are the principal strains (see Section 12.2.4) indicating how the unit length axes of a circle change into the major and minor axes of the deformed circle (ellipse).

There is not much shape difference in the back of the skull (b, ba, l), where the sum of strains is smaller. However, there is more of a change in the face region (pr, na, st, n, ba), where the front of the face is more protruded in the male than in the female. The PTPS deformation for the same example is given in Figure 12.12.
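
To make the homogeneous finite element description concrete, the following minimal sketch (not the book's own code) computes the principal strains for a single triangular element: the linear part of the affine map taking the source triangle onto the target is recovered from the edge vectors, and its singular values and right singular vectors give the magnitudes and source-space directions of greatest and least stretching, as in Section 12.2.4. The two triangles are hypothetical.

# Hypothetical source and target triangles (vertices as rows).
source.tri <- matrix(c(0, 0,  1, 0,  0, 1), ncol = 2, byrow = TRUE)
target.tri <- matrix(c(0, 0,  1.2, 0.1,  -0.1, 0.8), ncol = 2, byrow = TRUE)
U <- t(source.tri[2:3, ] - rbind(source.tri[1, ], source.tri[1, ]))  # source edge vectors as columns
V <- t(target.tri[2:3, ] - rbind(target.tri[1, ], target.tri[1, ]))  # target edge vectors as columns
A <- V %*% solve(U)     # linear part of the affine map taking source onto target
sv <- svd(A)
sv$d                    # principal strains: largest and least amount of stretching
sv$v                    # corresponding (orthogonal) principal directions in the source triangle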

12.4.3 Biorthogonal grids

Biorthogonal grids were introduced by Bookstein (1978) and give numerical information as to the directions and magnitudes of shape change. These grids give a map of infinitesimal strains at each point in the interior of the first figure, as it is deformed into the second. The biorthogonal grid shows the maximum and minimum amounts of the size-and-shape difference and can be viewed as the first derivative of the interpolation. The choice of interpolant will be crucial to our perception of shape change. At each point in the interior of the figure there are two differentials which are perpendicular before and after the transformation (principal strains). The curves following the directions of the differentials form a grid whose intersections are at 90° in both figures, and these grids are called biorthogonal grids. Applications of the biorthogonal grid have been made to a wide variety of situations (e.g. Bookstein, 1978, 1986), including describing the shape difference in the faces of children with and without fetal alcohol syndrome (Bookstein and Sampson, 1990). Sampson et al. (1991) have also used the biorthogonal grid but in a different setting in environmental statistics. They use the biorthogonal grid as an explanatory tool in the description of spatial covariance structures and their routine for displaying the grids uses the thickness of the lines to indicate the strength of the principal strains.
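
The following small numerical sketch (purely illustrative, with an arbitrary smooth map standing in for the interpolant) shows the infinitesimal strains that underlie a biorthogonal grid: at a point t the Jacobian matrix of the deformation is approximated by finite differences, and its singular value decomposition gives the principal strains together with orthogonal directions in the source that remain orthogonal after the transformation.

# Illustrative smooth deformation of the plane (not from the text).
Phi <- function(t) c(t[1] + 0.2 * t[2]^2, t[2] + 0.1 * t[1] * t[2])
local.strains <- function(Phi, t, eps = 1e-6) {
  J <- matrix(0, 2, 2)
  for (j in 1:2) {                          # finite-difference approximation to the Jacobian
    e <- c(0, 0); e[j] <- eps
    J[, j] <- (Phi(t + e) - Phi(t - e)) / (2 * eps)
  }
  sv <- svd(J)
  list(strains = sv$d, directions = sv$v)   # principal strains and source directions
}
local.strains(Phi, c(0.5, 0.5))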

12.5 Kriging

12.5.1 Universal kriging

To obtain deformations we could consider kriging, a commonly used method of prediction in spatial statistics. We initially consider prediction from a univariate random process at an m-dimensional site t, given k univariate observations y1, …, yk taken at sites t1, …, tk. Kriging involves the construction of the unbiased linear predictor which minimizes the mean square prediction error

E[{Y(t) − Ŷ(t)}2], where Ŷ(t) denotes the linear predictor.

Let g(t) = (g1(t), …, gp(t))T be a vector of known functions. Consider the general linear model

Y(t) = βTg(t) + ε(t),

where β = (β1, …, βp)T and ε(t) is a random field. If ε(t) is a zero mean stationary random field, with positive definite covariance function

cov{ε(t), ε(t + h)} = σ(h),

with σ(0) < ∞, then the prediction is called universal kriging. If βTg(t) = μ is constant, then the prediction is called ordinary kriging. We consider universal kriging and more complete details can be found, for example, in Cressie (1993, p. 151). We follow the treatment of Mardia and Marshall (1984) and Mardia and Little (1994).

The term βTg(t) is called the drift or underlying trend. Writing Ysite = (Y(t1), …, Y(tk))T as the random vector of the process at the k sites, with the observed values y1, …, yk as a realization of Ysite, the regression predictor is the conditional mean of Y(t) given Ysite, namely

where (S)ij = σ(ti − tj), s(t) = (σ(t − t1), …, σ(t − tk))T and the jth row of D is g(tj)T. Under the assumption of a Gaussian random field Equation (12.26) is well known (e.g. Mardia et al. 1979, p. 63). The universal kriging predictor (Mardia and Marshall, 1984) is equivalent to replacing β in Equation (12.27) with the generalized least squares estimator

numbered Display Equation

Definition 12.11 The universal kriging predictor at site t is:

numbered Display Equation

where

numbered Display Equation

There is an alternative representation where the predictor is regarded as a (non-linear) function of t, that is

numbered Display Equation

where

In matrix form we can write Equation (12.28) as:

(12.29) numbered Display Equation

and these equations are known as the dual-kriging equations. When written in this alternative form it is quite straightforward to see the link between splines and kriging predictors. We make the full link in the discussion of generalized kriging in Section 12.5.3. There is also a close connection with radial basis functions (Arad et al. 1994; Mardia et al. 1996e). Radial basis functions in the numerical analysis literature (Powell, 1987) are appropriate to use as isotropic covariance functions.
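
The following minimal sketch in R illustrates universal kriging prediction at a single new site, written directly from the standard formulae (a generalized least squares estimate of the drift coefficients followed by kriging of the residuals). The exponential covariance function, the affine drift g(t) = (1, t[1], t[2])T, the sites and the data are all illustrative assumptions rather than quantities taken from the text.

sigma.fn <- function(h) exp(-sqrt(sum(h^2)))          # an isotropic covariance function (illustrative)
g.fn <- function(t) c(1, t)                           # affine drift g(t) = (1, t[1], t[2])^T
uk.predict <- function(t, sites, y, sigma.fn, g.fn) {
  k <- nrow(sites)
  S <- matrix(0, k, k)                                 # (S)ij = sigma(t_i - t_j)
  for (i in 1:k) for (j in 1:k) S[i, j] <- sigma.fn(sites[i, ] - sites[j, ])
  D <- t(apply(sites, 1, g.fn))                        # jth row of D is g(t_j)^T
  s <- apply(sites, 1, function(tj) sigma.fn(t - tj))  # s(t)
  Sinv <- solve(S)
  beta <- solve(t(D) %*% Sinv %*% D, t(D) %*% Sinv %*% y)   # generalized least squares estimate
  as.numeric(g.fn(t) %*% beta + s %*% Sinv %*% (y - D %*% beta))
}
set.seed(1)
sites <- matrix(runif(20), ncol = 2)                   # 10 sites in the plane
y <- sin(2 * pi * sites[, 1]) + rnorm(10, sd = 0.05)   # noisy observations at the sites
uk.predict(c(0.5, 0.5), sites, y, sigma.fn, g.fn)      # universal kriging prediction at a new site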

12.5.2 Deformations

In our applications we are interested in deformations of m-dimensional space into itself. Hence we are interested in predicting the multivariate process Y(t) = (Y(t)[1], …, Y(t)[m])T using a collection of m independent kriging predictors. So, if the k observations in the m dimensions at the sites T = (t1, …, tk)T are written as a k × m matrix Yobs, then it follows that the universal kriging predictors can be written as:

where C is m × p and W is k × m, satisfying

Therefore, kriging can be used for deformations for a given positive definite covariance function σ(t). The same deformations using Equation (12.30) are given in the generalized kriging framework which follows, where the class of functions for σ(t) is widened. The kriging deformations were first introduced in Mardia et al. (1991).
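
A minimal sketch of such a kriging deformation in two dimensions is given below, using the dual form: local coefficients and drift coefficients solve a single spline-type linear system with an affine drift, and each coordinate of the deformation is predicted separately. The kernel σ(h) = −||h|| and the landmark configurations are illustrative choices, and the code is a sketch rather than the construction used in the text.

# Kriging deformation of the plane interpolating source landmarks Tmat to target landmarks Ymat.
krig.deform <- function(Tmat, Ymat, sigma.fn = function(r) -r) {
  k <- nrow(Tmat)
  S <- sigma.fn(as.matrix(dist(Tmat)))          # k x k matrix with entries sigma(t_i - t_j)
  D <- cbind(1, Tmat)                           # affine drift: jth row is (1, t_j^T)
  p <- ncol(D)
  L <- rbind(cbind(S, D), cbind(t(D), matrix(0, p, p)))
  rhs <- rbind(Ymat, matrix(0, p, ncol(Ymat)))
  sol <- solve(L, rhs)                          # solve for the local and drift coefficients
  W <- sol[1:k, , drop = FALSE]
  C <- sol[(k + 1):(k + p), , drop = FALSE]
  function(t) {                                 # the fitted deformation Phi(t)
    s <- sigma.fn(sqrt(colSums((t(Tmat) - t)^2)))
    as.numeric(s %*% W + c(1, t) %*% C)
  }
}
Tmat <- matrix(c(0, 0, 1, 0, 1, 1, 0, 1, 0.5, 0.5), ncol = 2, byrow = TRUE)   # illustrative source
Ymat <- Tmat + matrix(c(0, 0, 0.1, 0, 0.15, 0.1, 0, 0.1, 0.1, 0.05),
                      ncol = 2, byrow = TRUE)                                 # illustrative target
Phi <- krig.deform(Tmat, Ymat)
Phi(Tmat[3, ])    # interpolation: reproduces the third target landmark
Ymat[3, ]

Replacing the kernel −||h|| by ||h||2 log ||h|| in this sketch gives the thin-plate spline deformation, which arises as intrinsic kriging in Section 12.5.3.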

12.5.3 Intrinsic kriging

In Section 12.5.1 the positive definite covariance function must exist. However, the technique of intrinsic or generalized kriging involves prediction from the model Y(t) = βTg(t) + ε(t), where ε(t) is an intrinsic random field, which we now define. Intrinsic processes were first developed by Matheron (1973).

Definition 12.12 Let G be a space of known functions. Given sites t1, …, tk, we call

a1Y(t1) + a2Y(t2) + ⋯ + akY(tk)

an increment with respect to G if

a1g(t1) + a2g(t2) + ⋯ + akg(tk) = 0 for all g ∈ G.

An intrinsic random field is a stochastic process Y(t) which has every increment ∑i aiY(ti) distributed with zero mean and variance

var{∑i aiY(ti)} = ∑i∑j aiaj σ*(ti, tj).

The function σ*(ti, tj) is called a conditional positive definite (c.p.d.) covariance function with respect to G.

In time series analysis taking differences annihilates a mean trend. Likewise taking increments annihilates trends or drifts expressed as functions in G. So an intrinsic random field corresponds to an equivalence class of stochastic processes which will have different drifts expressed as functions in G.

The linear predictor that minimizes the mean square prediction error at site t is called the intrinsic or generalized kriging predictor, and is also given by Equation (12.30), but with σ(t) a c.p.d. covariance function and a basis for G given by g(t) = (g1(t), …, gp(t))T. Technical details can be found in Mardia et al. (1996e).

Consider the important class of intrinsic random processes with c.p.d. covariance function σ*(t, t + h) = σα(h) indexed by α > 0,

(12.32) numbered Display Equation

These c.p.d. covariance functions are of interest because they are self-similar, that is σα(ch) ≡ c2ασα(h) for c > 0, where ‘≡’ means equal up to an even polynomial in h of degree 2[α], where [ · ] is the ‘integer part’ function. Thus, σα(h) and σα(ch) yield the same predictions. Let the space of functions Gr be the space of polynomials in h of degree ≤ r. It is well known that σα(h) defines a c.p.d. function with respect to Gr provided r ≥ [α] (e.g. Kent and Mardia, 1994a). The thin-plate spline for m = 2 dimensions is exactly the same as the intrinsic kriging predictor using the self-similar random field with α = 1, σ(0) = 0 and c.p.d. covariance function σ1(h) taken with respect to G1, the linear functions. Full details are given in Kent and Mardia (1994a). For m = 3 dimensions we will use α = 1/2 and σ1/2(h) = −||h|| taken with respect to G1.
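
A small numerical illustration (not a proof) of the conditional positive definiteness of σ1/2(h) = −||h|| is sketched below: for weight vectors a forming increments with respect to the affine functions (so that DTa = 0, where D holds the affine basis evaluated at the sites), the quadratic form ∑i∑j aiaj σ1/2(ti − tj), which plays the role of a variance, should be nonnegative. The random sites and weights are arbitrary.

set.seed(2)
k <- 12
sites <- matrix(rnorm(3 * k), ncol = 3)                 # k random sites in three dimensions
S <- -as.matrix(dist(sites))                            # entries sigma_{1/2}(t_i - t_j) = -||t_i - t_j||
D <- cbind(1, sites)                                    # affine basis of G_1 evaluated at the sites
P <- diag(k) - D %*% solve(t(D) %*% D) %*% t(D)         # projector onto increment weights (D^T a = 0)
qf <- replicate(1000, {
  a <- P %*% rnorm(k)                                   # a random increment
  as.numeric(t(a) %*% S %*% a)                          # the variance-type quadratic form
})
min(qf)                                                 # nonnegative, up to rounding error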

Bookstein (2015a) explores self-similarity in the analysis of morphometric data, including an application to the rat growth data of Section 1.4.11.

The link between kriging and splines has been noted by many authors and discussed in some detail initially by Kimeldorf and Wahba (1971) (see also Matheron, 1981, p. 180; Cressie, 1993; Kent and Mardia, 1994a). However, the aims of the two approaches are quite different: for kriging the aim is to provide a linear predictor of Y(t) using the available observations y1, …, yk, whereas for splines the aim is to predict Y(t) with a smooth function of the site location t.

Comparing Equation (12.31) and Equation (12.18) we see that the intrinsic kriging predictor is the same as a smoothing spline with a c.p.d. covariance function equal to σ(t), with smoothing parameter

numbered Display Equation

If λ = 0, then the kriging predictor is an interpolant (i.e. it reproduces the observations exactly at the sites t1, …, tk) and this corresponds to an interpolating spline.

Example 12.10 Consider describing the main aspect of shape variability in the macaque data of Section 1.4.3, with k = 24 landmarks. We consider a kriging transformation from the full Procrustes mean male macaque to an icon which is three standard deviations away from the mean along the first PC. The transformation will demonstrate which parts of the skull are varying the most. The data are m = 3 dimensional and the c.p.d. covariance function is taken as σ1/2(h) = −||h||. A regular cuboid grid is evaluated on the mean male and then deformed using the kriging transformation of the mean male to the icon. In Figure 12.26 we see four different views of the grids on the deformed male at three standard deviations along the first PC. The 3D representation is best viewed with an interactive spinning plot. However, we see from the static plots that the main shape variability in this PC is in the top of the braincase moving relative to the bottom of the braincase.


Figure 12.26 The deformed cuboid grid from the full Procrustes mean male to an icon which is three standard deviations along the first PC of shape variability. We see four different views of the icon in 3D space. In particular, view (c) is the top view of the skull and view (d) is the side view. We can see that there is more deformation in the grid at the back and top of the skull, showing the region of greatest shape variability. The deformed grid has been calculated using intrinsic kriging.

12.5.4 Kriging with derivative constraints

Bookstein and Green (1993) introduced the term ‘edgel’ to mean the first derivative information at a landmark. They extended the interpolating thin-plate spline for 2D deformations to include derivative constraints. As well as matching landmarks from one figure to the second, derivative information such as the tangent directions to the outline at a landmark were included. This approach allows higher order information to be included in the deformation. Mardia and Little (1994) and Mardia et al. (1996e) consider kriging with derivative constraints in a more general framework. The method can be extended to kriging with up to pth-order derivative constraints, provided the 2pth-order derivatives of the c.p.d. covariance function exist. Details are given in Mardia et al. (1996e). In particular, one may be interested in a deformation where there are landmark, tangent direction and curvature constraints. Note that away from the objects some undesirable bending or folding can occur. Mardia et al. (2004) have given a model-based approach for an edgel at a triangle shape whereas Mardia (2013a) has given a characterization for two edgels at two landmarks in three dimensions.

12.5.5 Smoothed matching

In some applications the data may be recorded imprecisely, and there may be quite a large amount of noise. For example, Hastie and Kishon (1991) considered affine matching of handwritten signatures where there was a great deal of added measurement noise. It can be desirable to impose some smoothing into the analysis, in order to obtain smooth estimates of shape and shape variability.

A suitable approach to the problem is to introduce a roughness penalty into the objective function, in the manner of Rice and Silverman (1991) and Green and Silverman (1994). Consider the regression equation

numbered Display Equation

For matching two configurations one could minimize

numbered Display Equation

where λ is a smoothing parameter, s2( · ) is an objective function for measuring shape discrepancy and J( · ) ≥ 0 is a roughness penalty. For example, in two dimensions consider smooth shape matching of the complex k-vector configurations T to Y by minimizing

numbered Display Equation

where β is a scale, θ is a rotation and the remaining parameter is a translation.

If we consider m-dimensional affine matching from T to Y (the k × m matrices of coordinates from two configurations), then we would minimize

numbered Display Equation

where B is an (m + 1) × m matrix.
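
As a small concrete illustration, the unpenalized case λ = 0 reduces to ordinary least squares: assuming the discrepancy term is the least squares criterion based on the design matrix [1k T], with B an (m + 1) × m matrix of affine parameters, the fit is simply a multivariate regression of Y on [1k T]. The sketch below uses simulated configurations and is not the book's own code.

affine.match <- function(Tmat, Ymat) {
  X <- cbind(1, Tmat)                              # k x (m+1) design matrix [1_k  T]
  B <- solve(t(X) %*% X, t(X) %*% Ymat)            # (m+1) x m least squares affine parameters
  list(B = B, fitted = X %*% B)
}
set.seed(3)
Tmat <- matrix(rnorm(16), ncol = 2)                # 8 source landmarks in two dimensions
Ymat <- Tmat %*% matrix(c(1.1, 0.2, -0.1, 0.9), 2, 2) + 0.05 * matrix(rnorm(16), ncol = 2)
fit <- affine.match(Tmat, Ymat)
fit$B                                              # estimated affine transformation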

One possible penalty function is the quadratic penalty J(E) = trace(ETΩE), where Ω is the k × k integrated square derivative matrix corresponding to a smoothing spline, as suggested by Hastie and Kishon (1991) for smoothed affine matching. The bending energy of Equation (12.16) is another possible choice, and the solution is the smoothing thin-plate spline of Section 12.3.3.

In the case of J(E) = trace(ETΩE) and s2(E) = trace(ETE) a very simple adaptation of the Procrustes algorithm provides a solution. In the affine least squares case the solution is simply given by the m eigenvectors with smallest eigenvalues of the matrix defined after Equation (12.4).

12.6 Diffeomorphic transformations

There are many other types of deformations that could be considered. A rather different approach using techniques of fluid mechanics was considered by Christensen et al. (1996). A smooth deformation is obtained by applying a vector field transformation to the starting figure T. Although large scale smoothness is guaranteed, large magnitude deformations on a local scale are not heavily penalized. The result is that dramatic diffeomorphic deformations can be achieved, for example from an image of a small wedge-shaped patch into a letter ‘C’. A diffeomorphism is a smooth transformation which is one to one and onto (bijective) and has a smooth inverse; such a mapping is also called a diffeomorphic transformation. The class of diffeomorphic transformations is very appealing, as the transformation is invertible.

Younes (2010) gives a comprehensive account of diffeomorphic transformations in shape analysis, including the use of Euler–Lagrange equations and flow-based matching. Although the treatment applies far more generally to curves and surfaces, the methodology is also appropriate for comparing point configurations as a special case. Grenander and Miller (2007) also provide a comprehensive treatment of the deformation approach to shape analysis.

Beg et al. (2005) introduce the large deformation diffeomorphic metric mapping (LDDMM) methodology, which is a method for computing diffeomorphisms, usually between medical images in the field of computational anatomy. In particular the estimated diffeomorphic mapping ϕ1 from image I0 to image I1 is the end point of the flow of a time-dependent velocity field vt specified by the ordinary differential equation (ODE) dϕt/dt = vt(ϕt). Here the time index t indicates a series of transformations, where ϕ0 is the identity transformation, and the end point ϕ1 is the transformation of interest. There are many such paths, and the LDDMM method involves obtaining the smoothest path according to some criterion. In particular the solution is obtained from a variational problem by minimizing over the space of velocity fields the expression

numbered Display Equation

where ||vt||V is a norm on a space of smooth velocity fields. Beg et al. (2005) derive the Euler–Lagrange equations and provide a gradient algorithm for numerical computation. Miller et al. (2002) make connections with Euler–Poincaré equations to describe geodesic motions on the space of diffeomorphism groups known as EPDiff (Mumford and Michor, 2013).

Miller et al. (2006) use the concept of geodesic shooting for computing the diffeomorphism. Geodesic shooting involves estimating an initial velocity field of the transformation, and then evolving the diffeomorphic transformation, which is determined by the initial velocities. A measure of error or goodness of fit is chosen to represent how well the diffeomorphism represents the transformation, and the initial velocities are then adjusted iteratively to optimize this measure. LDDMM is usually applied to medical images, although it can also be applied to landmark configurations (Joshi and Miller, 2000). Further related work is given by Younes et al. (2008b) and Bauer et al. (2014).

Arsigny et al. (2006) introduce an efficient computational framework for diffeomorphisms which defines an exponential mapping from the vector space of smooth stationary velocity fields to diffeomorphisms. The exponential exp (u) of a velocity field is given by the flow at time 1 of a stationary ODE. In this log-Euclidean framework the vector field is stationary, and one only needs the flow at a single time to compute the diffeomorphism. Hence the method is very efficient numerically. Vercauteren et al. (2009) describe the method of diffeomorphic demons, which is an efficient tool for image registration following from Arsigny et al. (2006). Ashburner (2007) gives an introduction to deformations in general, making the distinction clear between small and large deformation frameworks. Ashburner's (2007) DARTEL method has similarities with the log-Euclidean diffeomorphism methods. Other techniques of interest include Glaunès et al. (2004) for landmark matching via large deformation diffeomorphisms on the sphere; Durrleman et al. (2008, 2009) for comparing brain surfaces based on currents; and Sommer et al. (2013) who consider LDDMM and a kernel bundle variational formulation, including deformations at various scales and sparsity. Some practical comparisons of various deformation methods for tensor-based morphometry are given by Villalon et al. (2011). In particular they compare surface constrained deformations, diffeomorphic demons and fluid based registration. A general survey of deformable medical image registration methods is given by Sotiras et al. (2013).
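
The following toy sketch illustrates the idea behind the exponential of a stationary velocity field (it is not the log-Euclidean or DARTEL algorithm itself): the diffeomorphism exp(u) is the time-1 flow of the ODE dϕ/dt = u(ϕ), approximated here by simple Euler integration at a set of grid points. The Gaussian-shaped velocity field is an arbitrary illustrative choice.

# A smooth stationary velocity field on the plane (illustrative).
u.fn <- function(x) c(0.3, 0.1) * exp(-sum((x - c(0.5, 0.5))^2) / 0.1)
flow <- function(x0, u.fn, nsteps = 100) {         # Euler steps for the flow from t = 0 to t = 1
  x <- x0
  for (i in 1:nsteps) x <- x + u.fn(x) / nsteps
  x
}
grid.pts <- as.matrix(expand.grid(seq(0, 1, by = 0.25), seq(0, 1, by = 0.25)))
deformed <- t(apply(grid.pts, 1, flow, u.fn = u.fn))
head(deformed)                                     # images of the grid points under exp(u)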
