Given a variety of approximation methods in the analysis of nonlinear longitudinal data, the fixed effects and the covariance parameters in GLMMs can be derived either from the linearization approaches or from the integral approximation methods. In displaying results of various nonlinear longitudinal models, it is often not sufficient to produce meaningful interpretations just by presenting parameter estimates and the corresponding standard errors. For example, in the presence of more than two nominal response outcomes, the regression coefficients of covariates do not necessarily bear any relationship to changes in the multinomial response (
Greene, 2003;
Liao, 1994; Liu et al., 1995). Because a set of probabilities for competing outcomes always sums to unity, the simultaneous structural variation in the probability distribution of more than two alternatives is beyond what a set of regression coefficients can capture. Therefore, in the application of GLMMs, the researcher often needs to compute nonlinear predictions on repeated measurements of the response, given selected values of the covariates, the estimated regression coefficients, and the predicted random effects.
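A small numeric sketch (a multinomial logit with hypothetical coefficients; all values are illustrative) makes the point concrete: a category whose own coefficient is positive can nevertheless lose probability as the covariate grows, because the probabilities of the competing outcomes must sum to one.

```python
import numpy as np

# Multinomial logit with three outcomes; category 0 is the reference.
# Hypothetical coefficients of a single covariate x for categories 1 and 2.
beta1, beta2 = 0.5, 2.0  # both positive

def probs(x):
    """Return the response probabilities at covariate value x."""
    eta = np.array([0.0, beta1 * x, beta2 * x])
    e = np.exp(eta - eta.max())  # stabilized softmax
    return e / e.sum()

p_low, p_high = probs(0.0), probs(2.0)
# Although beta1 > 0, the probability of category 1 declines as x rises,
# because category 2's linear predictor grows faster.
```

The sign and size of a single coefficient therefore say little about how the full probability vector shifts; the predicted probabilities must be computed directly.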
In this section, I describe three approaches for performing nonlinear predictions in the application of GLMMs: the best linear unbiased prediction, or BLUP (
Kuk, 1995;
McGilchrist, 1994), the empirical Bayes estimate (
SAS, 2012, Chapter 64), and the retransformation method (
Liu and Engel, 2012; McCulloch et al., 2008).
8.4.1. Best linear unbiased prediction based on linearization
The BLUP applied in the analysis of nonlinear longitudinal data is an extension of the BLUP approximation in linear mixed models (described in
Chapter 4). These prediction approaches essentially modify the EM algorithm for the conditional expectation of the random effects, with the data replaced by predictors based on the mode of the corresponding conditional distribution.
The BLUP procedure in GLMMs is often based on linearized normal data of the nonlinear response, with $\tilde{y}$ replacing $y$. Given Equation (8.39),

$$V(\tilde{\theta}) = ZGZ' + \tilde{\Delta}^{-1}A^{1/2}R(\tilde{\alpha})A^{1/2}\tilde{\Delta}^{-1},$$

the random error $\tilde{\varepsilon}$, conditionally on $b$, is assumed to be normally distributed. The maximum log pseudo-likelihood for $\tilde{y}$ can then be written as

$$l(\tilde{\theta}, y) = -\frac{1}{2}\log\left|V(\tilde{\theta})\right| - \frac{1}{2}\left(\tilde{y} - X'\beta\right)'V(\tilde{\theta})^{-1}\left(\tilde{y} - X'\beta\right) - \frac{\tilde{N}}{2}\log(2\pi), \tag{8.61}$$

where $\tilde{N}$ is the sum of frequencies used in the analysis.
Likewise, the corresponding restricted log pseudo-likelihood function is

$$l_R(\tilde{\theta}, y) = -\frac{1}{2}\log\left|V(\tilde{\theta})\right| - \frac{1}{2}\left(\tilde{y} - X'\beta\right)'V(\tilde{\theta})^{-1}\left(\tilde{y} - X'\beta\right) - \frac{1}{2}\log\left|X'V(\tilde{\theta})^{-1}X\right| - \frac{\tilde{N} - \tilde{M}}{2}\log(2\pi), \tag{8.62}$$

where $\tilde{M}$ is the rank of $X$ (if $X$ has full rank, $\tilde{M} = M$).
According to the standard estimating procedure for linear mixed models, the researcher can estimate the fixed parameters and predict the random effects by using the following two equations, respectively:

$$\hat{\beta} = \left[X'V(\hat{\tilde{\theta}})^{-1}X\right]^{-1}X'V(\hat{\tilde{\theta}})^{-1}\tilde{y}, \tag{8.63}$$

$$\hat{b} = \hat{G}Z'V(\hat{\tilde{\theta}})^{-1}\left(\tilde{y} - X'\hat{\beta}\right). \tag{8.64}$$
Equations (8.63) and (8.64) display the similarity between the BLUP used for the linearized pseudo data and the BLUP described in Chapter 4. The predictor $\hat{b}$ in Equation (8.64) is the estimated BLUP in linearization models. The final predictor can be obtained by iteration given a standard convergence criterion. The above two equations are based on the condition that $\varphi = 1$; if the scale parameter $\varphi \neq 1$, the equations can be easily adjusted by applying some standard approaches (for details, see SAS, 2012, Chapter 38).
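As a minimal numerical sketch of Equations (8.63) and (8.64), the snippet below applies the two estimating equations to simulated pseudo data from a random-intercept model with known covariance components; all dimensions and parameter values are hypothetical, and in practice $V(\tilde{\theta})$ would itself be updated iteratively until convergence.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical layout: n subjects, t repeated measurements each
n, t = 30, 4
N = n * t

# Design matrices for the fixed effects (X) and a random intercept (Z)
X = np.column_stack([np.ones(N), np.tile(np.arange(t), n)])
Z = np.kron(np.eye(n), np.ones((t, 1)))

# Assumed (known) covariance components of the linearized pseudo data
G = 0.5 * np.eye(n)            # random-effect covariance
R = 1.0 * np.eye(N)            # residual covariance
V = Z @ G @ Z.T + R            # marginal covariance of y_tilde

# Simulated linearized pseudo response
beta_true = np.array([1.0, -0.3])
b_true = rng.normal(0.0, np.sqrt(0.5), size=n)
y_tilde = X @ beta_true + Z @ b_true + rng.normal(0.0, 1.0, size=N)

Vinv = np.linalg.inv(V)

# Eq. (8.63): generalized least squares estimate of the fixed effects
beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y_tilde)

# Eq. (8.64): BLUP of the random effects
b_hat = G @ Z.T @ Vinv @ (y_tilde - X @ beta_hat)
```

For a random intercept with these components, $\hat{b}_i$ reduces to a shrunken subject-mean residual with shrinkage factor $t\sigma_b^2/(\sigma^2 + t\sigma_b^2)$, which illustrates the shrinkage toward the population fixed effects discussed later in this section.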
According to the rationale of the BLUP estimator, the procedure yields the best and unbiased predictors of the transformed linear function. By retransforming this linear function to predict the nonlinear response variable, the predicted outcomes are often regarded as unbiased as well. Given the specification of the pseudo-error term $\tilde{\varepsilon}$ in the linearization models, however, nonlinear predictions of the response must include this error term in the retransformation process. If the link function for a GLMM is not the identity, the normal distribution of the transformed random errors in the linearization model corresponds to a nonnormal distribution on the response scale when the nonlinear response is predicted. As a result, the expected value of the posterior predictive distribution, after retransformation to a nonnormal function, is not the direct retransformation of the expectation of the linear predictor unless the identity link function is specified.
Given the complex functional transformation and retransformation, in nonlinear predictions the above BLUPs are neither linear nor best; further computation is needed to derive nonlinear predictions that are unbiased and statistically best. As Kuk (1995) comments, in general situations the BLUP estimates are asymptotically biased and inconsistent in nonlinear predictions. In some of the empirical illustrations in the succeeding chapters, the prediction bias from neglecting to retransform the random components in nonlinear predictions will be described and empirically displayed.
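The source of this bias can be seen in a short simulation (logit link; the mean and variance of the linear predictor are hypothetical): retransforming the expectation of the linear function is not the same as taking the expectation of the retransformed function.

```python
import numpy as np

rng = np.random.default_rng(5)

# Transformed linear predictor: normal with hypothetical mean and variance
eta_mean, eta_var = 0.8, 1.5
eta = rng.normal(eta_mean, np.sqrt(eta_var), size=1_000_000)

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

direct = inv_logit(eta_mean)      # retransformation of the expectation
marginal = inv_logit(eta).mean()  # expectation of the retransformation
# For a nonidentity link the two differ, so the naive prediction is biased.
```

The gap between `direct` and `marginal` grows with the variance of the random components, which is why neglecting their retransformation matters most when the random effects are large.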
8.4.2. Empirical Bayes BLUP
As the random effects can be estimated or predicted accurately by the application of the Gaussian quadrature methods (McCulloch et al., 2008), the nonlinear response for subject $i$ in GLMMs can be predicted by using the estimates of the fixed effects and the predicted values of the specified random effects. Given Equation (8.17), the linear predictor for subject $i$ can be expressed by the mean

$$\hat{\eta}_i\left(y_i \mid \hat{\beta}, \hat{b}_i\right) = X_i'\hat{\beta} + Z_i'\hat{b}_i, \tag{8.65}$$

where $\hat{\beta}$ is the maximum likelihood or the REML estimate of $\beta$, and $\hat{b}_i$ is the empirical Bayes estimate of $b_i$. The corresponding variance of the prediction is given by

$$\hat{V}_i\left(\hat{\eta}_i\right) = \hat{\varphi}\hat{A}_i^{1/2}(\mu_i)\hat{R}_i(\tilde{\alpha})\hat{A}_i^{1/2}(\mu_i) + Z_i'\hat{G}Z_i. \tag{8.66}$$
Suppose $\hat{\eta}_i$ is a random vector of linear predictors from Equation (8.65) with mean $\eta_i$ and variance matrix $\hat{V}_i(\hat{\eta}_i)$, and $\hat{y}_i = g^{-1}(\hat{\eta}_i)$ is a transform of $\hat{\eta}_i$, where $g$ is the link function and $g^{-1}$ is its inverse. The first-order Taylor series expansion of $g^{-1}(\hat{\eta}_i)$ yields an approximation of the mean,

$$E\left[g^{-1}(\hat{\eta}_i)\right] \approx g^{-1}(\eta_i), \tag{8.67}$$

and of the variance–covariance matrix,

$$V_i\left[g^{-1}(\hat{\eta}_i)\right] \approx \left[\frac{\partial g^{-1}(\hat{\eta}_i)}{\partial \hat{\eta}_i}\bigg|_{\hat{\eta}_i = \eta_i}\right]' V_i(\hat{\eta}_i) \left[\frac{\partial g^{-1}(\hat{\eta}_i)}{\partial \hat{\eta}_i}\bigg|_{\hat{\eta}_i = \eta_i}\right]. \tag{8.68}$$
The above approximation approach uses the delta method (
Stuart and Ord, 1994), described in
Appendix B. Computation of the partial derivatives expressed in Equation
(8.68) depends on a specific link function corresponding to a specific data type, as will be described in some of the succeeding chapters.
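A scalar check of the delta-method approximation in Equation (8.68), using the inverse logit link with hypothetical values for the linear predictor's mean and prediction variance:

```python
import numpy as np

rng = np.random.default_rng(2)

eta, v = 0.4, 0.09  # hypothetical mean and variance of the linear predictor

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_inv_logit(x):
    mu = inv_logit(x)
    return mu * (1.0 - mu)  # derivative of the inverse logit

# Scalar version of Eq. (8.68):
# var[g^{-1}(eta_hat)] ~ [d g^{-1}/d eta]^2 * var(eta_hat)
var_delta = d_inv_logit(eta) ** 2 * v

# Monte Carlo comparison on the probability scale
draws = inv_logit(rng.normal(eta, np.sqrt(v), size=1_000_000))
var_mc = draws.var()
```

With a modest prediction variance, the first-order approximation tracks the simulated variance closely; the approximation degrades as the curvature of $g^{-1}$ over the spread of $\hat{\eta}_i$ increases.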
This empirical Bayes BLUP is a popular approach for performing nonlinear predictions with GLMMs (Fitzmaurice et al., 2004), implemented in a variety of computer software packages with various algorithms. For example, the SAS NLMIXED procedure uses the likelihood calculation given in McGilchrist (1994), with an extension of the algorithm for nonlinear predictions. Specifically, this procedure uses the empirical Bayes estimates of the random effects to approximate the Gaussian quadrature integral, and the approximated integral, as a marginal likelihood, is then optimized iteratively for the fixed effects. In the final iteration, the PROC NLMIXED procedure applies the fixed-effect estimates to produce final predictions, considered the empirical Bayes estimates (for details, see the chapter “The NLMIXED Procedure” in the SAS/STAT User’s Guide). In this algorithm, only the predicted random effects, treated as fixed effects, are taken into account in predictions, thereby implicitly imposing the restriction that the intraindividual correlation is one. As indicated in Chapter 4, the term $GZ'\Sigma^{-1}$ in linear mixed models is empirically the proportion of the overall variance–covariance matrix that derives from the random effects. Given this inference, there are actually two variance–covariance components, $R$ and $G$, that need to be accounted for in nonlinear predictions with GLMMs.
8.4.3. Retransformation method
According to Equation (8.15), the mean of the nonlinear response $y$ can be formulated as an iterated expectation:

$$E(y_i) = E\left[g^{-1}\left(X_i'\beta + Z_i'b\right)\right]. \tag{8.69}$$
If $g(\cdot)$ represents a log link, Equation (8.69) can be rewritten as

$$E(y_i) = E\left[\exp\left(X_i'\beta + Z_i'b\right)\right] = \exp\left(X_i'\beta\right)E\left[\exp\left(Z_i'b\right)\right] = \exp\left(X_i'\beta\right)\Phi_b(Z_i), \tag{8.70}$$

where $\Phi_b(Z_i)$ is defined as the moment-generating function of $b$ evaluated at $Z_i$ (McCulloch et al., 2008). According to Bayesian inference, $\Phi_b(Z_i)$ is the expectation of the prior distribution for the random effects on the nonnormal $y_i$. Given $b_i \sim N(0, G)$, the term $\exp(Z_i'b)$ follows a lognormal distribution under the log link, and its expectation is given by the normal moment-generating function. Therefore, Equation (8.70) can be further expanded for nonlinear predictions:

$$E(\hat{y}_i) = \exp\left(X_i'\hat{\beta}\right)\exp\!\left(\frac{Z_i\hat{G}Z_i'}{2}\right), \tag{8.71}$$

where the term $\exp\{(Z_{ij}\hat{G}Z_{ij}')/2\}$ is the expectation of $\exp(Z_i'b)$ given a lognormal distribution.
Equation (8.71) indicates that, at data point $j$, $\exp(X_{ij}'\hat{\beta})\exp\{(Z_{ij}\hat{G}Z_{ij}')/2\} > \exp(X_{ij}'\hat{\beta})$ unless all elements in $\hat{G}$ take the value zero. Therefore, given the log link, the nonlinear response $y_{ij}$ will be underpredicted if retransformation of the between-subjects random effects is completely or partially neglected, with the magnitude of this retransformation bias depending on the values of the elements in $\hat{G}$. Furthermore, it is inappropriate to replace $\hat{G}$ with the empirical BLUP covariance estimator $\operatorname{var}(\text{BLUP } \hat{b}_i)$. Due to shrinkage of $\hat{b}_i$ toward the population fixed effects, the distribution of the empirical BLUPs does not accurately represent the distribution of the random effects (McCulloch and Neuhaus, 2011).
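The retransformation factor in Equation (8.71), and the underprediction that results from dropping it, can be verified numerically for a log link with a single random intercept (the values of $X_i'\hat{\beta}$ and $\hat{G}$ are hypothetical, with $Z_i = 1$):

```python
import numpy as np

rng = np.random.default_rng(3)

x_beta, g = 1.2, 0.5  # hypothetical fixed-effect predictor and var(b)

naive = np.exp(x_beta)                    # neglects the random effect
retrans = np.exp(x_beta) * np.exp(g / 2)  # Eq. (8.71) with Z_i = 1

# Monte Carlo marginal mean over b ~ N(0, g)
b = rng.normal(0.0, np.sqrt(g), size=1_000_000)
marginal = np.exp(x_beta + b).mean()
# naive < marginal ~ retrans: omitting the factor underpredicts the response
```

The simulated marginal mean matches the closed-form retransformed prediction, while the naive prediction falls short by the factor $\exp(g/2)$.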
Equation (8.69) does not specify a term for within-subject random errors. The underlying rationale is that the specification of the between-subjects random effects fundamentally reflects individual differences in unspecified characteristics, thereby addressing within-subject variability (Diggle et al., 2002; Littell et al., 2006; Molenberghs and Verbeke, 2010). Neither the linearization-based BLUP nor the empirical Bayes method specifies within-subject random errors in nonlinear predictions. While empirically this hypothesis holds in most cases, Equation (8.69) does not universally reflect the true experiences generated by stochastic longitudinal processes. There is evidence that within-subject variability can sometimes have a unique impact on the nonlinear response, thereby yielding some uncertainty even conditionally on the between-subjects random effects (Liu and Engel, 2012). In certain situations, a term for within-subject random errors needs to be specified, in line with the specification in linear mixed models.
If within-subject variability is considered, the specification of the nonlinear response $y$ for person $i$ can be empirically written as

$$E(y_i) = E\left\{g^{-1}\left[X_i'\beta + Z_i'b + \tilde{\Delta}\left(y_i - \mu_i\right)\right]\right\}, \tag{8.72}$$

where $\tilde{\Delta}$ can be regarded as a second-order smearing estimate evaluated at $(\hat{\beta}, \hat{b}_i)$. This smearing effect can be approximated from the partial derivative of the log-likelihood function with respect to $\beta$ in GLMMs, given by

$$\hat{\tilde{\Delta}}\left(y_{ij} - \mu_{ij}\right) \approx \frac{\partial l\left(y_{ij} \mid \beta, G, \varphi\right)}{\partial \beta}\bigg|_{\hat{\beta}, \hat{b}}.$$

Empirically, $\tilde{\Delta}$ is the local approximation of within-subject random errors (McCulloch et al., 2008), which can be estimated as the first partial derivative of the log-likelihood function. It must be emphasized that the approximation of such a within-subject random error term is model-based, differing from the specification of random errors in linear mixed models. If the researcher has strong evidence that between-subjects variability well captures within-subject uncertainty, this local approximation step is unnecessary.
Some researchers recommend the application of the latent variable approach (
Amemiya, 1985;
Bock, 1975;
Long, 1997) to estimate within-subject random errors in GLMMs (
Hedeker and Gibbons, 2006). In the analysis of multinomial data, this random component on each logit function is assumed to follow a standard logistic distribution with mean 0 and variance $\pi^2/3$. This standardized approach specifies a constant variance of within-subject random errors, regardless of the response type and the number of covariates considered in a particular regression model. It is argued that, once the between-subjects random effects are specified, not much variability remains in a binary or a multinomial response (Littell et al., 2006). Furthermore, this approach does not specify a covariance structure between two related logit components, thereby overlooking the multivariate nature of the multinomial response. Indeed, the assumption of a continuous latent distribution is just a rough model requirement (McCullagh and Nelder, 1989).
Empirically, within-subject random errors can be locally approximated by the score equation, the first partial derivative of the log-likelihood, in estimating a marginal mean function. If all covariates are rescaled to be centered at some specified values, represented by $X_0$, the intercept corresponds to a mean transformed linear function with respect to $X_0$, as applied in the regression analysis of correlated binary data (Zhao and Prentice, 1990) and in predicting the mean function of recurrent events in survival analysis (Lin et al., 2000). For example, if time $T$ is centered at 3 and other covariates are rescaled to be centered at sample means, the intercept in a GLMM predicts the mean of the transformed linear function for an average person at time point 3. Correspondingly, the score function approximates the within-subject residuals for this marginal mean, conditionally on the between-subjects random effects and other model parameters. The variance–covariance elements of within-subject random errors can also be approximated by the local subset of the variance–covariance matrix for the fixed effects, given the hypothesized structure of $R$.
If within-subject random errors are considered in GLMMs, for a log link the mean of the nonnormal response $y$ is given by

$$E(y_i) = E\left[\exp\left(X_i'\beta + Z_i'b + \tilde{\Delta}_i^{-1}\right)\right] = \exp\left(X_i'\beta\right)\Phi_b(Z_i)\exp\left[E\left(\tilde{\Delta}_i^{-1}\right)\right] = \exp\left(X_i'\beta\right)\exp\!\left(\frac{Z_iGZ_i' + A_i}{2}\right), \tag{8.73}$$

with variance generally specified as

$$\operatorname{var}(y_i) = \operatorname{var}\left[E\left(y_i \mid b\right)\right] + E\left[\operatorname{var}\left(y_i \mid b\right)\right] = \operatorname{var}(\mu_i) + E\left[\varphi\,\upsilon(\mu_i)\right] = \operatorname{var}\left[g^{-1}\left(X_i'\beta + Z_i'b\right)\right] + E\left\{\varphi\,\upsilon\left[g^{-1}\left(X_i'\beta + Z_i'b\right)\right]\right\}. \tag{8.74}$$
As can be seen, the variance of the response $y$ can be decomposed into two parts, the between-subjects and the within-subject components, as in the case of linear mixed models. In such nonlinear predictions, the value of the unobservable and unrecognized factors is set at the mean by using the expectation of the posterior predictive distribution for the retransformed random effects, just as researchers often set the values of control variables at sample means in regression modeling. When two variance components are specified, the marginal distribution of a nonlinear function is overdispersed compared to the conditional distribution (McCulloch et al., 2008).
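The decomposition in Equation (8.74), and the resulting overdispersion of the marginal distribution, can be checked by simulation for a conditional Poisson response with a log link, where $\varphi = 1$ and $\upsilon(\mu) = \mu$ (the model and parameter values below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)

x_beta, g = 0.5, 0.3  # hypothetical fixed-effect predictor and var(b)
n = 1_000_000

# Conditional Poisson response with a normal random intercept on the log scale
b = rng.normal(0.0, np.sqrt(g), size=n)
mu = np.exp(x_beta + b)  # E(y | b)
y = rng.poisson(mu)

# Eq. (8.74): var(y) = var[E(y|b)] + E[var(y|b)] = var(mu) + E(mu)
var_decomp = mu.var() + mu.mean()
var_mc = y.var()
# var_mc exceeds y.mean(): the marginal distribution is overdispersed
# relative to the conditional Poisson distribution, for which var = mean.
```

The simulated marginal variance agrees with the sum of the two components, and exceeds the marginal mean, confirming the overdispersion noted above.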
The development of the retransformation method follows the rationale of Bayesian inference, predicting the marginal mean of the nonlinear response with an assumed distribution for an unknown parameter. This predictor is in effect a nonlinear adaptation of the least squares means applied in linear mixed models, accounting for the expectation of a nonnormal posterior predictive distribution for the random components. Dispersion of individual development is conveyed by the standard error of the margin. In this approach, the random effects are specified to account for the inherent intraindividual correlation in longitudinal data, but the random effect values are not of direct interest. Therefore, in nonlinear predictions, the value of the random effects is held at its mean, just as values of the observed control variables are routinely fixed at sample means. In statistical modeling, individuals are elements of a sample randomly selected for the estimation of population statistics, and any specific person in the sample just happens to have been chosen from the population. With inference from a random sample to the population it represents, a prediction for a specific individual should follow a nonnormal probability distribution if the link function is not the identity.