Targeted Learning for Variable Importance 419
The TMLE requires choosing a loss function L(O, Q) for candidate function Q applied
to an observation O and then specifying a submodel {Q(ε) : ε} ⊂ M to fluctuate the initial
estimator. Here, we use the squared-error loss function:

L(O, Q) = (Y − Q(A, W))^2 / σ^2(A, W)
The submodel {Q(ε) : ε} ⊂ M through Q at ε = 0 is selected such that the linear span of
d/dε L(Q(ε)) at ε = 0 includes the efficient influence curve in Equation 22.2. The specific
steps of the TMLE algorithm for the target parameter β_0 are enumerated below.
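Since the chapter later estimates σ^2(A, W) with the constant 1, the loss reduces to an ordinary squared error. A minimal numeric sketch of this loss (the function name and example values are illustrative, not from the chapter):

```python
import numpy as np

def squared_error_loss(y, q_pred, sigma2=1.0):
    """L(O, Q) = (Y - Q(A, W))^2 / sigma^2(A, W); sigma2 = 1 by default."""
    return (y - q_pred) ** 2 / sigma2

y = np.array([1.0, 2.0, 3.0])
q_pred = np.array([0.5, 2.0, 2.0])
losses = squared_error_loss(y, q_pred)  # elementwise losses: [0.25, 0.0, 1.0]
```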
22.3.2.1 TMLE Algorithm for Quantitative Trait Loci Mapping
Estimating E_0(Y | A, M^-) = Q_0(A, M^-). Generate a super learner-based initial
estimator that respects the semiparametric model in Equation 22.1 and also takes the form

Q_n^0 = β_n^0 A + f_n(M^-)
We introduce the subscript n to denote estimators and estimates.
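This step can be sketched numerically. In the sketch below, an ordinary least-squares fit stands in for the super learner, and the simulated markers and coefficients are illustrative assumptions, not the chapter's data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
M = rng.normal(size=(n, 3))                 # covariates standing in for M^-
A = 0.5 * M[:, 0] + rng.normal(size=n)      # marker of interest
Y = 1.5 * A + M @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)

# Fit Q_n^0 = beta_n^0 * A + f_n(M^-); a linear working model for f_n
# stands in for the super learner described in the text.
X = np.column_stack([np.ones(n), A, M])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
beta_n0 = coef[1]       # initial estimate of beta_0 (true value 1.5 here)
Q0 = X @ coef           # fitted values Q_n^0(A, M^-)
```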
Estimating E_0(A | W) = g_0(W). Recall that we introduced a subset W of M^- for each
A. Thus, M^- is replaced with W, and we can refer to the function g_0(W) = E_0(A | W)
as a marker confounding mechanism. For the applications considered here, as in Wang et
al. [66], the set of markers W are those that lie on the same chromosome as A.
However, the choice for E_0(A | W), in general, is still a complicated one. The selection
of flanking markers to include in the marker confounding mechanism can be simplified
to including only two flanking markers, which may still capture a good portion of the
confounding. But there remains the issue of how far from A these two flanking markers
should be. Markers too close to A may be too predictive of A, thus failing to isolate the
contribution of A when estimating β_0. On the other hand, if the selected markers are
too far from A, they may not contribute to reducing bias for the target parameter of
interest. Collaborative TMLE, as discussed briefly in our literature review, may also be
employed to data-adaptively select the most appropriate adjustment set. We leave further
discussion of this issue to other literature [49,63].
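As a concrete illustration of the two-flanking-marker simplification, the following sketch estimates g_n(W) = E(A | W) from two simulated flanking markers by least squares. The data-generating process and the linear working model are illustrative assumptions; in practice g_n would come from a more flexible estimator:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
W = rng.normal(size=(n, 2))                          # two flanking markers
A = W @ np.array([0.8, -0.4]) + rng.normal(size=n)   # marker of interest

# g_n(W): least-squares estimate of the marker confounding mechanism
Xw = np.column_stack([np.ones(n), W])
gamma, *_ = np.linalg.lstsq(Xw, A, rcond=None)
g_n = Xw @ gamma    # fitted values g_n(W) approximating E_0(A | W)
```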
Determine parametric working model to fluctuate initial estimator. The targeted
step uses an estimate g_n(W) of g_0(W) to correct the bias remaining in the initial
estimator. This involves defining a so-called clever covariate in a parametric working
model coding fluctuations of our initial estimator Q_n^0. For our parameter β_0, the
clever covariate is given by

h(A, W) = A − g_n(W)

that is, the residual of A with respect to g_n(W), under a condition we describe below.
The clever covariate h(A, W) was defined earlier in Equation 22.3 and derived based on
the efficient influence curve in Equation 22.2. When σ^2(A, W) is a function of W only,
it drops out of the efficient influence curve. We choose to estimate σ^2(A, W) with the
constant 1, which gives us the simplified clever covariate h(A, W) = A − g_n(W) as above.
The estimation of the nuisance parameter σ^2(A, W) does not impact the consistency
properties of the TMLE, but TMLE will only be efficient if, in addition to estimating Q_0
and g_0 consistently, σ^2(A, W) is in fact only a function of W [49].
Update Q_n^0. The regression of Y on h(A, W) can be reformulated as

Y ~ h(A, W)
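Under the squared-error loss above with σ^2 set to 1, one common numeric realization of this targeting step is a no-intercept regression of the residual Y − Q_n^0 on h(A, W), whose coefficient ε_n shifts β_n^0 to the updated estimate. The simulated data and this specific regression form are illustrative assumptions, not the chapter's exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
W = rng.normal(size=n)                   # flanking-marker confounder
A = 0.7 * W + rng.normal(size=n)         # marker of interest
Y = 1.2 * A + np.sin(W) + rng.normal(size=n)

# Deliberately misspecified initial estimator: regress Y on A alone,
# so beta_n^0 absorbs confounding by W.
Xa = np.column_stack([np.ones(n), A])
ca, *_ = np.linalg.lstsq(Xa, Y, rcond=None)
beta_n0, Q0 = ca[1], Xa @ ca

# g_n(W) and the clever covariate h(A, W) = A - g_n(W)
Xw = np.column_stack([np.ones(n), W])
g_n = Xw @ np.linalg.lstsq(Xw, A, rcond=None)[0]
h = A - g_n

# Targeting step: no-intercept least squares of Y - Q_n^0 on h gives the
# fluctuation parameter eps_n; since h carries coefficient 1 on A, the
# updated slope is beta_n^0 + eps_n, which removes the confounding bias.
eps_n = (h @ (Y - Q0)) / (h @ h)
beta_star = beta_n0 + eps_n              # targeted estimate of beta_0
```

Here the biased initial fit overstates the slope, and the update driven by the clever covariate pulls the estimate back toward the true value of 1.2.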