Factor scores
seemed cutting-edge back in the 1980s, when the first author
was beginning to take graduate statistics courses. The concept and
practice extend back to the 1920s, although early on estimating scores was considered
much less important than determining the factors themselves. Factor
scoring developed as a rational response to the earlier practice of
summing or averaging items on a scale to produce an overall score.
Of course, summing or averaging items assumes that all items contribute
equally to the latent score. EFA demonstrated convincingly very early
on that not all items are equally good measures of a construct and,
therefore, better items should carry more weight in estimating an
individual’s score.
The practice of weighting
items according to the results of EFA held the potential
of improving measurement. Researchers working before widespread
access to computers could then sum the weighted item scores
to approximate what might be the true score of each individual on
the construct being examined. Thus,
the computation of weighted factor scores makes sense and was a natural
progression over many decades of research. However, it
is important to recognize that factor score computation is intricate.
There are many methods of computing factor scores and
no single universally accepted technique. In addition, factor scores can be sensitive
to the extraction and rotation methods used (DiStefano, Zhu, &
Mindrila, 2009).
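The contrast between unit weighting and loading-based weighting can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the responses and the loadings are made-up values, and the weighted sum is the simple hand-computation approach described above, not one of the refined estimation methods that statistical software offers.

```python
import numpy as np

# Hypothetical data: 4 respondents x 3 items (values invented for illustration).
responses = np.array([
    [4.0, 5.0, 3.0],
    [2.0, 3.0, 2.0],
    [5.0, 4.0, 4.0],
    [3.0, 3.0, 4.0],
])

# Hypothetical loadings for one factor (assumed, not from a real analysis).
loadings = np.array([0.8, 0.7, 0.3])

# Standardize each item so that items on different scales are comparable.
z = (responses - responses.mean(axis=0)) / responses.std(axis=0, ddof=1)

# Unit weighting: every item contributes equally to the score.
unit_scores = z.sum(axis=1)

# Loading weighting: items with larger loadings count for more.
weighted_scores = z @ loadings

for person, (u, w) in enumerate(zip(unit_scores, weighted_scores), start=1):
    print(f"respondent {person}: unit = {u:+.2f}, weighted = {w:+.2f}")
```

If all loadings were equal, the weighted scores would simply be a constant multiple of the unit-weighted scores, so the two approaches can rank respondents differently only when loadings differ across items.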
The practice of factor
score estimation introduced a new assumption into our analyses: that
the factor loadings (correlations between an item and the factor)
are stable and reasonable estimates of the dynamics within a population.
A corollary, of course, is that the factor loadings are valid (invariant)
for all subgroups being examined. A similar set of assumptions and
corresponding issues is present in linear modeling/regression analyses,
particularly when researchers are using multiple regression to predict
outcomes for individuals. (For papers on prediction in regression,
see Osborne, 2000, 2008.) Specifically, the issue is that these procedures
can overfit the data. Most
samples contain idiosyncratic aspects that most quantitative analyses
will take advantage of in order to fit the model, despite those sample
characteristics being nonreproducible (Thompson, 2004, p. 70). Because
of this issue, we recommended in earlier chapters that researchers focus on
replication and examine the anticipated precision or variability
across samples to judge the goodness of EFA results. However, factor
score estimation is a relatively common practice, and
a book about EFA would not be complete without mentioning it.
Therefore, here are our goals for this chapter:
- Review some basic aspects of computing factor scores.
- Review how the solutions that result from EFA can be unstable across
  samples, which can contribute to instability of factor scores.
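As a small preview of the second goal, the sketch below resamples one simulated data set and re-estimates the loadings each time, which gives a rough sense of how much they can move from sample to sample. Everything here is an assumption made for illustration: the population loadings, the sample size of 100, the number of resamples, and the helper `one_factor_paf`, which is a bare-bones iterated principal axis routine for a single factor rather than the procedure any particular package uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: one factor, four items, assumed loadings.
true_loadings = np.array([0.8, 0.7, 0.6, 0.4])
n = 100
factor = rng.standard_normal(n)
noise = rng.standard_normal((n, 4))
data = factor[:, None] * true_loadings + noise * np.sqrt(1 - true_loadings**2)

def one_factor_paf(x, n_iter=50):
    """Single-factor iterated principal axis factoring (illustrative helper).

    Runs a fixed number of iterations with no convergence check.
    """
    r = np.corrcoef(x, rowvar=False)
    # Start from squared multiple correlations as communality estimates.
    h2 = 1 - 1 / np.diag(np.linalg.inv(r))
    for _ in range(n_iter):
        reduced = r.copy()
        np.fill_diagonal(reduced, h2)
        vals, vecs = np.linalg.eigh(reduced)  # eigenvalues in ascending order
        lam = vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))
        h2 = lam**2
    # Eigenvector sign is arbitrary; orient so the loadings sum to a positive value.
    return lam if lam.sum() >= 0 else -lam

# Re-estimate the loadings in 200 bootstrap resamples of the same data.
boot = np.array([
    one_factor_paf(data[rng.integers(0, n, size=n)])
    for _ in range(200)
])

print("mean loading per item:", boot.mean(axis=0).round(2))
print("SD across resamples:  ", boot.std(axis=0, ddof=1).round(2))
```

Because weighted factor scores are built from these loadings, any spread in the loadings across resamples carries over to the scores computed from them.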