In other fields, researchers
have been proposing methods of quantifying and summarizing replication
analyses since the early 1950s. Although invariance analysis in CFA
should be considered the gold standard for attempting to understand
whether an instrument has the same factor structure across different
groups (randomly constituted or otherwise), measures for replication
in EFA are still necessary for exploratory purposes. Over the years,
two summary statistics have been proposed for this function, but unfortunately
(as we will discuss below) both have flaws. More recently, a two-step
comparison procedure was proposed by the first author and one of his
graduate students (Osborne & Fitzpatrick, 2012). We hope you will
see the intuitive appeal of these procedures.
Two summary statistics for comparing EFA solutions. The use
of similarity coefficients for investigating EFA solution equivalency
was initially proposed by Kaiser, Hunka, & Bianchini (1971). The
similarity coefficients are estimated as the maximized cosines when
one set of factor loadings is rigidly rotated against another set
of factor loadings. The final set of parameters is interpreted as
correlation coefficients that indicate the similarity between the
two sets of factor loadings. These coefficients are, however, based
on faulty assumptions and are therefore invalid from a mathematical
point of view (see ten Berge, 1996, and Barrett, 1986). It is
possible to produce similarity coefficients that indicate strong agreement
when in fact there is little agreement. Thus, these statistics are
inappropriate for comparing factor analysis results.
Another EFA solution
summary statistic, congruence coefficients, was presented by Tucker
(1951) and Wrigley & Neuhaus (1955). These summary statistics
seem less problematic (ten Berge, 1986) but are also controversial.
(See also Barrett, 1986.) For example, Tucker’s (1951) congruence
coefficient examines the correlations between factor loadings for
all factor pairs extracted. Yet as Barrett (1986) correctly points
out, these types of correlations are insensitive to the magnitude
of the factor loadings, merely reflecting the patterns. For
our purpose, which is to examine whether the factor structure and
magnitude of the loadings are generally congruent, this insensitivity
to magnitude of loadings is problematic. We prefer a more granular
analysis that examines (a) whether items are assigned to the same
factors in both analyses and (b) whether the individual item factor
loadings are roughly equivalent in magnitude—the former being
the basic threshold for successful replication, and the latter being
a more reasonable, stronger definition of replication.
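The insensitivity to magnitude can be illustrated with a minimal sketch. It assumes the usual form of the congruence coefficient (the sum of cross-products of two loading vectors divided by the square root of the product of their sums of squares). The loadings and the function name `congruence` are hypothetical and serve only as an illustration.

```python
import math

def congruence(x, y):
    """Congruence coefficient between two vectors of factor loadings."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator

# Hypothetical loadings of four items on one factor in the first sample.
sample_1 = [0.80, 0.70, 0.60, 0.10]
# Second sample: identical pattern, but every loading is half as large.
sample_2 = [loading / 2 for loading in sample_1]

# The coefficient still signals (numerically) perfect agreement.
phi = congruence(sample_1, sample_2)
```

Here the loadings in the second sample are substantially weaker, yet the coefficient is unchanged because only the pattern of loadings enters the calculation.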
A two-step procedure for comparing EFA solutions. As noted
above, we have not yet found a single summary statistic that meets
our standards to evaluate EFA solution equivalency. In 2012 the first
author and one of his graduate students conducted a review of replication
procedures in EFA and developed a two-step comparison procedure for
evaluating EFA solutions (Osborne & Fitzpatrick, 2012).
First, Osborne &
Fitzpatrick (2012) assess whether the basic factor structure is replicated.
Regardless of whether the researcher is performing internal (a
single sample, randomly split) or external (two
independently gathered samples) replication, the researcher needs
to perform the same EFA procedure on both, specifying the same number
of factors to be extracted, the same extraction and rotation procedures,
etc. The researcher should then identify the strongest loading for
each item (i.e., which factor does that item “load”
on), and confirm that these are congruent across the two analyses.
For example, if item #1 has the strongest loading on factor #1, and
item #2 has the strongest loading on factor #2, that pattern should
be evidenced in both analyses. If any items fail this test, we would
consider these analyses to fail to meet the most basic threshold of
replicability: structural replicability. In that case, there is little
reason to expect the factor structure to replicate in any basic way in
future samples.
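This first step can be sketched in a few lines. The sketch makes two assumptions: the factors appear in the same order in both analyses, and the "strongest" loading is the one with the largest absolute value. The loadings, the item names, and the function names are hypothetical. This is an illustration of the idea, not the authors' own implementation.

```python
def strongest_factor(loadings):
    """Index of the factor on which an item has its largest absolute loading."""
    return max(range(len(loadings)), key=lambda j: abs(loadings[j]))

def structural_mismatches(sample_a, sample_b):
    """Items whose strongest-loading factor differs between the two analyses."""
    return [
        item
        for item in sample_a
        if strongest_factor(sample_a[item]) != strongest_factor(sample_b[item])
    ]

# Hypothetical rotated loadings: item -> [loading on factor 1, loading on factor 2]
sample_a = {"item1": [0.75, 0.10], "item2": [0.05, 0.68], "item3": [0.52, 0.31]}
sample_b = {"item1": [0.70, 0.15], "item2": [0.12, 0.71], "item3": [0.29, 0.55]}

# Items that fail the structural replication check (here, item3 switches factors).
volatile = structural_mismatches(sample_a, sample_b)
```

An empty list would indicate that every item is assigned to the same factor in both analyses.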
If there is a small
percentage of items that seem volatile in this way, this replication
analysis might provide important information—that these items
might need revision or deletion. Thus, replication can also serve
important exploratory and developmental purposes. If a large number
of problematic items are observed, this represents an opportunity
for the researcher to revise the scale substantially before releasing
it into the literature, where this volatility might be problematic.
Next, Osborne &
Fitzpatrick (2012) evaluate whether the relative magnitude of the
factor loadings is replicated. They advocate for simply subtracting
the two standardized (rotated) factor loadings for congruent items,
and squaring the difference. Squaring the difference has two benefits:
it removes the sign of the difference, which is unimportant (if one loading
is .75 and one is .70, subtracting the first from the second produces
-0.05, and subtracting the second from the first produces 0.05,
yet only the magnitude of the difference matters), and it highlights
larger differences. Researchers can then quickly scan the squared
differences and either confirm that all are small and unimportant, or
identify which items seem to have large differences across replication analyses.
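The second step can be sketched in the same spirit. The sketch assumes that the items have already passed the structural check, so each item's strongest factor in the first analysis is also its factor in the second. The loadings and names are again hypothetical. Results are sorted so that the largest squared differences appear first.

```python
def squared_differences(sample_a, sample_b):
    """Squared difference between each item's loadings on its assigned factor."""
    result = {}
    for item, loadings_a in sample_a.items():
        # Assigned factor: largest absolute loading in the first analysis.
        j = max(range(len(loadings_a)), key=lambda k: abs(loadings_a[k]))
        difference = loadings_a[j] - sample_b[item][j]
        result[item] = difference ** 2
    # Sort so the items with the largest discrepancies are listed first.
    return dict(sorted(result.items(), key=lambda pair: pair[1], reverse=True))

# Hypothetical rotated loadings: item -> [loading on factor 1, loading on factor 2]
sample_a = {"item1": [0.75, 0.10], "item2": [0.05, 0.68]}
sample_b = {"item1": [0.70, 0.15], "item2": [0.12, 0.40]}

squared = squared_differences(sample_a, sample_b)
for item, value in squared.items():
    print(f"{item}: {value:.4f}")
```

For item1, the loadings of .75 and .70 differ by .05 in either direction, and the squared difference is .0025 regardless of which is subtracted from which. For item2, the difference of .28 stands out much more clearly once squared.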
As you might imagine,
we find Osborne & Fitzpatrick’s procedure to be rather
sensible. The procedure addresses both questions that we seek to address
in a replication analysis: 1) Does the factor structure replicate?
and 2) Are the factor loadings of similar magnitude?