Quantifying Replicability in EFA

In other fields, researchers have been proposing methods of quantifying and summarizing replication analyses since the early 1950s. Although invariance analysis in CFA should be considered the gold standard for understanding whether an instrument has the same factor structure across different groups (randomly constituted or otherwise), measures of replication in EFA are still necessary for exploratory purposes. Over the years, two summary statistics have been proposed for this purpose, but unfortunately (as we will discuss below) both have flaws. More recently, a two-step comparison procedure was proposed by the first author and one of his graduate students (Osborne & Fitzpatrick, 2012). We hope you will see the intuitive appeal of this procedure.
Two summary statistics for comparing EFA solutions. The use of similarity coefficients for investigating EFA solution equivalency was initially proposed by Kaiser, Hunka, & Bianchini (1971). The similarity coefficients are estimated as the maximized cosines obtained when one set of factor loadings is rigidly rotated against another. The resulting coefficients are interpreted as correlations indicating the similarity between the two sets of factor loadings. These coefficients are, however, based on faulty assumptions (and are therefore invalid from a mathematical point of view; see ten Berge, 1996, and Barrett, 1986). It is possible to produce similarity coefficients that indicate strong agreement when in fact there is little agreement. Thus, these statistics are inappropriate for comparing factor analysis results.
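To make the mechanics concrete, the sketch below illustrates the general idea of rigidly rotating one loading matrix toward another and then taking the cosines between corresponding factors. It is only an illustration under assumed inputs (hypothetical loadings_a and loadings_b arrays), not a faithful reimplementation of Kaiser, Hunka, & Bianchini's (1971) procedure.

```python
# Illustrative only: rigid (orthogonal Procrustes) rotation of one loading
# matrix toward another, followed by cosines between corresponding factors.
# This sketches the general idea, not Kaiser et al.'s (1971) exact method.
import numpy as np
from scipy.linalg import orthogonal_procrustes

def rotated_factor_cosines(loadings_a, loadings_b):
    """Rotate loadings_a rigidly toward loadings_b (items x factors arrays)
    and return the cosine between each pair of corresponding factors."""
    rotation, _ = orthogonal_procrustes(loadings_a, loadings_b)
    rotated = loadings_a @ rotation
    numerator = np.sum(rotated * loadings_b, axis=0)
    denominator = (np.linalg.norm(rotated, axis=0) *
                   np.linalg.norm(loadings_b, axis=0))
    return numerator / denominator
```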
Another EFA solution summary statistic, the congruence coefficient, was presented by Tucker (1951) and Wrigley & Neuhaus (1955). This statistic seems less problematic (ten Berge, 1986) but is also controversial (see also Barrett, 1986). For example, Tucker's (1951) congruence coefficient examines the correlations between factor loadings for all pairs of extracted factors. Yet as Barrett (1986) correctly points out, these correlations are insensitive to the magnitude of the factor loadings, merely reflecting the patterns.[2] For our purpose, which is to examine whether the factor structure and the magnitude of the loadings are generally congruent, this insensitivity to magnitude is problematic. We prefer a more granular analysis that examines (a) whether items are assigned to the same factors in both analyses and (b) whether the individual item factor loadings are roughly equivalent in magnitude; the former is the basic threshold for successful replication, and the latter is a more reasonable, stronger definition of replication.
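To make Barrett's point concrete, the short sketch below computes Tucker's coefficient for a single pair of factors as the sum of the products of paired loadings divided by the square root of the product of their sums of squared loadings. The loadings used are hypothetical, chosen only to show the coefficient's insensitivity to magnitude: proportional loadings yield a coefficient of 1.0 no matter how much weaker one set is.

```python
# A minimal sketch of Tucker's congruence coefficient for one pair of
# factors: sum(x*y) / sqrt(sum(x^2) * sum(y^2)). The loadings below are
# hypothetical and chosen only to illustrate scale insensitivity.
import numpy as np

def tucker_congruence(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum(x * y) / np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))

sample_1 = np.array([0.80, 0.70, 0.60, 0.75])
sample_2 = 0.5 * sample_1            # same pattern, far weaker loadings
print(tucker_congruence(sample_1, sample_2))   # 1.0 (up to rounding)
```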
A two-step procedure for comparing EFA solutions. As noted above, we have not yet found a single summary statistic that meets our standards for evaluating EFA solution equivalency. In 2012 the first author and one of his graduate students conducted a review of replication procedures in EFA and developed a two-step comparison procedure for evaluating EFA solutions (Osborne & Fitzpatrick, 2012).
First, Osborne & Fitzpatrick (2012) assess whether the basic factor structure is replicated. Regardless of whether the researcher is performing internal replication (a single sample, randomly split) or external replication (two independently gathered samples), the researcher needs to perform the same EFA procedure on both samples, specifying the same number of factors to be extracted, the same extraction and rotation procedures, and so on. The researcher should then identify the strongest loading for each item (i.e., the factor on which that item "loads") and confirm that these are congruent across the two analyses. For example, if item #1 has its strongest loading on factor #1, and item #2 has its strongest loading on factor #2, that pattern should be evidenced in both analyses. If any items fail this test, we would consider the analyses to fail to meet the most basic threshold of replicability: structural replicability. There is therefore little reason to expect the factor structure to replicate in any basic way in future samples.
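A minimal sketch of this first check follows, assuming each analysis has produced a rotated pattern matrix stored as an items-by-factors array and that the factors emerge in the same order in both analyses; the variable and function names are ours, not Osborne & Fitzpatrick's.

```python
# Sketch of the structural check: does each item load most strongly on the
# same factor in both analyses? Assumes rows = items, columns = factors,
# and that factors appear in the same order in both rotated solutions.
import numpy as np

def primary_factor(loadings):
    """Index of the factor with the largest absolute loading for each item."""
    return np.argmax(np.abs(loadings), axis=1)

def structural_match(loadings_1, loadings_2):
    """Boolean array: True where an item's strongest loading falls on the
    same factor in both analyses."""
    return primary_factor(loadings_1) == primary_factor(loadings_2)

# Items failing the basic structural-replicability threshold:
# failing_items = np.where(~structural_match(loadings_1, loadings_2))[0]
```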
If there is a small percentage of items that seem volatile in this way, this replication analysis might provide important information—that these items might need revision or deletion. Thus, replication can also serve important exploratory and developmental purposes. If a large number of problematic items are observed, this represents an opportunity for the researcher to revise the scale substantially before releasing it into the literature, where this volatility might be problematic.
Next, Osborne & Fitzpatrick (2012) evaluate whether the relative magnitude of the factor loadings is replicated. They advocate simply subtracting the two standardized (rotated) factor loadings for congruent items and squaring the difference. Squaring has two benefits: it eliminates the sign of the difference, which is not meaningful (if one loading is .75 and the other is .70, subtracting in one order produces -0.05 and in the other order 0.05, yet only the magnitude of the difference matters), and it highlights larger differences. Researchers can then quickly scan the squared differences, either confirming that all are small and unimportant or identifying which items show large differences across the replication analyses.
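This second check is also easy to script. The sketch below (again with our own, hypothetical names, and the same items-by-factors arrays assumed above) subtracts each item's primary loading across the two analyses and squares the result.

```python
# Sketch of the loading-magnitude check: squared differences between each
# item's primary rotated loading in the two analyses. Squaring discards the
# (unimportant) sign and accentuates larger discrepancies.
import numpy as np

def squared_loading_differences(loadings_1, loadings_2):
    items = np.arange(loadings_1.shape[0])
    primary = np.argmax(np.abs(loadings_1), axis=1)   # each item's primary factor
    diff = loadings_1[items, primary] - loadings_2[items, primary]
    return diff ** 2

# Example: loadings of .75 and .70 differ by -.05 or +.05 depending on the
# order of subtraction; squaring yields .0025 either way.
```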
As you might imagine, we find Osborne & Fitzpatrick's procedure rather sensible. It addresses both questions that we seek to answer in a replication analysis: 1) Does the factor structure replicate? and 2) Are the factor loadings of similar magnitude?