EFA vs PCA

As you will come to learn, EFA is quite different from PCA. Unfortunately, there are many misconceptions about the two analyses, and one of the biggest is that PCA is part of, or synonymous with, EFA. This misconception probably has modern-day roots in at least two factors:
  1. Statistical software, including SAS, has PCA as the default extraction technique when performing exploratory factor analysis.
  2. Many modern researchers use the terms PCA and EFA interchangeably, or use PCA when performing an analysis for which EFA would be more appropriate.
Although the two methods often seem to do the same thing, they differ in some key ways. Principal components analysis is a computationally simplified version of a general class of dimension reduction analyses. EFA (Spearman, 1904) was developed well before PCA (Hotelling, 1933), prior to the computer age, when all statistical calculations were done by hand, often using matrix algebra. As such, these analyses were significant undertakings requiring a great deal of effort. Because of the substantial effort required to perform EFA with hand calculations, significant scholarship went into developing PCA as a legitimate alternative that was less computationally intense but provided similar results (Gorsuch, 1990). Computers became available to researchers at universities and industrial research labs later in the 20th century, but they remained relatively slow, with limited memory, until very late in the century (about the time the first author was in graduate school using mainframes at the university). Our commentary on PCA is not intended to slight these scholars or to minimize their substantial contributions, but rather to put PCA and EFA into context for the modern statistician and quantitative researcher. We will therefore focus on EFA, despite the popularity of PCA.
Without getting into the technical details, which are available in other scholarly references on the topic, PCA performs its computations without regard to any underlying latent structure of the variables, using all of the variance in the manifest variables. This reflects a fundamental assumption made when choosing PCA: that the measured variables are themselves of interest, rather than some hypothetical latent construct (as in EFA). This makes PCA similar to multiple regression in some ways, in that it seeks to create optimized weighted linear combinations of variables.
Figure 1.1 Conceptual overview of principal components analysis
An example of a PCA model extracting two components is presented in Figure 1.1. We have already conducted some initial analyses (to be discussed in Chapter 3) that have convinced us of this two-component structure and led us to this model. Note that all PCA and EFA analyses extract as many components or factors as there are manifest variables, although not all are retained for interpretation; here we simplify for clarity, examining only the first two components extracted and how they relate to the measured variables. The important thing to note in this figure is the direction of the arrows: they point from the variables to the components. This is because each component is formed as a weighted linear combination [1] of the measured variables. One hundred percent of what is in those variables ends up becoming the components. As analysts, we can then review the results and identify the primary component each variable loads on to create scales, we can create component scores, or we can do whatever else we would like with the results; but the components themselves are completely defined by the variables. In this way, principal components analysis combines manifest (observed) variables into components.
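To make this mechanical picture concrete, the following is a minimal numerical sketch in Python with NumPy (the toy data and the choice of six variables are ours, purely for illustration; this is a schematic of the mathematics, not any particular package's implementation). The key detail is that the correlation matrix keeps 1s on its diagonal, so all of the variance in each variable, shared and unique alike, flows into the components, and each component score is simply a weighted linear combination of the variables:

```python
import numpy as np

# Toy data: 200 cases on 6 measured variables (illustrative only).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))

# Standardize, then compute the correlation matrix. Its diagonal is all
# 1s, so PCA analyzes ALL the variance in each variable (shared + unique).
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = np.corrcoef(Z, rowvar=False)

# Eigendecomposition of R, with components sorted by variance accounted for.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Retain two components, echoing the two-component model of Figure 1.1.
k = 2
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])   # component loadings
scores = Z @ eigvecs[:, :k]   # components = weighted combinations of variables

# Eigenvalues of a correlation matrix sum to the number of variables, so
# eigenvalue / p is the proportion of total variance a component accounts for.
print(eigvals[:k] / R.shape[0])
```

Notice that the data completely determine the components: nothing in the computation refers to, or estimates, a latent variable.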
Exploratory factor analysis, on the other hand, is a group of extraction and rotation techniques that are all designed to model unobserved, or latent, constructs; it is also referred to as common factor analysis. EFA assumes and asserts that there are latent variables that give rise to the manifest (observed) variables, and the calculations and results are interpreted very differently in light of this assumption.
You can see this very different conceptual vision of the same two-factor model in Figure 1.2. Notice the changed direction of the arrows between the variables and the factors, as well as the addition of an error term for each variable. Factor analysis recognizes that each variable's variance contains both shared and unique variance. EFA examines only the shared variance each time a factor is extracted, allowing the unique variance and error variance to remain in the model; the factors are then created as weighted linear combinations of the shared variance. When the factors are uncorrelated and communalities are moderate, PCA can produce inflated estimates of the variance accounted for by the components (Gorsuch, 1997; McArdle, 1990). Because factor analysis analyzes only shared variance, it should yield the same general solution (all other things being equal) while avoiding this illegitimate inflation of variance estimates.
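Continuing the sketch from above (again in Python with NumPy, and again a schematic illustration under our own toy assumptions rather than any package's actual algorithm), the computational heart of this difference appears in one common EFA extraction method, principal axis factoring: the 1s on the diagonal of the correlation matrix are replaced with communality estimates, so that only the shared variance is factored:

```python
import numpy as np

def principal_axis(R, k, iterations=50):
    """Schematic principal axis factoring: factor only the SHARED variance."""
    # Initial communality estimates: squared multiple correlations (SMCs).
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(iterations):
        # "Reduced" correlation matrix: communalities, not 1s, on the diagonal,
        # leaving each variable's unique and error variance out of the analysis.
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)
        eigvals, eigvecs = np.linalg.eigh(R_reduced)
        order = np.argsort(eigvals)[::-1]
        vals = np.clip(eigvals[order][:k], 0.0, None)  # guard against negatives
        vecs = eigvecs[:, order][:, :k]
        loadings = vecs * np.sqrt(vals)
        h2 = (loadings ** 2).sum(axis=1)  # re-estimate communalities and iterate
    return loadings, h2

# Using the correlation matrix R from the PCA sketch above:
# loadings, h2 = principal_axis(R, k=2)
```

Because the unique and error variance never enter the reduced matrix, it cannot inflate the variance attributed to the factors, which is exactly the contrast drawn above.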
Figure 1.2 Conceptual overview of exploratory factor analysis
There are two other issues with PCA that we will briefly note. First, PCA assumes that all variables are measured without error (an untenable assumption in almost any discipline), whereas EFA offers the option of acknowledging less-than-perfect reliability. Second, PCA parameters are selected to reproduce sample, rather than population, characteristics (Thompson, 2004).
Thus, PCA and EFA share many similarities, along with some important conceptual and mathematical differences. Most authors agree that there is little compelling reason to choose PCA over other extraction methods, and that PCA can be limited and provide biased parameter estimates (Bentler & Kano, 1990; Floyd & Widaman, 1995; Ford, MacCallum, & Tait, 1986; Gorsuch, 1990; Loehlin, 1990; MacCallum & Tucker, 1991; Mulaik, 1990; Widaman, 1993). If one is to seek best practices, one is hard pressed to conclude that PCA is ever a best practice. Widaman (1993) puts it very bluntly: “principal components analysis should not be used if a researcher wishes to obtain parameters reflecting latent constructs or factors” (p. 263). Unfortunately, PCA is still the default dimension reduction procedure in much statistical analysis software, even though it is usually not (in our opinion) the conceptually desirable choice and, as far as we can detect, has no clear advantage in modern quantitative methodology.
This is a topic that arouses passions among statisticians, and the first author has rarely published a paper or given a talk on this topic without someone getting upset that the position is taken so clearly and unapologetically. So let us sidestep this issue for the moment and summarize: PCA is not considered a factor analytic technique, and there is disagreement among statisticians about when, if at all, it should be used. More often than not, researchers use PCA when EFA would be appropriate and preferable (see, for example, Ford et al., 1986; Gorsuch, 1983; Widaman, 1993).