As you
will come to learn, EFA is quite different from PCA. Unfortunately,
there are many misconceptions about the two analyses, and one of the
biggest is that PCA is part of, or synonymous with, EFA. This misconception
probably has modern-day roots in at least two sources:
- Statistical software, including SAS, has PCA as the default extraction technique when performing exploratory factor analysis (see the sketch following this list).
- Many modern researchers use PCA and EFA interchangeably, or use PCA when performing an analysis that is more appropriate for EFA.
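To illustrate the first point, here is a minimal SAS sketch (the data set survey and the items item1-item10 are hypothetical) contrasting the PROC FACTOR defaults, which yield a principal components analysis, with a call that requests a common factor extraction:

```sas
/* Default behavior: METHOD=PRINCIPAL with PRIORS=ONE, which is PCA. */
proc factor data=survey nfactors=2;
   var item1-item10;
run;

/* Common factor (principal axis) extraction: PRIORS=SMC places squared
   multiple correlations, rather than 1s, on the diagonal of the
   correlation matrix, so only shared variance is analyzed. */
proc factor data=survey method=principal priors=smc nfactors=2;
   var item1-item10;
run;
```

Unless the analyst explicitly overrides defaults such as PRIORS=, the "factor analysis" reported is in fact a components analysis.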
Although the two methods generally seem to
do the same thing, they are different in some key ways. Principal
components analysis is a computationally simplified version of a general
class of dimension reduction analyses. EFA (Spearman, 1904) was developed well before PCA (Hotelling, 1933), prior to the computer age, when all statistical calculations were done by hand, often using matrix algebra. As such, these were significant
undertakings requiring a great deal of effort. Because of the substantial
effort required to perform EFA with hand calculations, significant
scholarship and effort went into developing PCA as a legitimate alternative
that was less computationally intense but that also provided similar
results (Gorsuch, 1990). Computers became available to researchers
at universities and industrial research labs later in the 20th century,
but remained relatively slow and with limited memory until very late
in the 20th century (about the time the first author was in graduate
school using mainframes at the university). Our commentary on PCA
is not intended to slight these scholars nor to minimize their substantial
contributions, but rather to attempt to put PCA and EFA into context
for the modern statistician and quantitative researcher. We will therefore
focus on EFA, despite the popularity of PCA.
Without getting into
the technical details, which are available in other scholarly references
on the topic, PCA performs its computations without regard to the underlying latent structure of the variables, using all the variance in the manifest
variables. What this means is that there is a fundamental assumption
made when choosing PCA: that the measured variables are themselves
of interest, rather than some hypothetical latent construct (as in
EFA). This makes PCA similar to multiple regression in some ways,
in that it seeks to create optimized weighted linear combinations
of variables.
An example of a PCA model extracting two components is presented in Figure 1.1 (Conceptual overview of principal components analysis). We have already
conducted some initial analyses (to be discussed in Chapter 3) that
have convinced us of this two-component structure and led us to this
model. Note that all PCA and EFA analyses extract as many components
or factors as there are manifest variables, although not all are retained
for interpretation; here we simplify for clarity to examine the first
two components extracted and to see how they relate to the measured
variables. Now the important thing to note in this figure is the direction
of the arrows. Notice that they point from the variables to the components.
This is because each component is formed as a weighted linear combination of the predictor variables. One hundred percent of what is in those variables ends up becoming the components.
As analysts, we can then review the results and identify the primary
component each variable loads on to create scales, we can create component
scores, or we can do whatever else we would like with the results;
but the components themselves are completely defined by the variables.
In this way, principal components analysis combines manifest (observed)
variables into components.
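As a rough numerical illustration of this variables-to-components direction, the following SAS/IML sketch (data set and item names again hypothetical) forms components directly as weighted linear combinations of the standardized variables, by eigendecomposing the full correlation matrix with 1s on the diagonal:

```sas
proc iml;
   use survey;                                 /* hypothetical data set */
   read all var {item1 item2 item3 item4} into X;
   close survey;
   n = nrow(X);

   R = corr(X);                                /* full correlation matrix: 1s on the diagonal */
   call eigen(evals, evecs, R);                /* eigendecomposition of R */

   /* The eigenvalues partition ALL the variance: they sum to the number
      of variables, because PCA analyzes total (not just shared) variance. */
   print evals (evals[+])[label="total variance"];

   /* Component scores are weighted linear combinations of the standardized
      variables -- the arrows in Figure 1.1 run from variables to components. */
   Z = (X - repeat(mean(X), n, 1)) / repeat(std(X), n, 1);
   scores = Z * evecs[, 1:2];                  /* scores on the first two components */
quit;
```

Nothing in this calculation refers to anything beyond the observed variables; the components are completely defined by them.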
Exploratory factor analysis,
on the other hand, is a group of extraction and rotation techniques
that are all designed to model unobserved or latent constructs; for this reason it is also referred to as common factor analysis.
EFA assumes and asserts that there are latent variables that give
rise to the manifest (observed) variables, and the calculations and
results are interpreted very differently in light of this assumption.
You can see this very different conceptual vision of the same two-factor model in Figure 1.2 (Conceptual overview of exploratory factor analysis). Notice
the changed direction of the arrow between the variables and factors
as well as the addition of error terms for each variable. Factor analysis
recognizes that model variance contains both shared and unique variance
across variables. EFA examines only the shared variance from the model
each time a factor is created, while allowing the unique variance
and error variance to remain in the model. The factors are then created
as weighted linear combinations of the shared variance. When the factors
are uncorrelated and communalities are moderate, PCA can produce inflated
values of variance accounted for by the components (Gorsuch, 1997;
McArdle, 1990). Because factor analysis analyzes only shared variance, it should yield much the same general solution (all other things being equal) while avoiding this illegitimate inflation of variance estimates.
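To see in rough terms what "analyzing only shared variance" means computationally, here is a companion SAS/IML sketch of one common factor method, (non-iterated) principal axis factoring, using the same hypothetical items. The only structural change from the PCA sketch above is that the 1s on the diagonal of the correlation matrix are replaced by communality estimates, here squared multiple correlations, so each variable's unique and error variance is excluded from the extraction:

```sas
proc iml;
   use survey;                                 /* hypothetical data set */
   read all var {item1 item2 item3 item4} into X;
   close survey;

   R = corr(X);

   /* Initial communality estimates: squared multiple correlations,
      computed as 1 - 1/diag(inv(R)). */
   smc = 1 - 1 / vecdiag(inv(R));

   /* Reduced correlation matrix: replacing the diagonal 1s with the
      communalities leaves only the shared variance to be analyzed. */
   Rr = R;
   do i = 1 to nrow(R);
      Rr[i, i] = smc[i];
   end;

   /* Factors are extracted from shared variance only. (A full
      implementation would iterate, re-estimating the communalities;
      loadings assume the two leading eigenvalues are positive.) */
   call eigen(evals, evecs, Rr);
   loadings = evecs[, 1:2] # repeat(sqrt(evals[1:2]`), nrow(R), 1);
quit;
```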
There are two other
issues with PCA that we will briefly note. First, PCA assumes that
all variables are measured without error (an untenable assumption
in almost any discipline), whereas EFA offers the option of acknowledging
less than perfect reliability. Second, PCA parameters are selected
in an attempt to reproduce sample, rather than population, characteristics
(Thompson, 2004).
Thus, we have many similarities between PCA and EFA, as well as some important conceptual and mathematical differences. Most
authors agree that there is little compelling reason to choose PCA over other extraction methods, and that PCA can be limited and can provide biased parameter estimates (Bentler & Kano, 1990; Floyd & Widaman, 1995; Ford, MacCallum, & Tait, 1986; Gorsuch, 1990; Loehlin, 1990; MacCallum & Tucker, 1991; Mulaik, 1990; Widaman, 1993). If one is to seek best practices,
one is hard pressed to conclude PCA is ever a best practice. Widaman (1993) puts it very bluntly: “principal components analysis should not be used if a researcher wishes to obtain parameters reflecting latent constructs or factors” (p. 263). Unfortunately, it
is still the default dimension reduction procedure in much statistical
analysis software, even though it is usually not (in our opinion)
the conceptually desirable choice, and usually has no clear advantage
in modern quantitative methodology that we can detect.
This is a topic that
arouses passions among statisticians, and the first author has rarely
published a paper or given a talk on this topic without someone getting
upset for taking this position so clearly and unapologetically. So
let us sidestep this issue for the moment and summarize: PCA is not
considered a factor analytic technique, and there is disagreement
among statisticians about when it should be used, if at all. More
often than not, researchers use PCA when EFA would be appropriate
and preferable (for example, see Ford et al., 1986; Gorsuch, 1983;
Widaman, 1993).