Exploratory factor analysis is meant to be exploratory
in nature, and thus it is not desirable to prescribe a rigid formula
or process for executing an EFA. The steps below are meant to be a
loose guide, understanding that a factor analysis often requires returning
to previous steps and trying other approaches to ensure the best outcome.
Performing an EFA generally involves six steps, which
will guide the discussion through the rest of the book:
- Data cleaning
- Deciding on an extraction method to use
- Deciding how many factors to retain
- Deciding on a method of rotation (if desired)
- Interpreting results (return to Step 3 if a solution is not ideal)
- Replication or evaluation of robustness (return to the beginning if a solution is not replicable or robust)
Step 1: Data cleaning. Without
clean data, what follows in almost any analysis is moot. This is another
point where passions run high among researchers and statisticians
because there is considerable controversy about any manipulations
of the sample and data (e.g., how to treat outliers, missing data).
We have a clear position on the issue—data should be cleaned
and issues (e.g., failing to meet assumptions) should be addressed.
The first author wrote an entire book on the topic, in which he demonstrated
repeatedly how clean data produces results that are better estimates
of population parameters and, therefore, more accurate and replicable
(Osborne, 2013). Instead of debating the point here, allow us to assert
that data that is filled with errors or that fails to meet assumptions
of the analysis being performed is likely to lead to poorer outcomes
than data that is free of egregious errors and that meets assumptions.
We will discuss some other data quality issues later in the book,
including the importance of dealing appropriately with missing data.
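As a minimal sketch of the kind of screening Step 1 has in mind, the Python snippet below counts missing values for each item and flags scores that lie far from the item mean. The function name `screen_items`, the made-up responses, and the cutoff of 2.5 standard deviations are all illustrative assumptions rather than recommendations from this book, and the sketch only reports potential problems; deciding what to do with flagged cases remains a judgment call.

```python
from statistics import mean, stdev

def screen_items(rows, z_cutoff=3.0):
    """Count missing values and flag extreme scores for each item (column).

    rows: list of equal-length lists; None marks a missing response.
    Returns {column index: {"n_missing": int, "outlier_rows": [row indices]}}.
    """
    report = {}
    for j in range(len(rows[0])):
        observed = [(i, row[j]) for i, row in enumerate(rows) if row[j] is not None]
        values = [v for _, v in observed]
        m, s = mean(values), stdev(values)
        # Flag observed values more than z_cutoff standard deviations from the mean.
        flagged = [i for i, v in observed if s > 0 and abs(v - m) / s > z_cutoff]
        report[j] = {"n_missing": len(rows) - len(values), "outlier_rows": flagged}
    return report

# Hypothetical responses to two items; the last row holds a likely entry error (40).
rows = [[3, 2], [4, 3], [3, None], [4, 3], [3, 2], [4, 3],
        [3, 2], [4, 3], [3, 2], [4, 3], [40, 2]]
report = screen_items(rows, z_cutoff=2.5)
for item, result in report.items():
    print(item, result)
```

Note that in very small samples a single extreme score inflates the standard deviation enough to hide itself, which is one reason a z-score screen like this is only a starting point.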
Step 2: Deciding on an extraction method. An
extraction technique is one of a group of methods that examine the
correlation/covariation between all the variables and seek to “extract”
the latent variables from the measured/manifest variables.
There are several
factor analysis extraction methods to choose from. SAS has seven EFA
extraction methods: unweighted least squares (ULS),
maximum likelihood (ML), principal axis factoring (PAF), iterated
principal axis factoring (iterated PAF), alpha factoring, image factoring,
and Harris factoring. Information about the relative strengths and weaknesses
of these techniques is not easy to obtain. To complicate matters further,
naming conventions for some extraction techniques are not consistent,
making it difficult to figure out which method a textbook or journal
article author is describing, and whether or not it is actually available
in the software the researcher is using. This probably explains the
popularity of principal components analysis – not only is it
the default in much statistical software, but it is one of the more
consistent names researchers will see there.
An article
by Fabrigar, Wegener, MacCallum and Strahan (1999) argued that if
data is relatively normally distributed, maximum likelihood is the
best choice because “it allows for the computation of a wide
range of indexes of the goodness of fit of the model [and] permits
statistical significance testing of factor loadings and correlations
among factors and the computation of confidence intervals”
(p. 277). If the assumption of multivariate normality is “severely
violated” they recommend iterated PAF or ULS factoring (Fabrigar
et al., 1999; Nunnally & Bernstein, 1994). Other authors have
argued that in specialized cases, or for particular applications,
other extraction techniques (e.g., alpha extraction) are most appropriate,
but the evidence of advantage is slim. In general, ML will give you
the best results when your data is generally normally distributed, and
iterated PAF or ULS when it is significantly non-normal. In
Chapter 2, we will compare outcomes between the various factor extraction
techniques.
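To make the idea of "extracting" latent variables from correlations more concrete, here is a rough Python sketch of iterated principal axis factoring. It is a textbook-style illustration, not the routine SAS runs: the function name `iterated_paf`, the starting communalities (squared multiple correlations), the convergence tolerance, and the artificial one-factor correlation matrix are all assumptions made for this example.

```python
import numpy as np

def iterated_paf(R, n_factors, max_iter=200, tol=1e-6):
    """Iterated principal axis factoring on a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    # Starting communalities: squared multiple correlations (an assumed choice).
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)            # communalities replace the 1s
        eigvals, eigvecs = np.linalg.eigh(reduced)
        top = np.argsort(eigvals)[::-1][:n_factors]
        lam = np.clip(eigvals[top], 0.0, None)   # guard against small negatives
        loadings = eigvecs[:, top] * np.sqrt(lam)
        new_h2 = (loadings ** 2).sum(axis=1)     # updated communality estimates
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2

# Hypothetical one-factor population: correlations implied by known loadings.
true_loadings = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(true_loadings, true_loadings)
np.fill_diagonal(R, 1.0)

loadings, communalities = iterated_paf(R, n_factors=1)
print(np.round(np.abs(loadings[:, 0]), 3))   # sign of a factor is arbitrary
print(np.round(communalities, 3))
```

Because the correlation matrix here is built from a known one-factor model with no sampling error, the procedure should land very close to the loadings used to build it. Real data will not behave this neatly, and the sketch makes no attempt to handle problem cases such as communalities exceeding 1.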
Step 3: Deciding how many factors to retain
for analysis. This, too, is an issue that suffers from
anachronistic ideas and software defaults that are not always ideal
(or even defensible). In this step, you (or the software) decide how
many factors you are going to keep for analysis. The statistical software
will always initially extract as many factors as there are variables
(i.e., if you have 10 items in a scale, your software will extract
10 factors) in order to account for 100% of the variance. However,
most of them will be meaningless. Remember that the goal of EFA
is to explore your data and reduce the
number of variables being dealt with. There are several ways of approaching
the decision of how many factors to extract and keep for further analysis.
Our guide will always focus on the fact that extracted factors should
make conceptual and theoretical sense, and be empirically defensible.
We will explore guidelines for this later in Chapter 3.
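The point about initially extracting as many factors as there are variables can be illustrated with a short Python sketch. It computes the eigenvalues of a correlation matrix, which sum to the number of variables, and the share of total variance tied to each. The function name `variance_by_factor` and the artificial six-item, two-factor correlation matrix are assumptions for illustration. The final line counts eigenvalues above 1, a familiar software default, purely for comparison and not as a recommended retention rule.

```python
import numpy as np

def variance_by_factor(R):
    """Eigenvalues of R (largest first), with proportion and cumulative proportion."""
    eigvals = np.sort(np.linalg.eigvalsh(np.asarray(R, dtype=float)))[::-1]
    proportion = eigvals / eigvals.sum()
    return eigvals, proportion, np.cumsum(proportion)

# Hypothetical population: six items, three loading on each of two factors.
L = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
              [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
R = L @ L.T
np.fill_diagonal(R, 1.0)   # a correlation matrix has 1s on the diagonal

eigvals, proportion, cumulative = variance_by_factor(R)
print(np.round(eigvals, 3))      # one eigenvalue per item
print(np.round(cumulative, 3))   # reaches 1.0 only when all six are kept
print(int((eigvals > 1).sum()))  # count under the eigenvalue > 1 default
```

All six eigenvalues are needed to reach 100% of the variance, yet only a few of them are large. That gap between what the software can extract and what is worth keeping is the decision this step is about.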
Step 4: Deciding
on a rotation method and rotating the factors. Rotation
is often a source of some confusion. What exactly is rotation and
what is happening when data is rotated? In brief, the goal is to clarify
the factor structure and make the results of your EFA most interpretable.
There are several rotation methodologies, falling into two general
groups: orthogonal rotations and oblique rotations. Orthogonal rotations
keep axes at a 90° angle, forcing the factors to be uncorrelated.
Oblique rotations allow angles that are not 90°, thus allowing
factors to be correlated if that is optimal for the solution. We argue
that in most disciplines constructs tend to be at least marginally
correlated with each other, and, as such, we should focus on oblique
rotations rather than orthogonal. We will discuss these options in
more detail in Chapter 4.
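A small Python sketch may help show what an orthogonal rotation does to a two-factor solution. It turns the loading matrix through a range of angles while keeping the axes at 90°, and keeps the angle that maximizes the raw varimax criterion (the summed variance of squared loadings within each factor). The choice of varimax, the grid search in one-degree steps, the function names, and the artificial loadings are all assumptions for this illustration; real software uses more refined algorithms, and oblique rotation is not shown.

```python
import numpy as np

def rotation_matrix(theta):
    """2 x 2 orthogonal rotation through angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def varimax_criterion(loadings):
    """Raw varimax criterion: sum over factors of the variance of squared loadings."""
    return np.var(loadings ** 2, axis=0).sum()

def best_orthogonal_rotation(loadings, step_degrees=1.0):
    """Grid-search the rotation angle that maximizes the varimax criterion."""
    angles = np.deg2rad(np.arange(0.0, 90.0, step_degrees))
    scores = [varimax_criterion(loadings @ rotation_matrix(a)) for a in angles]
    best = angles[int(np.argmax(scores))]
    return loadings @ rotation_matrix(best), best

# Hypothetical clean structure, deliberately muddled by a 30-degree turn.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
unrotated = simple @ rotation_matrix(np.deg2rad(30.0))

rotated, angle = best_orthogonal_rotation(unrotated)
print(np.round(unrotated, 2))
print(np.round(rotated, 2))
# Rotation redistributes variance across factors but leaves communalities alone.
print(np.round((unrotated ** 2).sum(axis=1), 3))
print(np.round((rotated ** 2).sum(axis=1), 3))
```

In the unrotated matrix every item loads on both factors, which is hard to interpret. After rotation each item loads on essentially one factor, although the order and sign of the factors are arbitrary. The fit of the solution is unchanged; only its interpretability improves.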
Step 5: Interpreting results. Remember
that the goal of exploratory factor analysis is to explore whether
your data fits a model that makes sense. Ideally, you have a conceptual
or theoretical framework for the analysis—a theory or body
of literature guiding the development of an instrument, for example.
Even if you do not, the results should be sensible in some way. You
should be able to construct a simple narrative describing how each
factor, and its constituent variables, makes sense and is easily labeled.
It is easy to get EFA to produce results. It is much harder to get
sensible results.
Note also that EFA is
an exploratory technique. As
such, it should not be used, as many researchers do, in an attempt
to confirm hypotheses or test
competing models. That is what confirmatory factor
analysis is for. It is a misapplication of EFA
to use it in this way, and we need to be careful to avoid confirmatory
language when describing the results of an exploratory factor analysis.
If your results do not
make sense, it might be useful to return to an earlier step. Perhaps
if you extract a different number of factors, the factors or solution
will make sense. This is why it is an exploratory technique.
Step 6: Replication of results. One
of the hallmarks of science is replicability,
or the ability for other individuals, using the same materials or
methods, to come to the same conclusions. We have not historically
placed much emphasis on replication in the social sciences, but we
should. As you will see in subsequent chapters, EFA is a slippery
technique, and the results are often not clear. Even clear results
often do not replicate exactly, even within an extremely similar data
set. Thus, in our view, this step is critical. If the results of your
analysis do not replicate (or do not reflect the true nature of the
variables in the “real world”), then why should anyone
else care about them? Providing evidence that your factor structure
is likely to replicate (either through another EFA or through CFA)
makes your findings stronger and more relevant. In Chapter 6, we will
explore a “traditional” method of replication (similar to cross-validation in
regression models). In Chapter 7, we will play with the notion of
applying a less traditional but perhaps more useful approach:
bootstrap analysis. Confirmatory factor analysis is outside the scope
of this book, but is perhaps an even better method of replication.
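As a loose illustration of the split-sample idea, the Python sketch below simulates responses from a one-factor model, randomly divides the sample in half, runs the same simple extraction on each half, and compares the resulting loadings. Everything here is an assumption made for the example: the simulated data, the sample size, the single-factor simplification, and the non-iterated principal axis extraction in `one_factor_loadings`. With more than one factor, the factors would also need to be matched across halves before comparing them.

```python
import numpy as np

def one_factor_loadings(X):
    """Single-factor principal axis loadings (non-iterated, SMC communalities)."""
    R = np.corrcoef(X, rowvar=False)
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    np.fill_diagonal(R, smc)                     # reduced correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)         # eigenvalues in ascending order
    loadings = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))
    # The sign of a factor is arbitrary; orient it so loadings are mostly positive.
    return loadings if loadings.sum() >= 0 else -loadings

# Simulated (not real) data: five items driven by one common factor.
rng = np.random.default_rng(0)
n = 4000
true_loadings = np.array([0.8, 0.7, 0.6, 0.5, 0.4])
factor = rng.standard_normal(n)
noise = rng.standard_normal((n, 5))
X = factor[:, None] * true_loadings + noise * np.sqrt(1.0 - true_loadings ** 2)

# Randomly split the sample into two halves and analyze each separately.
order = rng.permutation(n)
half_a, half_b = X[order[: n // 2]], X[order[n // 2:]]
loadings_a = one_factor_loadings(half_a)
loadings_b = one_factor_loadings(half_b)

difference = np.abs(loadings_a - loadings_b)
print(np.round(loadings_a, 2))
print(np.round(loadings_b, 2))
print(np.round(difference, 2))   # large gaps would suggest poor replication
```

With a large simulated sample and a clean structure, the two halves should agree closely. Smaller samples and messier structures are where the two sets of loadings start to drift apart.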