Unfortunately, much of the literature that has attempted to address
sample size guidelines for EFA, particularly the studies attempting
to dismiss subject to item ratios, rests on flawed data. We will
purposely not cite studies here, to protect the guilty. Suffice it
to say that many of these studies use highly restricted ranges of
subject to item ratios or of N, or fail to adequately control for or
vary other confounding variables (e.g., factor loadings, number of
items per scale or per factor/component). Some of the studies
purporting to address subject to item ratios never actually test
those ratios in their analyses.
Researchers seeking
guidance concerning sufficient sample size in EFA are caught between
two entrenched camps: those who argue for total sample size and those
who argue for subject to item ratios. This
is unfortunate, because both probably matter in some sense, and ignoring
either one can have the same result: errors of inference. Failure
to have a representative sample of sufficient size results in unstable
loadings (Cliff, 1970), random, nonreplicable factors (Aleamoni, 1976;
Humphreys, Ilgen, McGrath, & Montanelli, 1969), and lack of generalizability
to the population (MacCallum, Widaman, Zhang, & Hong, 1999).
If one were to take
either guideline (e.g., a 10:1 ratio or a minimum N of 400 to
500) as reasonable, a casual perusal of the published literature
shows that a large portion of published studies come up short. One
can easily find articles that report EFA or PCA results from
samples with fewer subjects than items or estimated parameters, yet
nevertheless draw substantive conclusions from these questionable
analyses. Many more have hopelessly insufficient samples by either
guideline.
One survey by
Ford, MacCallum, and Tait (1986) examined common practice in factor
analysis in industrial and organizational psychology during the ten-year
period of 1974 to 1984. They found that out of 152 studies using EFA
or PCA, 27.3% had a subject to item ratio of less than 5:1 and 56%
had a ratio of less than 10:1. This matches the impression that readers
of social science journals often get: samples are frequently too
small for the analyses to be stable or generalizable.
Osborne and colleagues published the results
of a survey of current practices in the social sciences literature
(Osborne, Costello, & Kellow, 2008). In this survey, they sampled
from two years’ (2002, 2003) worth of articles archived in
PsycINFO that reported some form of EFA and listed both the number
of subjects and the number of items analyzed (303 total articles surveyed).
They standardized their sample size data via a subject to item ratio.
The results of this survey are summarized in Table 5.1. A large percentage
of researchers report factor analyses using relatively small samples.
In a majority of the studies (62.9%) researchers performed analyses
with subject to item ratios of 10:1 or less. A surprisingly high proportion
(almost one-sixth) reported factor analyses based on subject to item
ratios of only 2:1 or less (note that in this case there would be
more parameters estimated than subjects if more than one factor is
extracted).
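The parenthetical note above can be checked with simple arithmetic. The following is a minimal sketch in Python, not part of the survey itself. It assumes that "parameters estimated" means factor loadings only (items times factors), the same count this chapter uses when it describes 20 items x 3 factors as 60 parameters. The function names and the example study (40 subjects, 20 items, 3 factors) are hypothetical.

```python
def subject_to_item_ratio(n_subjects: int, n_items: int) -> float:
    """Return subjects per item, e.g. 10.0 for a 10:1 ratio."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return n_subjects / n_items


def loadings_estimated(n_items: int, n_factors: int) -> int:
    """Count factor loadings only: one per item per factor.

    Assumption: other estimated quantities (e.g., uniquenesses)
    are ignored to keep the comparison simple.
    """
    return n_items * n_factors


# Hypothetical study at a 2:1 ratio with three factors extracted.
n_subjects, n_items, n_factors = 40, 20, 3
ratio = subject_to_item_ratio(n_subjects, n_items)
params = loadings_estimated(n_items, n_factors)

print(f"ratio = {ratio:.1f}:1, loadings = {params}, subjects = {n_subjects}")
print("more loadings than subjects:", params > n_subjects)
```

Under this counting, any study at 2:1 or below that extracts two or more factors estimates at least as many loadings as it has subjects, and strictly more once the ratio drops below 2:1 or a third factor is extracted.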
Table 5.1 Current practice in factor analysis in 2002-2003 psychology journals
Subject to item ratio | % of studies | Cumulative %
2:1 or less | 14.7% | 14.7%
> 2:1, ≤ 5:1 | 25.8% | 40.5%
> 5:1, ≤ 10:1 | 22.7% | 63.2%
> 10:1, ≤ 20:1 | 15.4% | 78.6%
> 20:1, ≤ 100:1 | 18.4% | 97.0%
> 100:1 | 3.0% | 100.0%
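As an illustration of how the bands in Table 5.1 work, the sketch below assigns a subject to item ratio to its band and recomputes the cumulative column from the per-band percentages. It is a hedged example only: the percentages are copied from the table, while the names `BANDS` and `ratio_band` are hypothetical and do not come from the original survey.

```python
from itertools import accumulate

# (label, upper bound of the band, % of studies), copied from Table 5.1.
BANDS = [
    ("2:1 or less", 2.0, 14.7),
    ("> 2:1, ≤ 5:1", 5.0, 25.8),
    ("> 5:1, ≤ 10:1", 10.0, 22.7),
    ("> 10:1, ≤ 20:1", 20.0, 15.4),
    ("> 20:1, ≤ 100:1", 100.0, 18.4),
    ("> 100:1", float("inf"), 3.0),
]


def ratio_band(ratio: float) -> str:
    """Return the Table 5.1 band that a subject to item ratio falls into."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    for label, upper, _ in BANDS:
        if ratio <= upper:
            return label
    raise ValueError("ratio could not be placed in a band")


# Running total of the per-band percentages, rounded to one decimal
# to absorb floating-point error.
cumulative = [round(total, 1) for total in accumulate(pct for _, _, pct in BANDS)]

for (label, _, pct), cum in zip(BANDS, cumulative):
    print(f"{label:<16} {pct:5.1f}% {cum:6.1f}%")
```

For example, `ratio_band(267 / 20)` places a hypothetical study with 267 subjects and 20 items in the "> 10:1, ≤ 20:1" band.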
A more recent survey of EFA practices in four psychological
journals (Educational and Psychological Measurement, Journal
of Educational Psychology, Personality and Individual Differences,
and Psychological Assessment) identifies similar trends. Among the
60 studies reviewed, Henson and Roberts (2006) found a median
sample size of 267 for reported EFAs, a mean subject to item ratio
of 11, and a median of 60 parameters (20 items × 3 factors) estimated.
As you will see below, these are not comforting statistics. Given
the stakes and the empirical evidence on the consequences of insufficient
sample size, this is not exactly a desirable state of affairs.