Now that we are familiar with the different extraction
criteria and the SAS syntax and output for each criterion, let’s
examine the potential differences between the criteria. We will review
these criteria in each of our three data sets and see how the criteria
agree or disagree.
Engineering data. We
examined each of the criteria for this data in the section above.
Theory, the Kaiser Criterion, the minimum eigenvalue, and the
proportion of variance all recommend a two-factor solution; the scree
plot recommends a two-factor solution but also identifies a possible
three-factor solution; and both MAP and parallel analysis recommend
two factors. In general, when a scale has a strong two-factor structure,
the methods tend to agree that two factors are the optimal number to
extract. Given that the results make sense in the context of the
theoretical model, we would likely extract two factors for rotation
and interpretation.
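To make the mechanics of the eigenvalue-based criteria concrete, here is a brief sketch in Python (not SAS). The eigenvalues and the 75% variance cutoff are illustrative assumptions, not values from the engineering data:

```python
import numpy as np

# Hypothetical eigenvalues of a correlation matrix (illustrative only;
# these are not the engineering data's actual values).
eigenvalues = np.array([4.2, 2.9, 0.6, 0.4, 0.3, 0.25, 0.2, 0.15])

# Kaiser Criterion: retain factors with eigenvalues greater than 1.0.
kaiser = int(np.sum(eigenvalues > 1.0))

# Minimum (average) eigenvalue criterion: retain factors whose
# eigenvalue exceeds the mean eigenvalue.
avg_rule = int(np.sum(eigenvalues > eigenvalues.mean()))

# Proportion of variance: retain factors until a cumulative proportion
# (here an assumed 75%) of the total variance is accounted for.
cum_prop = np.cumsum(eigenvalues) / eigenvalues.sum()
prop_rule = int(np.argmax(cum_prop >= 0.75)) + 1

print(kaiser, avg_rule, prop_rule)
```

With this clean hypothetical structure all three counts agree, which is the kind of convergence the engineering data shows.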
Before we examine these criteria in the next set of data,
let us briefly review how a two-factor extraction of this data affects
the subsequent factor loadings. We will examine the unrotated factor
loadings, since we have not yet discussed rotation. The loadings are
plotted in Figure 3.8 Factor loading plot, iterated PAF extraction
(we requested this plot by adding the INITLOADINGS plot to the PLOTS
option: plots = (initloadings scree)). As you can see from
the factor loading plot, the data is clustered into two clear and
distinct factors. In our opinion, this is about as good as it gets
in exploratory factor analysis at this stage of the analysis.
Self-description
questionnaire data. Because the data from the SDQ seemed
to be relatively well-behaved across the initial extraction comparisons,
we will use ML extraction and explore whether our expected three-factor
solution is tenable. The syntax to produce all of the results needed
to examine the extraction criteria for this data set is presented below.
ods graphics on;
proc factor data = sdqdata method = ML plots = SCREE;
var Eng: Math: Par:;
run;
ods graphics off;
*Assuming the include statement has already read in the MACROs;
%map(sdqdata);
%parallel(sdqdata,100,95,2,2,99);
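For readers curious about what the parallel analysis macro is doing conceptually, here is a rough sketch of Horn's procedure in Python (not the macro itself, whose parameter list we do not reproduce here). The simulated two-factor data, iteration count, and 95th-percentile threshold are our illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)

def parallel_analysis(data, n_iterations=100, percentile=95):
    """Horn's parallel analysis: retain factors whose observed
    eigenvalues exceed those of random data of the same shape."""
    n, p = data.shape
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    random_eigs = np.empty((n_iterations, p))
    for i in range(n_iterations):
        noise = rng.standard_normal((n, p))
        random_eigs[i] = np.linalg.eigvalsh(
            np.corrcoef(noise, rowvar=False))[::-1]
    threshold = np.percentile(random_eigs, percentile, axis=0)
    retained = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break
        retained += 1
    return retained

# Simulated data with a known two-factor structure:
# two independent factors, three indicators each.
n = 300
f1, f2 = rng.standard_normal(n), rng.standard_normal(n)
data = np.column_stack(
    [f1 + 0.5 * rng.standard_normal(n) for _ in range(3)]
    + [f2 + 0.5 * rng.standard_normal(n) for _ in range(3)])
print(parallel_analysis(data))
```

Factors are counted only while each observed eigenvalue exceeds the corresponding percentile of the random-data eigenvalues; the first observed eigenvalue that falls at or below its threshold stops the count.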
The initial eigenvalues
are presented in
Figure 3.9 Eigenvalues for SDQ data,
and the scree plot is presented in
Figure 3.10 Scree plot for SDQ data. Theory, the
Kaiser Criterion, the minimum eigenvalue, and the proportion of variance
suggest the data has a three-factor solution. The scree plot, however,
is a little less clear. Scree plots do not always have one clear elbow.
In this case, one could argue that any one of several points, including
two, three, four, or five factors, is the true “elbow.”
In this example, then, the scree plot is not terribly informative.
In summary, theory and
the Kaiser Criterion recommend three factors, the scree plot recommends
between two and five factors, parallel analysis recommends four factors,
and MAP recommends three. As Velicer and others have argued, MAP appears,
at least in this case, to be more congruent with theory and the
eigenvalues. That is reassuring. The three-factor model seems
to be the best recommendation, as it makes for a strong, interpretable
model.
Geriatric
depression scale data. This scale provides a very different
counterpoint to the clear, conceptually consistent results of the
engineering and SDQ data. The scale was originally designed to have
five subscales, so theory would suggest five factors. But as with many
of our endeavors in the social sciences, this might not hold true when
put to the test. For example, it is just as likely that all items will
load on a single factor, or that a different number of factors will be
ideal. We used ULS extraction for this data based on the results from
Chapter 2. The syntax to examine
the extraction criteria for this data set is presented below. Please
note that PROC FACTOR automatically deletes records with missing data
from analysis, thereby reducing our sample of 656 down to 479.
The MAP and parallel analysis macros
cannot handle missing data, so we must delete these records beforehand.
We will discuss the issue of missing data further in Chapter 8.
ods graphics on;
proc factor data = marshdata method = ULS plots = SCREE;
var GDS: ;
run;
ods graphics off;
*Assuming the include statement has already read in the MACROS;
*First delete any cases with a missing value so this will run for the
purpose of our example;
*In practice, missing data need to be dealt with beforehand;
data marsh_nomiss(drop = i);
   set marshdata;
   array var(*) _NUMERIC_;
   do i=1 to dim(var);
      if var(i)=. then delete;
   end;
run;
%map(marsh_nomiss);
%parallel(marsh_nomiss,100,95,2,2,99);
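The data step above implements listwise deletion. For comparison, the same operation outside SAS is a one-liner; the tiny data frame below is a hypothetical stand-in for the GDS items, not the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for marshdata (not the actual GDS responses).
marshdata = pd.DataFrame({
    "GDS1": [1, 0, np.nan, 1],
    "GDS2": [0, 1, 1, np.nan],
    "GDS3": [1, 1, 0, 0],
})

# Listwise deletion: drop any record with at least one missing value,
# mirroring the SAS data step above.
marsh_nomiss = marshdata.dropna()
print(len(marsh_nomiss))
```

As in the SAS version, any record with even a single missing item is removed, which is why the analysis sample can shrink sharply.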
The initial eigenvalues
are presented in
Figure 3.15 Eigenvalues for GDS data,
and the scree plot is presented in
Figure 3.16 Scree plot of GDS data. Theory and
the proportion of variance criterion would suggest this data has a
five-factor structure. The Kaiser Criterion identifies three factors
with eigenvalues greater than 1.0, and the minimum eigenvalue criterion
identifies seven factors with eigenvalues above the average (0.36).
And finally, the scree plot is not very clear. The scree plot seems
to indicate that the first inflection point is at two factors, but
it is also arguable that there is a second inflection point at the
fourth factor. Thus, using these traditional criteria alone, we would
probably combine the results and test a variety of configurations,
including one-, two-, three-, four-, and five-factor extractions. We
would examine these results to see whether the original theoretical
framework makes sense, or whether any of the other factor structures
do. However, since we also have parallel analysis and MAP,
let us examine those results before exploring these options.
The parallel
analysis results are presented in
Figure 3.17 Parallel analysis output for GDS data and
Figure 3.18 Parallel analysis plot of GDS data, and the MAP
analysis results are presented in
Figure 3.19 MAP output for GDS data and
Figure 3.20 Plot of average partial correlations for MAP test. The results
of this parallel analysis pose an interesting dilemma, as the eigenvalues
quickly drop below 1 in the raw data analysis, and quickly approach
the random data eigenvalues. However, it is not until around the eighth
factor that the lines meet. This would suggest extracting many more
factors than probably makes sense. The MAP analysis is also interesting
for these data, in that it recommends extracting three factors, rather
than the single strong factor or the theoretically expected five. As
you can see in the MAP plot, there might be a significant inflection
point at the first factor. The true minimum is clearly at three factors,
but the change between factors 2, 3, and 4 is so minimal as to be almost
inconsequential: the APC for factor 3 is only 0.0003 less than that for
factor 2, and only 0.0008 less than that for factor 4. One could argue
that the only real inflection point is at one factor.
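For intuition about what the MAP macro computes, here is a simplified sketch of Velicer's procedure in Python (not the SAS macro itself), applied to simulated data with a known two-factor structure; the simulation settings are our illustrative assumptions:

```python
import numpy as np

def map_test(R):
    """Velicer's Minimum Average Partial (MAP) test: for each candidate
    number of components m, partial the first m principal components out
    of the correlation matrix R and record the average squared partial
    correlation (APC); the m with the smallest APC is retained."""
    p = R.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    off_diag = ~np.eye(p, dtype=bool)
    apc = []
    for m in range(p - 1):                     # m = 0 means no components
        partial = R - loadings[:, :m] @ loadings[:, :m].T
        d = np.sqrt(np.diag(partial))
        partial = partial / np.outer(d, d)
        apc.append(np.mean(partial[off_diag] ** 2))
    return int(np.argmin(apc)), apc

# Simulated two-factor data: two independent factors, three indicators each.
rng = np.random.default_rng(1)
n = 300
f1, f2 = rng.standard_normal(n), rng.standard_normal(n)
data = np.column_stack(
    [f1 + 0.5 * rng.standard_normal(n) for _ in range(3)]
    + [f2 + 0.5 * rng.standard_normal(n) for _ in range(3)])
R = np.corrcoef(data, rowvar=False)
n_factors, apc = map_test(R)
print(n_factors)
```

With clean two-factor data like this, the APC minimum falls unambiguously at m = 2; the GDS results above show how much flatter, and therefore how much more ambiguous, the APC curve can be with real data.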
This third example reinforces
the fact that EFA is both an art and a quantitative science, and that
the researcher’s good judgment is critical when the data is
not as clear or cooperative as one would hope. This data gives no
clear indication of how many factors to extract, so we need to explore
several options to determine what is most sensible. When data is
uncooperative in this way,
replication becomes even more critical, as we will discuss in chapters
to come.