How Do the Criteria Compare?

Now that we are familiar with the different extraction criteria and the SAS syntax and output for each criterion, let’s examine the potential differences between the criteria. We will review these criteria in each of our three data sets and see how the criteria agree or disagree.
Engineering data. We examined each of the criteria for this data in the section above. Theory, the Kaiser Criterion, the minimum eigenvalue, and the proportion of variance recommend a two-factor solution; the scree plot recommends a two-factor solution but also identifies a possible three-factor solution; and both MAP and parallel analysis recommend a two-factor solution. In general, when a scale has a strong two-factor structure, the methods tend to agree that two factors is the optimal number to extract. Given that the results make sense in the context of the theoretical model, we would likely extract two factors for rotation and interpretation.
Before we examine these criteria in the next data set, let us briefly review how a two-factor extraction of this data affects the subsequent factor loadings. We will examine the unrotated factor loadings, since we have not yet discussed rotation. The loadings are plotted in Figure 3.8 Factor loading plot, iterated PAF extraction (we requested this plot by adding INITLOADINGS to the PLOTS option: plots = (initloadings scree)). As you can see from the factor loading plot, the items cluster into two clear and distinct factors. In our opinion, this is about as good as it gets in exploratory factor analysis at this stage of the analysis.
Figure 3.8 Factor loading plot, iterated PAF extraction
Self-description questionnaire data. Because the data from the SDQ seemed to be relatively well-behaved across the initial extraction comparisons, we will use ML extraction and explore whether our expected three-factor solution is tenable. Syntax to produce all of the results needed to examine the extraction criteria for this data set is presented below.
ods graphics on;
proc factor data = sdqdata method = ML plots = SCREE;
   var Eng: Math: Par:;
run;
ods graphics off;

*Assuming the include statement has already read in the macros;
%map(sdqdata);
%parallel(sdqdata,100,95,2,2,99);
The initial eigenvalues are presented in Figure 3.9 Eigenvalues for SDQ data, and the scree plot is presented in Figure 3.10 Scree plot for SDQ data. Theory, the Kaiser Criterion, the minimum eigenvalue, and the proportion of variance all suggest that the data has a three-factor structure. The scree plot, however, is less clear. Scree plots do not always have one clear elbow; in this case, one could argue that any of several points is the true “elbow,” including 2, 3, 4, or 5. Here, the scree plot is not terribly informative.
The parallel analysis results are presented in Figure 3.11 Parallel analysis output of SDQ data and Figure 3.12 Parallel analysis plot of SDQ data, and the MAP analysis results are presented in Figure 3.13 MAP results for SDQ data and Figure 3.14 Plot of average partial correlations for MAP test. Because the sample size was so large, parallel analysis might not be as useful: with this many cases, the randomly generated eigenvalues are tiny, so almost any raw-data eigenvalue of substance will exceed them. The largest randomly generated eigenvalue (95th percentile) was 0.057. Using the criteria for parallel analysis, one would recommend examining either three or four factors (depending on how “significantly” different the raw data eigenvalue should be). The MAP results further reinforce theory and the other criteria, indicating that three factors is the right number to extract. As you can see in Figure 3.14 Plot of average partial correlations for MAP test, the average partial correlation reaches its minimum at three factors.
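The parallel analysis decision rule used above can be written compactly. This is a sketch of the usual formulation rather than the macro's internals: λ_k denotes the kth largest eigenvalue of the raw data, and λ_k^(95) the 95th percentile of the kth largest eigenvalue across the randomly generated data sets, which are assumed to have the same number of cases and variables as the raw data.

```latex
$$
m_{\text{PA}} = \max\left\{\, m \;:\; \lambda_k > \lambda^{(95)}_k \ \text{ for all } k = 1, \dots, m \,\right\}
$$
```

Because the comparison has no built-in margin, a raw eigenvalue only slightly above its random counterpart still counts as a retained factor. That is why the SDQ recommendation depends on how “significantly” different the two eigenvalues must be, and why random eigenvalues as small as 0.057 make the rule generous.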
In summary, theory and the Kaiser Criterion recommend three factors, the scree plot recommends between two and five factors, parallel analysis recommends three or four, and MAP recommends three factors. As Velicer and others have argued, MAP appears to be, at least in this case, more congruent with theory and eigenvalues. That is reassuring. The three-factor model seems to be the best recommendation, as it makes for a strong, interpretable model.
Figure 3.9 Eigenvalues for SDQ data
Figure 3.10 Scree plot for SDQ data
Figure 3.11 Parallel analysis output of SDQ data
Figure 3.12 Parallel analysis plot of SDQ data
Figure 3.13 MAP results for SDQ data
Figure 3.14 Plot of average partial correlations for MAP test
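If the three-factor model is carried forward, the number of factors can be fixed explicitly. The sketch below is not the authors' syntax: it reuses the sdqdata data set and variable lists from the syntax above, adds NFACTORS=3, and requests the INITLOADINGS plot that was used for the engineering data. Rotation is left at the PROC FACTOR default (none), since rotation has not yet been discussed.

```sas
/* Sketch: fix the ML extraction at the three factors suggested by
   theory, the Kaiser Criterion, and MAP, and plot the unrotated
   loadings alongside the scree plot (no ROTATE= option, so the
   loadings stay unrotated) */
ods graphics on;
proc factor data = sdqdata method = ML nfactors = 3
            plots = (initloadings scree);
   var Eng: Math: Par:;
run;
ods graphics off;
```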
Geriatric depression scale data. This scale provided a very different counterpoint to the clear, conceptually consistent results of the engineering and SDQ data. The scale was originally designed to have five subscales,[4] so theory would suggest that there are five factors. But as with many of our endeavors in the social sciences, this might not hold true when put to the test. For example, it is just as likely that all items will load on a single factor, or that a different number of factors will be ideal. We used ULS extraction for this data based on results from Chapter 2. The syntax to examine the extraction criteria for this data set is presented below. Please note that PROC FACTOR automatically deletes records with missing data from the analysis, reducing our sample from 656 to 479. The MAP and parallel analysis macros cannot handle missing data, so we must delete these records beforehand. We will discuss the issue of missing data further in Chapter 8.
ods graphics on;
proc factor data = marshdata  method = ULS  plots = SCREE;
   var GDS: ;
run;
ods graphics off;

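*Assuming the include statement has already read in the macros;
*First delete any cases with a missing value so that the macros will run
 for the purpose of our example;
*In practice, missing data need to be dealt with beforehand;
data marsh_nomiss(drop = i);
   set marshdata;
   *Array over every numeric variable in marshdata (named nums rather
    than var to avoid clashing with the SAS VAR function);
   array nums(*) _numeric_;
   do i = 1 to dim(nums);
      *Drop the whole record as soon as any value is missing;
      if nums(i) = . then delete;
   end;
run;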
%map(marsh_nomiss);
%parallel(marsh_nomiss,100,95,2,2,99);
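The loop above checks each numeric value against the ordinary missing value. A more compact way to keep only complete cases, sketched below on a small hypothetical data set (demo and its GDS-style items are invented for illustration), uses the NMISS function. Note one behavioral difference: NMISS also counts special missing values such as .A, which a direct comparison with . does not match.

```sas
/* Hypothetical toy data: three GDS-style items, one record with a missing value */
data demo;
   input GDS1 GDS2 GDS3;
   datalines;
1 0 1
0 . 1
1 1 0
;
run;

/* Keep complete cases only: NMISS returns the number of missing
   numeric arguments, so any record with at least one is deleted */
data demo_nomiss;
   set demo;
   if nmiss(of _numeric_) > 0 then delete;
run;
```

Applied to the GDS data, the same IF statement would take the place of the array and DO loop; here, demo_nomiss keeps two of the three toy records.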
The initial eigenvalues are presented in Figure 3.15 Eigenvalues for GDS data, and the scree plot is presented in Figure 3.16 Scree plot of GDS data. Theory and the proportion of variance criterion would suggest this data has a five-factor structure. The Kaiser Criterion identifies three factors with eigenvalues greater than 1.0, and the minimum eigenvalue criterion identifies seven factors with eigenvalues above the average (0.36). Finally, the scree plot is not very clear: the first inflection point seems to be at two factors, but it is also arguable that there is a second inflection point at the fourth factor. Thus, using these traditional criteria, we would probably combine the results and test a variety of configurations, including one-, two-, three-, four-, and five-factor extractions. These results would be examined to see whether the original theoretical framework made sense, or whether any of the other factor structures seem to make sense. However, since we have parallel analysis and MAP analysis, let us examine those results before exploring these options.
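The two eigenvalue-based rules above differ only in their cutoff. As a sketch, with p items, eigenvalues λ_1 ≥ … ≥ λ_p, and, as an assumption typical of common-factor extraction, a reduced correlation matrix R_h whose diagonal holds the prior communality estimates h_j²:

```latex
$$
\begin{aligned}
m_{\text{Kaiser}} &= \#\{k : \lambda_k > 1\}, \qquad
\bar{\lambda} = \frac{1}{p}\sum_{j=1}^{p}\lambda_j, \qquad
m_{\text{min-eig}} = \#\{k : \lambda_k > \bar{\lambda}\} \\
\sum_{j=1}^{p}\lambda_j &= \operatorname{tr}(R_h) = \sum_{j=1}^{p} h_j^2
\quad\Longrightarrow\quad
\bar{\lambda} = \frac{1}{p}\sum_{j=1}^{p} h_j^2
\end{aligned}
$$
```

Under that assumption the average eigenvalue is simply the mean prior communality, which is why it can sit well below 1 (0.36 for the GDS data). Whenever it does, every eigenvalue above 1 is also above the average, so the minimum eigenvalue criterion retains at least as many factors as the Kaiser Criterion (seven versus three here).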
The parallel analysis results are presented in Figure 3.17 Parallel analysis output for GDS data and Figure 3.18 Parallel analysis plot of GDS data, and the MAP analysis results are presented in Figure 3.19 MAP output for GDS data and Figure 3.20 Plot of average partial correlations for MAP test. The results of this parallel analysis pose an interesting dilemma: the raw-data eigenvalues quickly drop below 1 and quickly approach the random-data eigenvalues, yet the two lines do not meet until around the eighth factor. This would suggest extracting many more factors than probably makes sense. The MAP analysis was also interesting for these data, in that it recommends extracting three factors rather than the single strong factor or the theoretically expected five factors. As you can see in the MAP plot, there might be a significant inflection point at the first factor. The minimum is at three factors, but the change across factors 2, 3, and 4 is so small as to be almost inconsequential: the average partial correlation (APC) for three factors is only 0.0003 lower than the APC for two factors, and only 0.0008 lower than the APC for four factors. One could argue that the only real inflection point is at 1.
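The MAP criterion can be sketched in Velicer's original form, which averages squared partial correlations; whether the macro used here squares them or uses another variant is not shown in this section, so treat this as an assumption. R is the p × p item correlation matrix and A_m holds the loadings on the first m principal components (A_0 is empty, so f_0 is the average squared correlation).

```latex
$$
\begin{aligned}
C_m &= R - A_m A_m^{\top}, \qquad
R^{*}_m = D_m^{-1/2}\, C_m\, D_m^{-1/2}, \qquad D_m = \operatorname{diag}(C_m) \\
f_m &= \frac{1}{p(p-1)} \sum_{i \neq j} \bigl(r^{*}_{ij,m}\bigr)^{2}, \qquad m = 0, 1, \dots, p-1 \\
m_{\text{MAP}} &= \arg\min_{m} f_m
\end{aligned}
$$
```

Because the rule simply takes the minimum, a difference as small as the 0.0003 reported above is enough to favor three factors over two, which is why a flat stretch around the minimum calls for judgment rather than a mechanical reading.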
Figure 3.15 Eigenvalues for GDS data
Figure 3.16 Scree plot of GDS data
Figure 3.17 Parallel analysis output for GDS data
Figure 3.18 Parallel analysis plot of GDS data
Figure 3.19 MAP output for GDS data
Figure 3.20 Plot of average partial correlations for MAP test
This third example reinforces the fact that EFA is both an art and a quantitative science, and that the researcher’s good judgment is critical when the data is not as clear or cooperative as one would hope. This data is not giving us a clear indication of how many factors to extract, and thus we need to explore several options for what is most sensible. When data is uncooperative in this way, replication becomes even more critical, as we will discuss in chapters to come.