Theory. The first
criterion summarized above is unfortunately not something that SAS
can help you with. This requires combing the literature to better
understand the theoretical constructs that might underlie your set
of items. The literature might identify one specific structure (e.g.,
a two-factor model) or multiple structures (e.g., a two-factor model
under one framework and a five-factor model under another). It is
your job to understand these models and then test them with your data.
If we think back to
the engineering data that we used in Chapter 2, the scale used was
designed to evaluate two factors: engineering problem-solving and
interest in engineering. We could do some additional research to see
whether anyone has ever used this set of items or a similar set to
evaluate a different set of constructs. If we did not find anything
else in the literature, theory would tell us this data contains two
factors.
Kaiser Criterion, scree plot,
minimum eigenvalue, and proportion of variance. The next
four criteria are relatively easy to use and evaluate in SAS. As we
reviewed in Chapter 2, the various eigenvalue estimates are automatically
produced by the FACTOR procedure. However, the scree plot is not
automatically output. We can request it by adding the SCREE option to the
PROC FACTOR statement or by requesting it through the ODS Graphics system.
In general, the Output Delivery System (ODS) produces higher-quality
graphics, so that is our preferred choice, but we show both below.
Using the engineering data, we can examine these four
criteria with the syntax provided below. The first set of syntax shows
how to use the SCREE option to produce the scree plot, and the second
shows how to use ODS to produce the scree plot. We use the iterated PAF
extraction method (method=PRINIT priors=SMC) for this data because it
appeared to be one of several appropriate methods based on the results in
Chapter 2. We also omit the number of factors to extract (the NFACTORS=
option) for two reasons. First,
if we remember the example output for this data set provided in Chapter
2, SAS will produce the initial eigenvalues as well as the final extracted
eigenvalues, so we will get the important output no matter what. And
second, the default method in SAS is to use either the proportion
of variance or minimum eigenvalue criteria, so we will be able to
confirm our interpretation of one of the criteria with the SAS interpretation.
It is always better to have SAS check us if possible!
*Using the scree option;
proc factor data = engdata method = PRINIT priors = SMC SCREE;
var EngProbSolv: INTERESTeng: ;
run;
*Using the ODS system;
ods graphics on;
proc factor data = engdata method = PRINIT priors = SMC plots = SCREE;
var EngProbSolv: INTERESTeng: ;
run;
ods graphics off;
Both of the PROC FACTOR statements above will produce the following
table related to the Kaiser Criterion, minimum eigenvalue, and proportion
of variance.
Figure 3.1 (Initial eigenvalue estimates) displays
the initial eigenvalue estimates produced before the extraction method
is used to iteratively converge on a solution. We use the initial
estimates because these are generally identical across extraction
methods. In this example, the Kaiser Criterion would tell us to extract
two factors because only the first two rows (which represent potential
factors to extract) have eigenvalues that are greater than 1. Based on the
minimum eigenvalue criterion, we would retain only factors that account
for about .76 of an eigenvalue or more (i.e., the average eigenvalue, or
the total extracted variance divided by the number of items: 10.6068/14).
This would also have us retain two factors. Finally, the proportion
of variance criterion would recommend we retain two factors because
100% of the common variance is explained by two factors. The text
at the bottom of the figure tells us that SAS plans to retain two
factors based on the proportion of variance criterion.
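The minimum eigenvalue cutoff described above is simple arithmetic, and the Kaiser Criterion can be requested explicitly rather than read off the table. The following is a minimal sketch, assuming the total of 10.6068 and the 14 items reported above. MINEIGEN= is a PROC FACTOR option, but the value of 1 is our illustrative choice rather than a setting from the original analysis, and other default retention criteria may still apply alongside it.

```sas
*Average eigenvalue used as the minimum eigenvalue cutoff;
data _null_;
   total_variance = 10.6068;   *sum of the initial eigenvalues from the output;
   n_items = 14;               *number of items analyzed;
   cutoff = total_variance / n_items;
   put cutoff= 6.4;            *writes the cutoff to the SAS log;
run;

*Request the Kaiser Criterion explicitly (illustrative value of 1);
proc factor data = engdata method = PRINIT priors = SMC mineigen = 1;
   var EngProbSolv: INTERESTeng: ;
run;
```

The DATA step only reproduces the division shown in the text (roughly .76). The PROC FACTOR step lets you compare the number of factors SAS retains under an explicit cutoff with your own reading of the eigenvalue table.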
The scree plots produced by the SCREE option and ODS are presented in
Figure 3.2 (Scree plot from SCREE option) and
Figure 3.3 (Scree plot from ODS), respectively.
They show essentially the same results, the plotted initial eigenvalue
estimates, just in slightly different formats. The SCREE option
produces a simplified text-based figure (that can actually be copied
and pasted as text), but ODS produces a figure in a graphical format.
ODS also includes a plot that combines a graphical representation
of the eigenvalues with the proportion of variance explained.
Based on these scree
plots, we can see an “elbow” or inflection point at
factor 3, suggesting that two factors should be retained. There is
also an additional “elbow” that appears at factor 4.
This is much less pronounced than the previous inflection point but,
based on these results, we can consider exploring a three-factor solution
in addition to a two-factor solution.
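Because the scree plot leaves open the possibility of a third factor, one way to follow up is to fit both solutions and compare them. This is a minimal sketch, assuming the same data set and extraction settings used above. NFACTORS= is the PROC FACTOR option mentioned earlier, and the values 2 and 3 simply reflect the two elbows seen in the plot.

```sas
*Two-factor solution;
proc factor data = engdata method = PRINIT priors = SMC nfactors = 2;
   var EngProbSolv: INTERESTeng: ;
run;

*Three-factor solution for comparison;
proc factor data = engdata method = PRINIT priors = SMC nfactors = 3;
   var EngProbSolv: INTERESTeng: ;
run;
```

Comparing the factor loadings from the two runs can help show whether a third factor is interpretable or simply splits items that belong together.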
MAP and parallel analysis. The last two extraction criteria
are a little trickier to use. As we mentioned above, one barrier
to researchers using MAP and parallel analysis is that these procedures
are not widely implemented in most common statistical software, including
SAS. Fortunately, O’Connor (2000) developed SAS syntax to perform
these analyses. The syntax can currently be downloaded from https://people.ok.ubc.ca/brioconn/boconnor.html.
In addition, a macro version of this syntax is included in the example
code for this book and is available from the book website.
To run these analyses from the macros
that are available from the book website, you need to include the
macro syntax in your current SAS file using a %INCLUDE statement.
This loads the macros into your current session so that you
can call the macros defined in the external file. You can then run the
analyses by calling the respective macro and entering the necessary
arguments. The MAP macro has only one argument: datafile, the data set
name. Thus, you can run the MAP macro with %map(datafile). The parallel
macro has more arguments: datafile, the data set name; ndatsets, the
number of random data sets to use; percent, the percentile to use in
determining whether the eigenvalues are significantly above the mean;
kind, the type of parallel analysis (1=PCA and 2=factor analysis);
randtype, the type of random data to be used (1=normally distributed
random data and 2=random permutations of the raw data); and seed, the
seed value to use in computations. You can then run the parallel macro
with %parallel(datafile,ndatsets,percent,kind,randtype,seed). The syntax
to do this for the engineering data is presented below.
*Include MAP and PARALLEL ANALYSIS macro syntax for use below;
filename parallel 'C:\Location of File\parallel_macro.sas';
filename map 'C:\Location of File\map_macro.sas';
%include parallel map;
*Run MAP and Parallel Analysis;
%map(engdata);
%parallel(engdata,100,95,2,2,99);
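The kind and randtype arguments can be varied to run a different form of parallel analysis. As a sketch, assuming the macros have already been loaded with the %INCLUDE statement above, the call below requests a PCA-based parallel analysis on normally distributed random data. The argument values are illustrative and follow the codes described in the text.

```sas
*Parallel analysis with kind=1 (PCA) and randtype=1 (normal random data);
*Uses 100 random data sets, the 95th percentile, and a seed of 99;
%parallel(engdata,100,95,1,1,99);
```

Keeping the seed fixed makes the random data sets reproducible across runs, which is useful when comparing the two forms of the analysis.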
The results produced by the O’Connor (2000) MAP analysis are presented in
Figure 3.4 (MAP analysis results) below. The
results show the eigenvalues extracted from the data, the average
squared partial correlations, and the average partial correlations
to the fourth power. Recall that, using MAP analysis, we want to choose
the number of factors at which the average partial correlation reaches its
minimum (the average squared partial correlation under the original 1976
criterion and the average partial correlation to the fourth power under the
revised criterion). According to the results, two factors should be
retained.
The results for the parallel analysis are presented in
Figure 3.6 (Parallel analysis results) below. The
parallel analysis computed the eigenvalues from the data and then
generated a series of comparable sets of random data. The mean eigenvalues
across the sets of random data were computed along with the 95th percentile
of those values. These values are presented on the right of the figure,
in the columns labeled Raw Data, Means, and Prcntyle. Remember, the
goal is to retain the factors whose observed eigenvalues
exceed those produced from random data. The current results
recommend extracting two factors because, at factor 3, the raw data
eigenvalue falls below both the mean eigenvalue and the 95th percentile
eigenvalue from the random data sets. Again, we can visualize these
results by plotting the mean eigenvalues in the random data against
the eigenvalues in the raw data. (See
Figure 3.7, Parallel analysis plot of engineering data.)