To request confidence intervals (CIs) for the factor loadings, add the CI option to the FACTOR statement. If you would like SAS to evaluate whether the CIs exceed a certain absolute magnitude, set the CI option equal to that magnitude (e.g., CI=.4). You can also change the default 95% confidence level with the ALPHA option. ALPHA=p requests a (1 - p) confidence level, so ALPHA=.10 produces 90% CIs. The syntax below estimates 95% CIs for the engineering data and asks SAS to evaluate whether the loadings exceed an absolute magnitude of 0.4. Note that METHOD=ML is specified; these options work only with maximum likelihood (ML) extraction.
proc factor data = engdata nfactors = 2 method = ML rotate = OBLIMIN CI = .4;
  var EngProb: INTERESTeng:;
run;
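The Python sketch below shows what the CI= magnitude and the ALPHA= level mean for a single loading. It builds a symmetric interval (estimate ± z × SE) and checks whether that interval lies entirely beyond ±0.4. This illustrates the idea under stated assumptions and is not PROC FACTOR's algorithm. The Wald form is an assumption, since SAS may compute its intervals differently. The estimates, standard errors, and helper names (wald_ci, exceeds_magnitude) are all hypothetical.

```python
from statistics import NormalDist


def wald_ci(estimate, se, alpha=0.05):
    """Symmetric (1 - alpha) interval: estimate +/- z * se (an assumed Wald form)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return estimate - z * se, estimate + z * se


def exceeds_magnitude(interval, magnitude):
    """True when the whole interval lies beyond +/- magnitude."""
    lower, upper = interval
    return lower > magnitude or upper < -magnitude


# Hypothetical loadings and standard errors, for illustration only.
for est, se in [(0.60, 0.05), (0.45, 0.05), (-0.70, 0.08)]:
    for alpha in (0.05, 0.10):  # like ALPHA=.05 (95% CI) and ALPHA=.10 (90% CI)
        lo, hi = wald_ci(est, se, alpha)
        print(f"{est:+.2f}  {1 - alpha:.0%} CI ({lo:.3f}, {hi:.3f})  "
              f"beyond |0.4|: {exceeds_magnitude((lo, hi), 0.4)}")
```

A loading of 0.45 with an SE of 0.05 has an interval that crosses 0.4, so it would not count as clearly salient at this magnitude.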
The SURVEYSELECT procedure is used to conduct our resampling. You were introduced to this procedure in the last chapter, where we used it to create subsamples for internal replication. When the procedure is used for resampling, a few additional options are needed. The syntax below produces 2000 resamples of the engineering data.
proc surveyselect data = engdata method = URS samprate = 1 outhits
                  out = outboot_eng seed = 1 rep = 2000;
run;
As before, we use the DATA option to read in our data and the METHOD option to specify the type of sampling. In this case, that is unrestricted random sampling, or random sampling with replacement. We also use the OUT option to output our data and the SEED option to specify a random seed so that the resampling can be replicated. The SAMPRATE, OUTHITS, and REP options are new, though. Instead of specifying the number of observations in each subsample with the N option (as we did before), we use the SAMPRATE option to give the proportion of observations to select in each resample. Specifying SAMPRATE = 1 means that each resample contains the same number of observations as our original sample. We also include the OUTHITS option so that an observation selected more than once appears in the output data set once for each time it is selected. Finally, the REP option specifies the number of resamples to draw. The procedure outputs a single data set (e.g., outboot_eng) that contains all of the resamples. Each resample is identified by a number in a variable named Replicate. Figure 7.2 displays an excerpt of the output data set, outboot_eng.
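The sketch below mimics the data set that METHOD=URS SAMPRATE=1 OUTHITS produces. Each resample draws n row indices with replacement from the n original rows, keeps repeated draws as separate rows, and tags every row with a replicate number. It uses NumPy's random generator, so a given seed will not reproduce SAS's resamples. The function name bootstrap_resamples and the toy data are hypothetical.

```python
import numpy as np


def bootstrap_resamples(data, reps, seed=1):
    """Stack `reps` resamples drawn with replacement (cf. METHOD=URS SAMPRATE=1 OUTHITS).

    Returns (replicate_ids, rows); replicate_ids[k] is the resample that row k belongs to.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    # Each resample has n rows (SAMPRATE = 1); repeated draws stay as separate rows (OUTHITS).
    idx = rng.integers(0, n, size=(reps, n))
    replicate_ids = np.repeat(np.arange(1, reps + 1), n)
    return replicate_ids, data[idx.ravel()]


data = np.arange(20, dtype=float).reshape(10, 2)  # 10 toy observations, 2 variables
rep_ids, rows = bootstrap_resamples(data, reps=3)
print(rows.shape)  # (30, 2): three resamples of 10 rows each
```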
Next, we use BY processing in PROC FACTOR to replicate the analysis in each of our resamples. We also use ODS OUTPUT to save a data set containing our results. The syntax for this step is provided below.
ods output Eigenvalues = boot_eigen;
proc factor data = outboot_eng nfactors = 2 method = PRINIT priors = SMC rotate = OBLIMIN;
  by replicate;
  var EngProb: INTERESTeng:;
run;
ods output close;
If you are not familiar with PROC FACTOR by now, then you should probably go back a few chapters. In the syntax above, we request that the eigenvalue tables be written to a data set by naming the ODS table Eigenvalues in the ODS OUTPUT statement. The data set of eigenvalues will be called boot_eigen. We could also output a data set of communalities, a rotated pattern matrix, and much more. The table below, Selected ODS table names for PROC FACTOR, lists several ODS tables whose results you might want to bootstrap. Like the data set produced by the SURVEYSELECT procedure, the ODS data set will contain the results for all of the bootstrapped resamples. Again, the Replicate variable distinguishes the results of one resample from those of another.
| Table Name | Description | Additional Arguments Required |
|---|---|---|
| ConvergenceStatus | Convergence status of each solution | Must use METHOD=PRINIT, ALPHA, ML, or ULS. |
| Corr | Variable correlation matrix | Correlations must be requested by including the CORR option in the FACTOR statement. |
| Eigenvalues | Preliminary eigenvalues and the eigenvalues of the reduced correlation matrix | |
| FactorPattern | Unrotated factor pattern matrix | |
| FactorStructure | Rotated factor structure matrix | An oblique rotation method must be specified. |
| FinalCommun | Final communality estimates | |
| InterFactorCorr | Factor correlation matrix | An oblique rotation method must be specified. |
| ObliqueRotFactPat | Rotated factor pattern matrix | An oblique rotation method must be specified. |
| OrthRotFactPat | Rotated factor pattern matrix | An orthogonal rotation method must be specified. |
| ReferenceStructure | Rotated reference structure matrix | An oblique rotation method must be specified. |
| VarExplain | Variance explained | |
The PROC FACTOR syntax is similar to what we have used before, with one exception: the BY statement. The BY statement conducts the requested analysis separately for each group identified by the BY variable. In this case, the resamples in our input data set (outboot_eng) are identified by the unique values of the Replicate variable. The syntax therefore conducts the analysis separately for each resample. Without the BY statement, the FACTOR procedure would use all of the records in the data set and conduct a single analysis.
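The BY-group idea can be mirrored in a few lines of Python: split the stacked resamples by replicate and run the same computation on each piece. In place of the full PROC FACTOR run, this sketch computes only the preliminary eigenvalues. It assumes that, under METHOD=PRINIT PRIORS=SMC, these are the eigenvalues of the correlation matrix with squared multiple correlations (SMCs) on the diagonal. It does not iterate the communalities, extract factors, or rotate. The function names and the simulated data are hypothetical.

```python
import numpy as np


def preliminary_eigenvalues(x):
    """Eigenvalues (largest first) of the correlation matrix with SMC priors on the diagonal."""
    r = np.corrcoef(x, rowvar=False)
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(r))  # squared multiple correlations
    reduced = r.copy()
    np.fill_diagonal(reduced, smc)
    return np.sort(np.linalg.eigvalsh(reduced))[::-1]


def eigenvalues_by_replicate(replicate_ids, rows):
    """Mimic a BY replicate statement: one separate analysis per resample."""
    return {int(rep): preliminary_eigenvalues(rows[replicate_ids == rep])
            for rep in np.unique(replicate_ids)}


# Simulated one-factor data (hypothetical), then five stacked bootstrap resamples.
rng = np.random.default_rng(7)
n, reps = 200, 5
factor = rng.normal(size=(n, 1))
x = 0.7 * factor @ np.ones((1, 4)) + rng.normal(scale=0.5, size=(n, 4))
idx = rng.integers(0, n, size=(reps, n))
rep_ids = np.repeat(np.arange(1, reps + 1), n)
results = eigenvalues_by_replicate(rep_ids, x[idx.ravel()])
for rep, ev in results.items():
    print(rep, np.round(ev, 3))
```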
Next, we use the SORT and UNIVARIATE procedures to estimate our 95% CIs from the bootstrap distribution. We must first sort our data. We are estimating CIs for multiple estimates (the eigenvalues), so we need BY processing again to produce separate estimates for each eigenvalue. BY processing requires that the data be sorted in ascending order of the BY variable, in this case the eigenvalue number (i.e., Number). We did not need to sort the data when we used BY processing in the FACTOR procedure above. There, the BY variable, Replicate, was already in ascending order because the SURVEYSELECT procedure outputs it that way. We also use the NODUPKEY option in PROC SORT to remove the second set of eigenvalues, those for the reduced correlation matrix, from our data set. The preliminary eigenvalues come first within each replicate, so this option keeps them. It deletes the other eigenvalues, which appear as the second occurrence of each Number by Replicate combination. Please note that the NODUPKEY option is required only when bootstrapping CIs for the eigenvalues.[2]
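The NODUPKEY logic amounts to a stable sort followed by keeping the first record for each key. The sketch below assumes, as the text describes, that within each replicate the preliminary eigenvalues precede those of the reduced correlation matrix. The records are invented illustrations rather than actual boot_eigen contents, and that includes the hypothetical source field used to label them.

```python
def nodupkey(records, key):
    """Stable sort by key, then keep only the first record for each key (like NODUPKEY)."""
    kept, seen = [], set()
    for rec in sorted(records, key=key):  # sorted() is stable: ties keep input order
        k = key(rec)
        if k not in seen:
            seen.add(k)
            kept.append(rec)
    return kept


# Hypothetical boot_eigen rows (values invented). Within each replicate the
# preliminary eigenvalues come first, followed by those of the reduced matrix.
boot_eigen = [
    {"Replicate": 1, "Number": 1, "Eigenvalue": 2.41, "source": "preliminary"},
    {"Replicate": 1, "Number": 2, "Eigenvalue": 0.88, "source": "preliminary"},
    {"Replicate": 1, "Number": 1, "Eigenvalue": 2.55, "source": "reduced"},
    {"Replicate": 1, "Number": 2, "Eigenvalue": 0.93, "source": "reduced"},
    {"Replicate": 2, "Number": 1, "Eigenvalue": 2.37, "source": "preliminary"},
    {"Replicate": 2, "Number": 2, "Eigenvalue": 0.91, "source": "preliminary"},
    {"Replicate": 2, "Number": 1, "Eigenvalue": 2.50, "source": "reduced"},
    {"Replicate": 2, "Number": 2, "Eigenvalue": 0.96, "source": "reduced"},
]

# Equivalent in spirit to: proc sort nodupkey; by Number Replicate;
deduped = nodupkey(boot_eigen, key=lambda r: (r["Number"], r["Replicate"]))
for r in deduped:
    print(r["Number"], r["Replicate"], r["Eigenvalue"], r["source"])
```

Only the preliminary rows survive, ordered by Number and then Replicate. That is the layout the BY Number step in PROC UNIVARIATE expects.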
The UNIVARIATE procedure can then be used to estimate the 95% CIs. Two statements are required for the CIs to be estimated: the VAR and OUTPUT statements. The VAR statement lists the variables for which we would like CIs, and the OUTPUT statement saves a data set containing the requested estimates. Within the OUTPUT statement, the OUT and PCTLPTS options are necessary to name the output data set and to specify the percentiles we would like to output. For a 95% CI, request the 2.5th and 97.5th percentiles. Finally, the PCTLPRE option in the OUTPUT statement adds a prefix to the names of the CI variables in the output data set (here, ci2_5 and ci97_5). An example of this syntax is provided below. The final data set output by the procedure is shown in Figure 7.3 (Final data set output from PROC UNIVARIATE).
proc sort data = boot_eigen nodupkey;
  by Number Replicate;
run;
proc univariate data = boot_eigen;
  by Number;
  var Eigenvalue;
  output out = final pctlpts = 2.5, 97.5 pctlpre = ci;
run;
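The percentile CI itself is simple to compute by hand. The sketch below assumes PROC UNIVARIATE's default percentile definition (PCTLDEF=5, the empirical distribution function with averaging). Write np = j + g, where j is the integer part and g the fractional part. When g = 0, the percentile is the average of the j-th and (j+1)-th ordered values; otherwise it is the (j+1)-th ordered value. The function names are hypothetical.

```python
import math


def pctl_def5(values, pct):
    """Percentile by the empirical distribution function with averaging (SAS PCTLDEF=5)."""
    if not 0 < pct < 100:
        raise ValueError("pct must lie strictly between 0 and 100")
    x = sorted(values)
    n = len(x)
    np_ = n * pct / 100  # n*p, split below into integer part j and fraction g
    j = math.floor(np_)
    g = np_ - j
    if g == 0:
        return (x[j - 1] + x[j]) / 2  # average of the j-th and (j+1)-th ordered values
    return x[j]  # the (j+1)-th ordered value (Python lists are 0-based)


def percentile_ci(boot_values, level=95.0):
    """Two-sided percentile bootstrap CI, e.g., the 2.5th and 97.5th percentiles for 95%."""
    tail = (100 - level) / 2
    return pctl_def5(boot_values, tail), pctl_def5(boot_values, 100 - tail)


# Toy bootstrap distribution: the values 1, 2, ..., 2000.
print(percentile_ci(range(1, 2001)))  # (50.5, 1950.5)
```

With 2000 resamples, 2000 × 0.025 = 50 is an integer. The lower limit is therefore the average of the 50th and 51st ordered values, and the upper limit averages the 1950th and 1951st.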
We have written a macro that uses the IML procedure to do this step. The macro is based on R syntax provided by Zhang, Preacher, & Luo (2010) and is presented below. This syntax is likely more advanced than what you have seen previously, but bear with us: it gets the job done and can easily be adapted for other analyses. Line comments give a broad overview of each step.
%MACRO alignFactors(origLoadings,bootLoadings,alignedLoadings,summary);
  *Get number of bootstrap replications from the data;
  proc sql noprint;
    select max(Replicate) into :nBoot from &bootLoadings;
  quit;
  proc iml;
    *Read in original loadings and get dimensions;
    use &origLoadings (keep=Factor:);
    read all var _ALL_ into order_matrix;
    p = nrow(order_matrix);
    m = ncol(order_matrix);
    *Get all possible permutations of the factor order;
    permutation = allperm(1:m);
    *Load bootstrapped loadings and iterate through the steps for each sample;
    use &bootLoadings;
    index Replicate;
    do rep=1 to &nBoot;
      use &bootLoadings;
      read all var _NUM_ where (Replicate=rep) into sub_boot_ldgs[colname=varNames];
      loading_matrix = sub_boot_ldgs[,loc(varNames^="Replicate")];
      **Step 1;
      *Obtain sum of squared deviations of columns of order_matrix and loading_matrix;
      squared_deviation = i(m);
      do i=1 to m;
        temp1 = t(order_matrix[,i]) * order_matrix[,i];
        do j=1 to m;
          temp2 = t(loading_matrix[,j]) * loading_matrix[,j];
          temp3 = t(order_matrix[,i]) * loading_matrix[,j];
          squared_deviation[i,j] = temp1 + temp2 - 2 * abs(temp3);
        end;
      end;
      **Step 2;
      *Find the best match between the order_matrix and loading_matrix;
      factorial_m = fact(m);
      sqd_dev_permutation = repeat(0,1,factorial_m);
      do i=1 to factorial_m;
        do j=1 to m;
          sqd_dev_permutation[i] = sqd_dev_permutation[i] + squared_deviation[j,permutation[i,j]];
        end;
      end;
      best_match = loc(rank(sqd_dev_permutation) = 1);
      **Step 3;
      *Interchange columns of the loading_matrix to match the target factor order_matrix;
      temp_ldg_matrix = loading_matrix;
      do i=1 to m;
        temp_ldg_matrix[,i] = loading_matrix[,permutation[best_match,i]];
      end;
      **Step 4;
      *Reflect columns of loadings if needed and create summary of revisions;
      sub_revisions = permutation[best_match,];
      do j=1 to m;
        temp1 = t(order_matrix[,j]) * temp_ldg_matrix[,j];
        if temp1<0 then do;
          temp_ldg_matrix[,j] = temp_ldg_matrix[,j] * -1;
          sub_revisions[,j] = sub_revisions[,j] * -1;
        end;
      end;
      **Step 5;
      *Append each iteration to a SAS data set;
      sub_aligned_ldgs = repeat(rep,p) || temp_ldg_matrix;
      sub_rev_summary = rep || sqd_dev_permutation[,best_match] || sub_revisions;
      if rep=1 then do;
        create &alignedLoadings from sub_aligned_ldgs[colname=varNames];
        append from sub_aligned_ldgs;
        close &alignedLoadings;
        sum_factor_names = "position_factor1":cats("position_factor",m);
        sum_varNames = {"Replicate" "squared_diff"} || sum_factor_names;
        create &summary from sub_rev_summary[colname=sum_varNames];
        append from sub_rev_summary;
        close &summary;
      end;
      else do;
        edit &alignedLoadings;
        append from sub_aligned_ldgs;
        close &alignedLoadings;
        edit &summary;
        append from sub_rev_summary;
        close &summary;
      end;
    end;
  quit;
%MEND;
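The third argument names the data set that will contain the aligned bootstrapped loadings, which can then be analyzed with PROC UNIVARIATE (aligned_loadings). Finally, the fourth argument names a data set that summarizes the alignment procedure. This data set contains the final ordering, any reflection that was done (a negative position indicates a reflected factor), and the sum of squared differences for the order that best matched the original factor solution (rev_summary). The macro performs this step for any EFA factor loadings, regardless of the number of factors or items. Note, however, that the number of permutations it checks (m! for m factors) grows quickly as factors are added.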
%alignFactors(orig_loadings,boot_loadings,aligned_loadings,rev_summary);
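The macro's Steps 1 through 4 can be sketched in Python for a single bootstrap replicate. This illustrates the logic (itself based on Zhang, Preacher, & Luo, 2010) and is not a replacement for the macro. The function name align_loadings and the example matrices are hypothetical. Like the macro's ALLPERM call, the sketch enumerates all m! column orders, so it is practical only for a modest number of factors.

```python
from itertools import permutations

import numpy as np


def align_loadings(target, boot):
    """Reorder and reflect the columns of `boot` to best match `target` (both p x m).

    Returns (aligned, order, signs, ssd): order[i] is the 0-based column of `boot`
    moved into position i, signs[i] is -1 where that column was reflected, and ssd
    is the summed squared deviation of the best-matching order.
    """
    m = target.shape[1]
    # Step 1: squared deviation between target column i and boot column j,
    # allowing for reflection: ||t||^2 + ||b||^2 - 2|t.b|.
    dev = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            t, b = target[:, i], boot[:, j]
            dev[i, j] = t @ t + b @ b - 2 * abs(t @ b)

    # Step 2: the column order with the smallest total squared deviation.
    def total(order):
        return sum(dev[i, order[i]] for i in range(m))

    best = min(permutations(range(m)), key=total)
    # Step 3: interchange columns to follow the target order.
    aligned = boot[:, list(best)].astype(float)
    # Step 4: reflect any column whose cross-product with the target is negative.
    signs = np.where(np.sum(target * aligned, axis=0) < 0, -1.0, 1.0)
    aligned *= signs
    return aligned, best, signs, total(best)


target = np.array([[0.8, 0.1], [0.7, 0.0], [0.1, 0.6], [0.0, 0.9]])
# A hypothetical bootstrap solution with the factors swapped and one reflected.
boot = np.column_stack([-target[:, 1], target[:, 0]])
aligned, order, signs, ssd = align_loadings(target, boot)
print(order, signs)
```

Unlike rev_summary, which stores a reflected factor as a negative 1-based position, this sketch returns 0-based positions and the reflection signs as separate values.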