Example Syntax and Output

Traditional CI with ML

Before we review the lengthier bootstrapping syntax, we will quickly review a predefined SAS option for estimating the CIs of factor loadings when conducting maximum likelihood (ML) extraction. ML extraction is the only extraction method that readily provides estimates of standard error that can then be used to compute traditional CIs.[1] These CIs differ from bootstrapped estimates because they are based on standard errors as opposed to resampling procedures. As we mentioned at the beginning of this chapter, we often do not have the ingredients to compute traditional CIs, and bootstrapped CIs have become a useful alternative. We will therefore discuss this method only briefly, as it is rather limited in scope and application. If you are interested in performing a non-ML extraction (see Chapter 2) or in reviewing CIs for the eigenvalues, communalities, etc., then this is not the method for you.
The traditional CI can be requested by adding the CI option to the FACTOR statement. If you would like SAS to evaluate whether the CIs are above a certain absolute magnitude, you can set the CI option equal to that magnitude (e.g., CI=.4). You can also change the confidence level from its default of 95% using the ALPHA option. Note that ALPHA sets the significance level rather than the confidence level (e.g., ALPHA=.1 requests a 90% CI). An example of the syntax to estimate 95% CIs using the engineering data and to request SAS to evaluate whether the loadings are above an absolute magnitude of 0.4 is provided below. Note that the ML extraction method must be specified in order for these options to work.
proc factor data = engdata  nfactors = 2  method = ML  rotate = OBLIMIN 
      CI= .4;
   var EngProb: INTERESTeng:;
run;
A sample of the pattern matrix results that are produced by the syntax is presented in Figure 7.1. Notice we now have five rows in our matrix for each variable in our EFA. The first row for each variable represents the factor loading, the second represents the standard error, the third and fourth represent the lower and upper bounds of the CI, and the last provides a CI coverage indicator that can be used to help interpret the results. The coverage indicator displays the relative location of the CI (represented as “[]”), zero (represented as “0”), and the loading magnitude requested for evaluation (represented as “*”). The coverage indicator can be used to quickly determine which CIs contain zero, values above a certain absolute magnitude, or both. However, as we will discuss further below, we believe the emphasis should be on the relative magnitude of the loadings captured in the CI, as we find loadings of 0.1 and 0.2 to be just as unacceptable as loadings of 0.
If we examine the loading of EngProbSolv1 on Factor1, we can see it has a loading of 0.85, a standard error of 0.02, a CI of [0.81, 0.88], and a CI that is above both zero and our specified magnitude of 0.4. If we examine this item’s coverage indicator on Factor2, we can see that the CI contains zero and that it is below our absolute magnitude of 0.4.
Figure 7.1 Sample factor matrix produced by the CI option
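As a hedged variation on the syntax above, the sketch below requests 90% rather than 95% intervals. It assumes the same engdata data set and variable prefixes used in this chapter. It also relies on PROC FACTOR treating ALPHA= as the significance level, so that ALPHA=.1 corresponds to 1 - .1 = 90% confidence.

```sas
* Sketch only: same ML model as above, but with 90% CIs;
* ALPHA is one minus the desired confidence level;
proc factor data = engdata  nfactors = 2  method = ML  rotate = OBLIMIN
      CI = .4  alpha = .1;
   var EngProb: INTERESTeng:;
run;
```

Narrower intervals are expected at the lower confidence level, so the coverage indicators might change for loadings that sit near zero or near 0.4.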

Bootstrapped CI

The syntax required to produce bootstrapped CIs is a bit more complex than the code reviewed above, but it can be used to produce CIs under most conditions. The entire process can generally be accomplished with three basic SAS procedures (plus some specialty output and sorting). Each of these procedures corresponds with one of the three bootstrapping steps that we reviewed above: 1) resampling, 2) replication, and 3) summarization.
First, the SURVEYSELECT procedure is used to conduct our resampling. You were introduced to this procedure in the last chapter to create subsamples for internal replication. When this procedure is used for resampling, a few additional arguments are needed. An example of syntax to produce 2000 resamples of the engineering data is presented below.
proc surveyselect data = engdata  method = URS  samprate = 1  outhits
      out = outboot_eng  seed = 1  rep = 2000;
run;
As before, we use the DATA option to read in our data, the METHOD option to specify the type of sampling (in this case, unrestricted random sampling, that is, random sampling with replacement), the OUT option to output our data, and the SEED option to specify a random seed so that the resampling can be replicated. The SAMPRATE, OUTHITS, and REP options are new, though. Instead of specifying the number of observations in our subsample using the N option (as we did before), we use the SAMPRATE option to identify the proportion of observations to select in each resample. By specifying SAMPRATE = 1 we are saying we want each resample to contain the same number of observations as our original sample. We also include the OUTHITS option to make sure a separate observation is included in the output data set each time the same observation is selected more than once. Finally, we include the REP option to specify the number of resamples we would like to conduct. This procedure will output a single data set (e.g., outboot_eng) that will contain each of the resamples. The different resamples can be identified by a number in a variable entitled “replicate”. Figure 7.2 displays an excerpt of the output data set, outboot_eng.
Figure 7.2 Excerpt of outboot_eng
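Before moving on, it can be worth confirming that the resampling behaved as described. The sketch below is one possible check, not part of the original workflow. It assumes the engdata and outboot_eng data sets from the syntax above and lists any resample whose number of rows differs from the original sample size.

```sas
* Sketch only: list any resample whose size differs from the original sample;
* An empty result means every replicate has the expected number of rows;
proc sql;
   select Replicate, count(*) as n_obs
      from outboot_eng
      group by Replicate
      having count(*) ne (select count(*) from engdata);
quit;
```

If SAMPRATE = 1 and OUTHITS were specified as above, we would expect this query to return no rows.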
Next, we will use BY processing in PROC FACTOR to replicate the analysis in each of our resamples. We also use ODS OUTPUT to output a data set containing our results. The syntax for this step is provided below.
ods output Eigenvalues=boot_eigen;
proc factor data = outboot_eng  nfactors = 2  method = PRINIT  
      priors = SMC   rotate = OBLIMIN;
   by replicate;
   var EngProb: INTERESTeng:;
run;
ods output close;
You are likely familiar with the majority of the syntax above based on previous chapters. We have used ODS to output plots and data sets. And if you are not familiar with PROC FACTOR by now, then you should probably go back a few chapters. In the syntax above, we specify that we want the tables of eigenvalues to be output to a data set by requesting the ODS table entitled “Eigenvalues”. The data set of eigenvalues will be called boot_eigen. We could also output a data set of communalities, a rotated pattern matrix, and much more. Table 7.1 summarizes several ODS table names for particular results that you might want to bootstrap. Similar to the data produced by the SURVEYSELECT procedure, the ODS data set will contain the results for all of the bootstrapped resamples. Again, the variable labeled as replicate can be used to distinguish the results that are associated with one resample versus another.
Table 7.1 Selected ODS table names for PROC FACTOR
| Table Name | Description | Additional Arguments Required |
|---|---|---|
| ConvergenceStatus | Convergence status of each solution | Must use METHOD=PRINIT, ALPHA, ML or ULS. |
| Corr | Variable correlation matrix | Correlations must be requested by including the CORR option in the FACTOR statement. |
| Eigenvalues | Preliminary eigenvalues and the eigenvalues of the reduced correlation matrix | |
| FactorPattern | Unrotated factor pattern matrix | |
| FactorStructure | Rotated factor structure matrix | An oblique rotation method must be specified. |
| FinalCommun | Final communality estimates | |
| InterFactorCorr | Factor correlation matrix | An oblique rotation method must be specified. |
| ObliqueRotFactPat | Rotated factor pattern matrix | An oblique rotation method must be specified. |
| OrthRotFactPat | Rotated factor pattern matrix | An orthogonal rotation method must be specified. |
| ReferenceStructure | Rotated reference structure matrix | An oblique rotation method must be specified. |
| VarExplain | Variance explained | |
The PROC FACTOR syntax is similar to what we have used before, with one exception: the BY statement. The BY statement conducts the analysis requested for each of the groups identified by the variable in the BY statement. In this case, the different resamples in our input data set (outboot_eng) are identified by the unique ID value in the replicate variable. Thus, the current syntax will conduct the analysis separately for each resample. If we do not include the BY statement, the FACTOR procedure will use all of the records in the data set and conduct one analysis.
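Several result tables can be captured from the same BY-group run by listing more than one table in the ODS OUTPUT statement. The sketch below is a hedged illustration under the same assumptions as the syntax above. The table names come from Table 7.1, while boot_fcorr is a hypothetical data set name introduced here. The ODS EXCLUDE statements are an optional addition that suppresses the printed output of the 2000 analyses while still allowing ODS OUTPUT to save the tables.

```sas
* Sketch only: capture several result tables from one BY-group run;
* boot_fcorr is a hypothetical data set name used for illustration;
ods exclude all;     * suppress printed output for the 2000 analyses;
ods output Eigenvalues=boot_eigen
           ObliqueRotFactPat=boot_loadings
           InterFactorCorr=boot_fcorr;
proc factor data = outboot_eng  nfactors = 2  method = PRINIT
      priors = SMC   rotate = OBLIMIN;
   by replicate;
   var EngProb: INTERESTeng:;
run;
ods exclude none;    * restore printed output for later procedures;
```

Each output data set should carry the replicate variable, so the results can be summarized by resample in the same way as the eigenvalues.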
Finally, we will use the SORT and UNIVARIATE procedures to estimate our 95% CI from the bootstrap distribution. We must first sort our data because we are estimating CIs for multiple estimates (the eigenvalues), and thus we will need to use BY processing again to produce separate estimates for each eigenvalue. BY processing requires that the data be sorted by the BY variable, in this case our eigenvalue number (i.e., “Number”). We did not need to sort the data when we used BY processing in the FACTOR procedure above because the BY variable, “replicate”, was already in ascending order (it was output this way by the SURVEYSELECT procedure). We also use the NODUPKEY option in PROC SORT to remove the second set of eigenvalues, those for the reduced correlation matrix, from our data set. Since the preliminary eigenvalues are ordered first in our data set, this option will keep them and delete the other eigenvalues, which appear as the second occurrence of each Number by Replicate combination. Please note, the NODUPKEY option is required only when bootstrapping CIs for the eigenvalues.[2]
The UNIVARIATE procedure can then be used to estimate the 95% CI. Two statements are required by this procedure for the CI to be estimated: the VAR and OUTPUT statements. The VAR statement lists the variables that we would like to estimate CIs for, and the OUTPUT statement saves a data set with our requested estimates in it. Within the OUTPUT statement, the OUT and PCTLPTS options are necessary to name the output data set and to specify the percentile estimates we would like to output. For a 95% CI, you would request the 2.5 and 97.5 percentiles. Finally, the PCTLPRE option in the OUTPUT statement can be used to add a prefix to the CI variables that will be output to our data set. An example of this syntax is provided below, and an example of the final data set output by the procedure is presented in Figure 7.3.
proc sort data = boot_eigen nodupkey;
   by Number Replicate; 
run;
proc univariate data = boot_eigen;
   by Number;
   var Eigenvalue;
   output out = final pctlpts = 2.5, 97.5 pctlpre = ci;
run;
Figure 7.3 Final data set output from PROC UNIVARIATE
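The same approach extends to other confidence levels by changing the requested percentiles. As a minimal sketch, assuming boot_eigen has already been sorted with the PROC SORT step above, the syntax below requests the 5th and 95th percentiles for a 90% CI. The data set name final90 is hypothetical, and the NOPRINT option is an optional addition that suppresses the default printed output.

```sas
* Sketch only: 90% percentile CI for each eigenvalue;
* Assumes boot_eigen has already been sorted by Number and Replicate;
proc univariate data = boot_eigen noprint;
   by Number;
   var Eigenvalue;
   output out = final90  pctlpts = 5 95  pctlpre = ci;
run;
proc print data = final90;
run;
```

With PCTLPRE = ci, the output variables are named by appending the percentile to the prefix, so this run produces ci5 and ci95, just as the 95% request above produces ci2_5 and ci97_5.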

An Aligning Procedure for Factor Loadings

An additional step is required when bootstrapping CI for factor loadings. As you might have noticed in your own analyses, the order in which factors are extracted can be arbitrary, and the direction of the loadings on a factor can change. This is a problematic phenomenon when considering estimation of bootstrapped CI around factor loadings because the results from one resample might be systematically different from another. For example, if we run the syntax presented in the previous section and examine the bootstrapped results from the engineering data, we might find some instances where interest was extracted as the first factor and others where the problem solving factor was extracted first. In addition, in one data set we might find an item that has a loading of 0.88 on the interest factor, and in another it might have a loading of -0.88. These two issues have been referred to as the “alignment problem” in the literature (Clarkson, 1979; Pennell, 1972). They must be addressed before the CIs are estimated.
An alignment procedure has been proposed to address these issues (Clarkson, 1979; Ichikawa & Konishi, 1995). This procedure essentially compares each possible factor order solution within a given resample to the original solution and calculates the sum of squared deviations to identify the solution with the best fit. For example, if three factors are being extracted then we would compare the factors in their initial order, 1-2-3, to the original solution that was derived from the entire sample. But then we would also reorder and compare the bootstrapped solution using the following order schemes: 1-3-2, 2-1-3, 2-3-1, 3-1-2, and 3-2-1. We must compare all possible orderings of the factors. Thus, there would be k! possible combinations that must be evaluated, where k is the number of factors. We would identify the order that has the smallest sum of squared deviations as having the best fit and being the order that best aligns with the original solution. We would then re-order the bootstrapped results to match the identified order. Finally, the direction of the loading would be evaluated by comparing the direction of the loadings in the original solution to the re-ordered bootstrap solution. If the majority of the loadings had a different direction, then the resampled loadings would be reflected about the axis by multiplying the loadings by -1.
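To make the interchange and reflection steps concrete, the sketch below works through a toy two-factor case in PROC IML. The loadings are invented for illustration: the hypothetical bootstrapped solution is simply the original solution with its columns swapped and one column reflected. The sketch uses the sign of the column cross-product as its reflection rule, which is an assumption of this example rather than the only way to judge the direction of the loadings.

```sas
* Sketch only: toy illustration of column interchange and reflection;
proc iml;
   * Hypothetical loadings for 4 items on 2 factors;
   orig = {0.8 0.1, 0.7 0.0, 0.1 0.9, 0.0 0.6};
   * Same solution with the factor order swapped and one factor reflected;
   boot = {0.1 -0.8, 0.0 -0.7, 0.9 -0.1, 0.6 0.0};
   m = ncol(orig);
   perms = allperm(1:m);              * all possible factor orderings;
   ssd = j(nrow(perms), 1, 0);        * one total per ordering;
   do r = 1 to nrow(perms);
      do c = 1 to m;
         b = boot[, perms[r,c]];
         * The absolute value makes the comparison ignore column sign;
         ssd[r] = ssd[r] + ssq(orig[,c]) + ssq(b) - 2*abs(t(orig[,c])*b);
      end;
   end;
   best = ssd[>:<];                   * index of the smallest total;
   aligned = boot[, perms[best,]];    * interchange the columns;
   do c = 1 to m;
      * Reflect a column that points away from the original;
      if t(orig[,c])*aligned[,c] < 0 then aligned[,c] = -aligned[,c];
   end;
   print perms ssd, aligned;
quit;
```

In this toy case the swapped ordering has a sum of squared deviations of zero, so it is selected, and after the first column is reflected the aligned matrix matches the original loadings.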
The process outlined above occurs for each of the resampled results. It is sometimes referred to as the column reflection and interchange method. We have put together a macro using the IML procedure that will do this step. The macro is based on R syntax provided by Zhang, Preacher, & Luo (2010), and is presented below. Now this syntax is likely a little more advanced than what you might have seen previously. Bear with us; it gets the job done and can be easily tweaked for other analyses. We use line comments to give a broad overview of the syntax.
%MACRO alignFactors(origLoadings,bootLoadings,alignedLoadings,summary);
*Get number of bootstrap replications from the data;
proc sql noprint;
   select max(Replicate) into :nBoot from &bootLoadings;
quit;

proc iml;
   *Read in original loadings and get dimensions;
   use &origLoadings (keep=Factor:);
   read all var _ALL_ into order_matrix;
   p = nrow(order_matrix);
   m = ncol(order_matrix);

   *Get all possible permutations of the factor order;
   permutation = allperm(1:m); 

   *Load bootstrapped loading and iterate through steps for each sample;
   use &bootLoadings;
   index Replicate;
   do rep=1 to &nBoot;
      use &bootLoadings;
      read all var _NUM_ where (Replicate=rep) into 
         sub_boot_ldgs[colname=varNames];
      loading_matrix = sub_boot_ldgs[,loc(varNames^="Replicate")];

      **Step 1;
      *Obtain sum of squared deviations of columns of order_matrix 
       and loading_matrix;
      squared_deviation = i(m);
      do i=1 to m;
         temp1=t(order_matrix[,i]) * order_matrix[,i];
         do j=1 to m;
            temp2 = t(loading_matrix[,j]) * loading_matrix[,j];
            temp3 = t(order_matrix[,i]) * loading_matrix[,j];
            squared_deviation[i,j] = temp1 + temp2 - 2 * abs(temp3);
         end;
      end;

      **Step 2;
      *Find the best match between the order_matrix and loading_matrix;
      factorial_m = fact(m);
      sqd_dev_permutation = repeat(0,1,factorial_m);
      do i=1 to factorial_m;
         do j=1 to m;
            sqd_dev_permutation[i] = sqd_dev_permutation[i] + 
               squared_deviation[j,permutation[i,j]];
         end;
      end;
      best_match = loc(rank(sqd_dev_permutation) = 1);

      **Step 3;
      *Interchange columns of the loading_matrix to match the 
       target factor order_matrix;
      temp_ldg_matrix = loading_matrix;
      do i=1 to m;
         temp_ldg_matrix[,i]=loading_matrix[,permutation[best_match,i]];
      end;

      **Step 4;
      *Reflect columns of loadings if needed and create summary of 
       revisions;
      sub_revisions = permutation[best_match,];
      do j=1 to m;
         temp1 = t(order_matrix[,j]) * temp_ldg_matrix[,j];
         if temp1<0 then do;
            temp_ldg_matrix[,j] = temp_ldg_matrix[,j] * -1;
            sub_revisions[,j] = sub_revisions[,j]*-1;
         end;
      end;

      **Step 5;
      *Append each iteration to a SAS data set;
      sub_aligned_ldgs = repeat(rep,p)||temp_ldg_matrix;
      sub_rev_summary = 
         rep||sqd_dev_permutation[,best_match]||sub_revisions;
      if rep=1 then do;
         create &alignedLoadings from sub_aligned_ldgs[colname=varNames];
         append from sub_aligned_ldgs;
         close &alignedLoadings; 
         sum_factor_names = "position_factor1":cats("position_factor",m);
         sum_varNames = {"Replicate" "squared_diff"}||sum_factor_names;
         create &summary from sub_rev_summary[colname=sum_varNames];
         append from sub_rev_summary;
         close &summary;
      end;
      else do;
         edit &alignedLoadings;
         append from sub_aligned_ldgs;
         close &alignedLoadings; 
         edit &summary;
         append from sub_rev_summary;
         close &summary;
      end;

   end;

quit;
%MEND;
In order to run the syntax, we just need to load the macro into SAS memory by running the syntax above. Then we run a line that calls the macro and passes the appropriate arguments to it. The macro call that is used for the engineering factor loadings is provided below. The first argument in the parentheses specifies the data set containing the factor loadings from our original run with the full sample (orig_loadings). The second specifies the data set containing the factor loadings for each of the resamples (boot_loadings). The third identifies the name of the data set that will be output from the macro and that will contain the aligned loadings, which can then be entered into PROC UNIVARIATE (aligned_loadings). Finally, the fourth identifies the name of a data set that will contain a summary of the alignment procedure (rev_summary). This data set contains the final ordering, any reflection that was done, and the sum of squared differences for the order that best matched the original factor solution. This macro will perform this step for any EFA factor loadings, regardless of the number of factors or items.
%alignFactors(orig_loadings,boot_loadings,aligned_loadings,rev_summary);
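The macro call presumes that its two input data sets already exist. The sketch below shows one way the orig_loadings data set might be produced before the call, followed by a quick tabulation of the summary data set after the call. It assumes an oblique two-factor solution, so that the loadings come from the ObliqueRotFactPat table, and it assumes that boot_loadings was saved from the same table in the BY-group run. The variable names position_factor1 and position_factor2 are those created inside the macro.

```sas
* Sketch only: run before the macro call to create orig_loadings;
ods output ObliqueRotFactPat=orig_loadings;
proc factor data = engdata  nfactors = 2  method = PRINIT
      priors = SMC   rotate = OBLIMIN;
   var EngProb: INTERESTeng:;
run;

* Run after the macro call to tabulate the alignments that were applied;
* Negative values mark factors that were reflected;
proc freq data = rev_summary;
   tables position_factor1 * position_factor2 / list;
run;
```

A table dominated by the combination 1 and 2 would suggest that most resamples needed no realignment, while frequent swaps or reflections would show how often the alignment step changed the bootstrapped results.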