Application: Replication of Marsh SDQ Data

This example demonstrates a replication analysis for EFA using Osborne and Fitzpatrick's (2012) procedure. We perform an internal replicability analysis with the SDQ data, randomly drawing two independent subsamples from the original sample and then analyzing each separately using the extraction and rotation choices established in our previous analyses of the scale. We use the SDQ data because it is sufficiently large (N = 15,661) to permit us to draw two moderately sized subsamples. Unfortunately, the engineering and GDS data sets are not large enough to allow us to subsample; doing so would result in sample sizes of less than 240 and subject-to-item ratios of less than 13:1. As we saw in the previous chapter, these are not ideal conditions for an EFA!
Before we get started, let’s review the syntax to subsample from a data set. We use the SURVEYSELECT procedure, as we did for an example in the previous chapter. The code to produce our two subsamples is presented and described below.
*Sample 1;
proc surveyselect data = sdqdata  method = SRS  n = 500  
      out = sdqdata_ss1   seed = 37;
run;
*Sample 2;
proc surveyselect data = sdqdata  method = SRS  n = 500 
      out = sdqdata_ss2  seed = 62;
run; 
In the code above, the DATA option specifies the input data set to sample records from; the METHOD option specifies the method of sampling (note that we use simple random sampling without replacement above); the N option specifies the number of records to include in our subsample; the OUT option names the data set that will contain the subsample; and the SEED option sets the random number seed so that we can rerun this code and get the same subsample.
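For readers working outside SAS, the same sampling logic can be sketched in a few lines of Python. This is only an illustration: the `records` list and the `srs` helper are stand-ins, not the actual SDQ data or a SAS equivalent, though the seeds mirror the example above.

```python
import random

# Stand-in for the SDQ records; in practice these would be the 15,661 cases.
records = list(range(15661))

def srs(data, n, seed):
    """Simple random sampling without replacement, with a fixed seed
    so the draw is reproducible (mirroring SAS's SEED= option)."""
    rng = random.Random(seed)
    return rng.sample(data, n)

sample1 = srs(records, 500, seed=37)
sample2 = srs(records, 500, seed=62)

# random.sample draws without replacement, so there are no duplicates
print(len(sample1), len(set(sample1)))  # 500 500
```

Because the seed is fixed, rerunning `srs(records, 500, seed=37)` returns exactly the same subsample, which is what makes the analysis reproducible.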
After we draw our two subsamples from the original data, we conduct an EFA on each. We take care to apply the same methods (e.g., extraction, rotation) to each data set so that differences in the results are attributable to the data and not the methods. Consistent with our findings in previous chapters, we use maximum likelihood extraction and direct oblimin rotation. We also report a three-factor solution (the factor structure suggested by previous research on the scale) as well as two- and four-factor solutions to demonstrate how misspecification of a factor model quickly becomes evident through replication analysis. The basic syntax for the three-factor solution run on the first sample is presented below.
proc factor data = sdqdata_ss1  nfactors = 3  method = ml  
      rotate = OBLIMIN  fuzz=.3;
   VAR Math: Par: Eng:;
run;
Three-factor replication analysis. An overview of the replication is presented in Table 6.1. As you can see in this table, the two samples have identical factor structures. Even though the items that load on factor 2 in sample 1 are found to load on factor 3 in sample 2, these are still identical structural solutions because the same sets of items load on each factor. Thus, these two samples pass the first test of replicability: consistent factor structure. In addition, notice that the squared differences in factor loadings are all relatively small. The largest difference is .005, suggesting the factor loadings do not differ by more than |.07|, which is not bad. We would suggest that once squared differences reach a magnitude of .04, indicating a difference of |.20|, a researcher can begin to consider factor loadings volatile. Based on our current results, however, we can conclude that the two samples pass the second test as well: consistent magnitude of factor loadings.
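The volatility rule described above is simple arithmetic, and can be sketched as follows. The function names here (`squared_diff`, `is_volatile`) are our own illustrative labels, not part of any SAS or published procedure.

```python
def squared_diff(loading1, loading2):
    """Squared difference between two factor loadings."""
    return (loading1 - loading2) ** 2

def is_volatile(loading1, loading2, threshold=0.04):
    """Flag a pair of loadings as volatile once the squared difference
    reaches .04, i.e., the loadings differ by |.20| or more."""
    return squared_diff(loading1, loading2) >= threshold

# Par5 from Table 6.1 loads .76 in sample 1 and .69 in sample 2
print(round(squared_diff(0.76, 0.69), 4))  # 0.0049
print(is_volatile(0.76, 0.69))             # False
```

Recomputing from the rounded loadings gives .0049 rather than the table's .0045, since the tabled values were computed from unrounded loadings.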
Table 6.1 Three-factor SDQ replicability analysis, ML extraction, oblimin rotation
        Sample 1                    Sample 2                    Squared
Var:    Comm    (1)    (2)    (3)   Comm    (1)    (2)    (3)   Differences
Eng1    .61            .77          .69                   .83   .0038
Eng2    .65            .82          .67                   .84   .0006
Eng3    .75            .86          .68                   .82   .0018
Eng4    .48           -.68          .49                  -.66   .0006
Math1   .78     .90                 .75     .87                 .0010
Math2   .75     .87                 .81     .90                 .0007
Math3   .77     .87                 .74     .86                 .0002
Math4   .47    -.65                 .38    -.61                 .0019
Par1    .54                   .73   .50            .70          .0008
Par2    .42                  -.67   .49           -.73          .0033
Par3    .63                   .79   .73            .85          .0041
Par4    .38                  -.53   .36           -.55          .0002
Par5    .59                   .76   .49            .69          .0045
Eigen:          3.58   2.12   2.11          3.49   2.45   1.85
Note: Loadings less than 0.30 were suppressed to highlight pattern. Pattern coefficients were reported.
Although examination of communalities and eigenvalues is not part of our replication procedure, they can give us additional information about our samples and solutions. In the above table, the final extracted communalities for the items in each sample are very similar. If we were to take the absolute difference between the two communalities for each item, the values would range from .01 to .10. This indicates that approximately the same amount of variance is being extracted from each item in the two samples, a good sign for our replication analysis!
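The communality comparison above is easy to verify by hand; the following sketch simply recomputes it from the Table 6.1 values (the dictionary layout is our own, not SAS output).

```python
# Final extracted communalities from Table 6.1, by sample
comm_s1 = {"Eng1": .61, "Eng2": .65, "Eng3": .75, "Eng4": .48,
           "Math1": .78, "Math2": .75, "Math3": .77, "Math4": .47,
           "Par1": .54, "Par2": .42, "Par3": .63, "Par4": .38, "Par5": .59}
comm_s2 = {"Eng1": .69, "Eng2": .67, "Eng3": .68, "Eng4": .49,
           "Math1": .75, "Math2": .81, "Math3": .74, "Math4": .38,
           "Par1": .50, "Par2": .49, "Par3": .73, "Par4": .36, "Par5": .49}

# Absolute difference in communality for each item
diffs = {item: round(abs(comm_s1[item] - comm_s2[item]), 2) for item in comm_s1}
print(min(diffs.values()), max(diffs.values()))  # 0.01 0.1
```

The differences do indeed span .01 (Eng4) to .10 (Par3, Par5), as stated above.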
The final eigenvalues differ a little more because they are the sum of the partitioned item-level variance. Remember, the variance extracted from each item is further divided among the factors, so all of the tiny differences in extracted variance are compounded in the eigenvalues, resulting in larger differences. Furthermore, the order in which the factors are extracted is determined by their eigenvalues. Thus, in sample 1 the factor that the parenting items load on is extracted third because it has the smallest (just barely) eigenvalue, whereas in sample 2 the factor these items load on is extracted second because it has a slightly larger eigenvalue. As we mentioned above, these differences in extraction order are not important when examining structural replication, but they do tell us that the relative weight of the factors has shifted slightly: in this example, the parenting items explain more variance in sample 2.
Finally, for fun, we made SAS do all the work in the analysis for us. We output our pattern matrix to a SAS data set and then merged and compared the two results. The syntax to do this, along with line comments for explanation, is presented below. Note, the syntax to create the two subsamples is provided above and is not included here.
**Conduct EFA Analyses; 
*ODS output system used to output pattern matrix;
ods output ObliqueRotFactPat=SS1_pattern1; 
proc factor data = sdqdata_ss1  nfactors = 3  method = ml  
      rotate = OBLIMIN  fuzz=.3;
   VAR Math: Par: Eng:;
run;
ods output ObliqueRotFactPat=SS2_pattern1;
proc factor  data = sdqdata_ss2  nfactors = 3  method = ml 
      rotate = OBLIMIN  fuzz=.3;
   VAR Math: Par: Eng:;
run;
ods output close;

**Rename output variables so they have unique names and can be merged together. 
     Note the variables with an Fz prefix in the output data set are the 
     results with the Fuzz option employed (suppressing item loadings < .3);
data SS1_pattern2 (keep=Variable Fz:  rename=(FzFactor1=SS1_Fact1 
      FzFactor2=SS1_Fact2  FzFactor3=SS1_Fact3));
   set SS1_pattern1;
run;
data SS2_pattern2 (keep=Variable Fz:  rename=(FzFactor1=SS2_Fact1 
      FzFactor2=SS2_Fact2  FzFactor3=SS2_Fact3));
   set SS2_pattern1;
run;

*Sort data sets by the variable they will be merged by;
proc sort data=SS1_pattern2; by Variable; run;
proc sort data=SS2_pattern2; by Variable; run;

*Merge and calculate squared differences;
data compare_pattern;
   merge SS1_pattern2 SS2_pattern2;
   by Variable;
*create new variables containing absolute values of the 
 factor loadings for use in identifying largest loading;
   abs_SS1_Fact1=abs(SS1_Fact1);
   abs_SS1_Fact2=abs(SS1_Fact2);
   abs_SS1_Fact3=abs(SS1_Fact3);
   abs_SS2_Fact1=abs(SS2_Fact1);
   abs_SS2_Fact2=abs(SS2_Fact2);
   abs_SS2_Fact3=abs(SS2_Fact3);
   *conditional estimation of squared differences;
   *remember factor 2 in sample 1 = factor 3 in sample 2
    and factor 3 in sample 1 = factor 2 in sample 2; 
   if max(of abs_SS1_Fact:)=abs_SS1_Fact1 and 
      max(of abs_SS2_Fact:)=abs_SS2_Fact1 then
      squared_diff=(SS1_Fact1-SS2_Fact1)**2;
   else if max(of abs_SS1_Fact:)=abs_SS1_Fact2 and 
      max(of abs_SS2_Fact:)=abs_SS2_Fact3 then 
      squared_diff=(SS1_Fact2-SS2_Fact3)**2;
   else if max(of abs_SS1_Fact:)=abs_SS1_Fact3 and 
      max(of abs_SS2_Fact:)=abs_SS2_Fact2 then 
      squared_diff=(SS1_Fact3-SS2_Fact2)**2;
run;
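The logic of that DATA step, matching each item's dominant factor across samples and computing the squared loading difference only when the dominant factors are congruent, can be expressed compactly outside SAS. The sketch below is an illustration in Python; the function names are our own, and suppressed loadings are entered as 0 for simplicity.

```python
def dominant(loadings):
    """1-based index of the loading that is largest in absolute value."""
    return max(range(len(loadings)), key=lambda i: abs(loadings[i])) + 1

# Known mapping for this solution: factor 1 matches factor 1,
# factor 2 in sample 1 matches factor 3 in sample 2, and vice versa.
factor_map = {1: 1, 2: 3, 3: 2}

def compare(s1_loadings, s2_loadings):
    f1 = dominant(s1_loadings)
    f2 = dominant(s2_loadings)
    if factor_map[f1] != f2:
        return "failed"  # dominant loadings fall on non-congruent factors
    return round((s1_loadings[f1 - 1] - s2_loadings[f2 - 1]) ** 2, 4)

# Eng1 (Table 6.1): .77 on factor 2 in sample 1, .83 on factor 3 in sample 2
print(compare([0.0, 0.77, 0.0], [0.0, 0.0, 0.83]))  # 0.0036
```

Recomputing from the rounded loadings gives .0036 rather than the tabled .0038, since the table was computed from unrounded values; the "failed" branch corresponds to the items in the two-factor analysis below that load on non-congruent factors.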
Two-factor replication analysis. As mentioned above, this solution should replicate poorly, as a two-factor solution is not a strong solution for this scale. As you can see in Table 6.2, problems are immediately obvious. Unlike the results for the three-factor solution, many of the items' maximum loadings are relatively low (less than .3). Thus, we present all item loadings, with the maximum for each item and sample highlighted, to summarize these results. Again, we see a factor switch between the two samples: the majority of the items that load on factor 1 in sample 1 load on factor 2 in sample 2. Although this in itself is not a problem, we also notice that the eigenvalues for the respective factors differ dramatically (i.e., 3.42 vs. 2.43 and 2.10 vs. 3.22) and that the extracted communalities are quite low for the parenting items. Together, these indicate there could be a problem with the replication.
These issues are further evidenced in our review of the factor structure. We find that four of the thirteen items fail to replicate the basic structure; in other words, these items loaded on non-congruent factors. Among the nine remaining items with replicated structure, the squared differences in item loadings were within a reasonable range (.0004 to .0053). Overall, however, the lack of structural replication for nearly a third of the items indicates this solution does not replicate well.
Table 6.2 Two-factor SDQ replicability analysis, ML extraction, oblimin rotation
        Sample 1             Sample 2             Squared
Var:    Comm   (1)    (2)    Comm   (1)    (2)    Differences
Eng1    .62   -.03    .80*   .68    .85*  -.14    .0033
Eng2    .62   -.07    .81*   .63    .83*  -.19    .0002
Eng3    .73   -.01    .86*   .67    .85*  -.11    .0002
Eng4    .48    .09   -.72*   .50   -.73*   .13    .0001
Math1   .76    .92*  -.17    .75   -.04    .88*   .0018
Math2   .75    .89*  -.08    .79    .08    .87*   .0007
Math3   .78    .90*  -.05    .75    .00    .86*   .0012
Math4   .46   -.70*   .05    .38    .04   -.63*   .0053
Par1    .05    .16*   .11    .10    .26*   .10    failed
Par2    .01   -.07*  -.06    .05   -.18*  -.08    failed
Par3    .06    .19*   .10    .13    .27*   .16    failed
Par4    .13   -.14   -.28*   .13   -.30*  -.13    .0004
Par5    .06    .17*   .14    .10    .23*   .15    failed
Eigen:         3.42   2.10          3.22   2.43
Note: Maximum loadings by item and sample are marked with an asterisk. Pattern coefficients were reported.
Four-factor replication analysis. An overview of this analysis is presented in Table 6.3. The communalities for both samples were adequately large, ranging from .40 to .87 (note that they are excluded from the table to conserve space). The eigenvalues for the fourth factor in both samples were below 1, indicating that this factor would not meet the Kaiser criterion for inclusion. In addition, only one item (Eng4) in sample 2 was found to load on this factor. Based on these results, we would conclude that the fourth factor does not appear to capture a sufficiently comprehensive construct. Thus, we would likely drop the fourth factor and explore other solutions before conducting a replication analysis.
However, for the purpose of this example, let's continue to examine the replicability of the four-factor solution. In reviewing the structure of the two solutions, we find that one of the thirteen items failed to load on the same factor. The squared differences for the remaining items ranged from .0001 to .0062, suggesting that the item loadings are fairly consistent in magnitude. Overall, these replication results are better than those for the two-factor solution, but the underlying EFA results leave much to be desired.
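The Kaiser criterion check mentioned above is a one-line rule: retain only factors whose eigenvalues are at least 1. A minimal sketch, using the Table 6.3 eigenvalues (the helper name `kaiser_retain` is our own):

```python
def kaiser_retain(eigenvalues, cutoff=1.0):
    """Return the 1-based factor numbers meeting the Kaiser criterion."""
    return [i + 1 for i, e in enumerate(eigenvalues) if e >= cutoff]

# Final eigenvalues for the four-factor solutions in samples 1 and 2
print(kaiser_retain([3.66, 2.12, 2.14, 0.42]))  # [1, 2, 3]
print(kaiser_retain([3.62, 2.28, 1.93, 0.54]))  # [1, 2, 3]
```

In both samples the fourth factor falls well below the cutoff, consistent with our decision to prefer the three-factor solution.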
Table 6.3 Four-factor SDQ replicability analysis, ML extraction, oblimin rotation
        Sample 1 Factor Loadings    Sample 2 Factor Loadings    Squared
Var:     (1)    (2)    (3)    (4)    (1)    (2)    (3)    (4)   Differences
Eng1            .77                                .81          .0016
Eng2            .87                                .86          .0001
Eng3            .82                                .78          .0014
Eng4           -.57           .50                 -.51    .66   failed
Math1   .89                         .87                         .0005
Math2   .86                         .89                         .0013
Math3   .87                         .86                         .0003
Math4  -.68                        -.62                         .0039
Par1                   .76                 .74                  .0001
Par2                  -.61                -.69                  .0062
Par3                   .82                 .84                  .0006
Par4                  -.44          .38   -.51                  .0043
Par5                   .77                 .71                  .0035
Eigen:  3.66   2.12   2.14    .42   3.62   2.28   1.93    .54
Note: Loadings less than 0.30 were suppressed to highlight pattern. Pattern coefficients were reported.
Appropriately large samples make a difference. In Table 6.4, we replicate the three-factor analysis presented in Table 6.1, but with two random samples of N=100 each, much smaller than the N=500 samples used previously. In this analysis, you can see that all of the items loaded on congruent factors, but two items had troublingly large differences in factor loadings: Eng1 and Eng3 had loadings that differed by more than |.20| between the two samples. As you can see from the communality estimates, this corresponded to a large decrease in the estimates for these items, and squared differences of over .04. These results are actually quite good, given the sample size; often we might see far more havoc wreaked by small samples.
As previous authors have noted, EFA is a large-sample procedure, and replications with relatively small samples can lead to more volatility than one would see with larger samples. With 500 in each sample, this scale looks relatively replicable, but with only 100 in each sample there are some questions about replicability.
Table 6.4 Reduced sample three-factor SDQ replicability analysis, ML extraction, oblimin rotation
        Sample 1                    Sample 2                    Squared
Var:    Comm    (1)    (2)    (3)   Comm    (1)    (2)    (3)   Differences
Eng1    .80            .89          .57            .67          .0479
Eng2    .77            .89          .57            .77          .0139
Eng3    .84            .92          .66            .71          .0417
Eng4    .41           -.56          .40     .38   -.43          .0181
Math1   .79     .91                 .79     .90                 .0001
Math2   .81     .90                 .78     .88                 .0005
Math3   .78     .87                 .68     .79                 .0053
Math4   .34    -.47                 .41    -.62                 .0228
Par1    .38                  -.57   .46                  -.56   .0000
Par2    .55                   .76   .45                   .70   .0039
Par3    .61                  -.74   .49                  -.67   .0062
Par4    .57                   .77   .61                   .74   .0009
Par5    .35                  -.56   .43                  -.64   .0069
Eigen:          3.12   2.69   2.19          3.71   2.34   1.25
Note: Loadings less than 0.30 were suppressed to highlight pattern. Pattern coefficients were reported.