Application: Nonrandom Missingness and Imputation

Let’s examine the effect of nonrandom missingness on our results and the potential improvement offered by the EM algorithm. Using the same SDQ sample of 300 students that we used earlier, we simulated nonrandom missingness by recoding values of “6” to system missing for the first English item (Eng1: I learn things quickly in English classes). This created a biased sample that eliminates the students who answered most optimistically about their learning in English (76 out of 300 cases).
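A data step along the following lines will produce this kind of missingness. The output data set name (nonrandom_miss) matches the PROC MI call below; the name of the complete source data set (sdq here) is an assumption.
*Simulate nonrandom missingness by blanking the most optimistic Eng1 responses;
*The source data set name (sdq) is assumed;
data nonrandom_miss;
   set sdq;
   if Eng1 = 6 then Eng1 = .;
run;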
We then used the MI procedure to estimate the EM covariance matrix. Specifying NIMPUTE=0 tells PROC MI not to create any imputed data sets, and the OUTEM= option on the EM statement requests the EM covariance matrix. We use all of the variables to produce the estimates by specifying _ALL_ on the VAR statement. Although this data set contains only the variables that we plan to analyze, it is often useful to include all available variables, even those not in the analysis, during this step; they provide additional information for estimating the missing values. PROC MI then outputs a single covariance matrix that can be read into the FACTOR procedure and used for subsequent analysis. This syntax is presented below.
*Impute missing via EM algorithm;
proc mi data=nonrandom_miss nimpute=0;
   em outem=em_covar_matrix;
   var _ALL_;
run;
*Run EFA on imputed covariance matrix;
proc factor data=em_covar_matrix nobs=300 nfactors=3 method=uls
      rotate=oblimin;
   var Math: Par: Eng:;
run;
The MI procedure provides some useful tools for understanding patterns of missingness. Figure 8.1 displays the missing data summary output by the procedure. It shows the missingness patterns that were identified and the mean value of each variable within each pattern. We can see that our data include only two missingness patterns: 1) cases with complete data and 2) cases missing data for Eng1. We can also compare the mean values of the variables across these two groups. In general the values are close, but a few items show somewhat larger differences in average response between the groups (e.g., Eng2, Eng3).
Figure 8.1 PROC MI missing data summary
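If you want to see only this missing data pattern summary without the rest of the PROC MI output, an ODS SELECT statement can limit the display; the sketch below assumes the pattern table carries the ODS table name MissPattern.
*Display only the missing data pattern summary;
ods select MissPattern;
proc mi data=nonrandom_miss nimpute=0;
   var _ALL_;
run;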
The eigenvalues for the original sample, the nonrandom missing sample, and the imputed sample are presented in Table 8.3. The nonrandom missing sample is not dramatically different from the original sample, but the first factor is slightly attenuated and a little less variance is extracted. By comparison, the results from the imputed sample more closely mirror those of the original sample.
Table 8.3 Effects of nonrandom missing data (N=76) on eigenvalues
Factor                                Original             Nonrandom missing        Imputed
                                Initial    Final        Initial    Final        Initial    Final
1                                 3.683    3.720          3.434    3.463          3.672    3.706
2                                 2.471    2.531          2.485    2.532          2.455    2.513
3                                 1.532    1.589          1.558    1.610          1.520    1.577
4                                  .435                    .507                    .465
5                                  .094                    .091                    .094
6                                  .010                    .041                    .015
% Variance for first 3 factors   59.12%   60.31%         57.52%   58.50%         58.82%   59.97%
Note: ULS extraction was used.
As you can see in Table 8.4, the nonrandom missing sample continues to show slight deviations from the original sample. The math factor is extracted first in these data rather than second, indicating that it accounts for more of the common variance in this sample than the other factors do. In addition, the loadings on the English factor are attenuated, and the other loadings have shifted slightly. The imputed loadings are not perfect, but they mirror the original sample more closely than the nonrandom missing sample does.
Table 8.4 Effects of nonrandom missing data (N=76) on factor loadings
Var           Original            Nonrandom missing            Imputed
              1      2      3         1      2      3         1      2      3
Par1       .678  -.027   .038     -.028   .686  -.014      .674  -.026   .039
Par2      -.723  -.028   .046     -.031  -.720   .051     -.730  -.026   .058
Par3       .957  -.020  -.080     -.023   .935  -.068      .954  -.020   .046
Par4      -.569  -.046  -.119     -.019  -.567  -.166     -.575  -.043   .061
Par5       .779  -.023  -.004      .000   .746   .015      .773  -.023  -.077
Math1      .005   .908  -.088      .929  -.022  -.076      .010   .908  -.107
Math2      .015   .850   .058      .892  -.008   .054      .007   .853   .011
Math3     -.012   .876  -.005      .860  -.032  -.013     -.010   .874  -.103
Math4      .011  -.665  -.020     -.643  -.035  -.023      .010  -.664   .081
Eng1       .014   .034   .748      .053   .019   .684      .008   .049  -.008
Eng2      -.069  -.044   .775     -.067  -.106   .745     -.071  -.048  -.019
Eng3      -.003  -.015   .811     -.018   .009   .788     -.004  -.020   .717
Eng4      -.087  -.029  -.626     -.038  -.110  -.578     -.101  -.023   .786
Note: ULS extraction with direct oblimin rotation was used. Primary factor each item loads on is highlighted.
The missing data did not have a large impact on the results in the current analysis, but it has the potential to cause much more serious problems. If the factor structure is less clear, if there are fewer observations, or if the missing cases come from a part of the sample that is not otherwise represented, the results could be quite different. Imputing missing data has the potential to ameliorate the harmful effects of nonrandom missingness and, as a bonus, keep all those hard-earned data points in your analysis.