The use and application
of bootstrap resampling gained the spotlight through a larger debate
about null hypothesis significance testing (NHST). At the end of the
20th century, the role of NHST in evaluating research was under dispute.
Some thought the traditional criterion for determining “statistical
significance”, a p-value
< .05, was sufficient, but others critiqued the practice and recommended
additional or alternative measures (for excellent overviews of the
issues, see Fidler & Cumming, 2008; Killeen, 2008; Schmidt, 1996).
Debates on the topic ensued at annual meetings of professional organizations
(e.g., the American Educational Research Association; Thompson, 2002)
and in journals. (See also Cohen, 1994; Hunter, 1997.) Finally, in
1996 the American Psychological Association (APA) convened the Task
Force on Statistical Inference to discuss the issue. The Task Force
deliberated on the topic for two years and finally concluded with
a report documenting best practices for research. Although they did
not recommend a ban on NHST, they discouraged the historical over-reliance
on NHST and recommended reporting effect sizes and confidence intervals (CIs) as context
for statistical significance (Wilkinson, 1999).
The Task Force’s
recommendation, along with timely advances in computing, launched
bootstrap resampling out of relative obscurity. Previously, formulas
to estimate CIs were scarce: they were limited by restrictive underlying
assumptions and required advanced statistical knowledge to implement. Bootstrap
resampling did not suffer from such limitations and instead offered
a single procedure to estimate any CI. A decade before the Task Force’s
recommendation, bootstrap resampling was not feasible for everyday
practice because of the computationally intensive nature of the process.
It would have required days of calculations and repeated trips to a mainframe
computer (if you were lucky enough to have access to one!). However,
as computers became faster, smaller, cheaper, and smarter, these methods
became increasingly accessible and easier to implement.
Unfortunately, it is
still not routine for researchers to report confidence intervals for
EFA results; indeed, replication is rarely considered
at all. Nevertheless, we hope the appeal of these techniques is apparent.
If we perform an EFA and then calculate 95% confidence intervals for
the relevant statistics, it helps a reader understand how precise
our estimates might be. Very broad CIs might signal to the reader
that the EFA is not very precise and, therefore, not terribly informative.
Narrow CIs, on the other hand, might signal to the reader that the
analysis is worth considering seriously. Narrow CIs do not guarantee
that the EFA reflects the population parameters exactly, but a precise
analysis is more likely to be useful than one with low precision (broad CIs).
This type of analysis, while not routine, is simple through bootstrap
resampling methodologies (DiCiccio & Efron, 1996; Efron &
Tibshirani, 1994).
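To make the procedure concrete, the sketch below (in Python with numpy; the function name and data are illustrative assumptions, not from any particular package) shows a percentile bootstrap CI: resample the rows of the data with replacement, recompute the statistic of interest on each resample, and take the empirical quantiles of the resulting distribution. A Pearson correlation stands in here for an EFA statistic such as a factor loading; the same resample-and-recompute logic applies.

```python
import numpy as np

def bootstrap_ci(data, statistic, n_boot=2000, alpha=0.05, rng=None):
    """Percentile bootstrap CI for any statistic computed on rows of `data`.

    Draws `n_boot` resamples of the rows with replacement, applies
    `statistic` to each resample, and returns the empirical
    (alpha/2, 1 - alpha/2) quantiles of the bootstrap distribution.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(data)
    boot_stats = np.array([
        statistic(data[rng.integers(0, n, size=n)])  # resample rows
        for _ in range(n_boot)
    ])
    return np.quantile(boot_stats, [alpha / 2, 1 - alpha / 2])

# Illustrative data: two moderately correlated variables.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
x[:, 1] = 0.6 * x[:, 0] + 0.8 * rng.normal(size=200)

# 95% CI for the correlation (stand-in for an EFA loading).
corr = lambda d: np.corrcoef(d[:, 0], d[:, 1])[0, 1]
lo, hi = bootstrap_ci(x, corr, rng=rng)
```

The width of the interval `[lo, hi]` is exactly the precision signal discussed above: a wide interval warns the reader that the estimate is unstable across resamples, while a narrow one suggests the estimate would likely replicate.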