Many early adopters
of the procedure saw bootstrap analysis as a panacea for small or
biased samples. They reasoned that, with enough resampled data sets,
the bias and small sample size would be compensated for, and that the
resamples would provide a better estimate of the population parameters
than the original sample by itself. These scholars wanted to replace
the estimate produced by
the sample with the average bootstrapped statistic.
Unfortunately, bootstrapped
statistics are not immune to all bias. Osborne’s (2015) experiments
with logistic regression suggest that results from small or biased
samples tend not to be self-correcting and instead perpetuate the
bias. In other words, the averaged bootstrapped statistic from a biased
sample can be just as biased as the original sample estimate. Large
biased samples probably behave the same way. You can endlessly
resample the same small or biased sample, but there is limited information
in the sample. One cannot build something out of nothing.
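
The point can be illustrated with a small simulation. The following is a minimal sketch, not a reproduction of Osborne's (2015) experiments: the population, the biased sampling rule (only cases above 50 can be sampled), and the sample size of 30 are all assumptions chosen for illustration, and the statistic is a simple mean rather than a logistic regression coefficient.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 100,000 normally distributed scores (mean 50, SD 10).
population = [rng.gauss(50, 10) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# Assumed biased sampling scheme: only cases scoring above 50 can be sampled.
eligible = [x for x in population if x > 50]
sample = rng.sample(eligible, 30)
sample_mean = statistics.fmean(sample)

# Bootstrap: resample the biased sample with replacement, then average
# the resampled means.
n_resamples = 5000
boot_means = [
    statistics.fmean(rng.choices(sample, k=len(sample)))
    for _ in range(n_resamples)
]
avg_boot_mean = statistics.fmean(boot_means)

print(f"population mean:          {true_mean:.2f}")
print(f"biased sample mean:       {sample_mean:.2f}")
print(f"average bootstrapped mean: {avg_boot_mean:.2f}")
```

Under these assumptions, the average bootstrapped mean should land very close to the biased sample mean and well away from the population mean, because every resample is drawn from the same biased cases.
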
Some research suggests
that a degree of bias can be partially accounted for through specific
methods of CI estimation (e.g., studentized interval or bias-corrected
and accelerated interval methods; Davison & Hinkley, 1997, p.
231; Efron & Tibshirani, 1994, p. 184). However, these methods
were designed to produce corrected CIs, not a corrected
estimate of the average bootstrapped statistic. Although some of the
methods could be extended to produce such an estimate, we do not believe
this is a worthwhile endeavor. These methods might be useful in estimating
more accurate CIs for biased or small samples, but we do not believe
they are robust enough to provide a reliable estimate of the population
parameter.
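
To make the bias-corrected and accelerated (BCa) idea concrete, the sketch below follows the standard textbook recipe: a bias-correction term taken from the share of bootstrap estimates that fall below the sample estimate, and an acceleration term taken from jackknife (leave-one-out) estimates. The function name `bca_interval`, the skewed example data, and the default settings are our own illustrative assumptions, and the guard that keeps the proportion away from 0 and 1 is a convenience we added. Note that the output is an interval, not a corrected point estimate.

```python
import random
from statistics import NormalDist, fmean

def bca_interval(sample, stat, n_resamples=4000, level=0.95, seed=0):
    """Sketch of a bias-corrected and accelerated (BCa) bootstrap CI."""
    rng = random.Random(seed)
    n = len(sample)
    norm = NormalDist()
    theta_hat = stat(sample)

    # Sorted bootstrap distribution of the statistic.
    boot = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_resamples))

    # Bias correction z0: where the sample estimate sits within the
    # bootstrap distribution, expressed on the standard normal scale.
    prop_below = sum(b < theta_hat for b in boot) / n_resamples
    # Added guard (an assumption of this sketch): keep the proportion
    # strictly inside (0, 1) so the inverse normal CDF is defined.
    prop_below = min(max(prop_below, 1 / n_resamples), 1 - 1 / n_resamples)
    z0 = norm.inv_cdf(prop_below)

    # Acceleration a: estimated from leave-one-out (jackknife) estimates.
    jack = [stat(sample[:i] + sample[i + 1:]) for i in range(n)]
    jack_mean = fmean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def adjusted_quantile(p):
        # Shift the nominal percentile p using z0 and a, then read the
        # corresponding value off the sorted bootstrap distribution.
        z = norm.inv_cdf(p)
        adj = norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
        return boot[int(round(adj * (n_resamples - 1)))]

    alpha = (1 - level) / 2
    return adjusted_quantile(alpha), adjusted_quantile(1 - alpha)

# Hypothetical small, right-skewed sample.
data_rng = random.Random(1)
sample = [data_rng.expovariate(1.0) for _ in range(40)]
lower, upper = bca_interval(sample, fmean)
print(f"sample mean = {fmean(sample):.3f}, 95% BCa CI = ({lower:.3f}, {upper:.3f})")
```
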
Bootstrap analyses can
provide estimates of replicability or generalizability and help identify
inappropriately influential data points in a sample. Although resampling
might not be able to improve upon a biased estimate, it can provide
CIs through which we can evaluate just how imprecise the parameter
estimates are. These CIs can help researchers interpret their results
and determine how they might generalize. In addition, bootstrap methods
provide a distribution of parameter estimates from the resamples.
This distribution can be used to help identify inappropriately influential
data points. If the estimates from thousands of resamples are
distributed with a skew, the long tail is likely due to the influence
of a few cases. However, there can be
easier ways to detect inappropriately influential data points. Osborne
(2015) found that cleaning data prior to bootstrap analysis often
yielded much better results. Thus, if you intend to bootstrap a
sample, it is best to do some preliminary data cleaning first.
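
A brief sketch shows how a single extreme case can leave its mark on the bootstrap distribution. The data here are hypothetical: 29 well-behaved scores plus one extreme score of 200, with the mean as the statistic. The helpers `bootstrap_means` and `skewness` are our own illustrative names. Resamples that happen to include the extreme case once, twice, or more pull the estimate upward, which produces the long tail.

```python
import random
from statistics import fmean, pstdev

def bootstrap_means(sample, n_resamples=3000, seed=0):
    """Return the mean of each bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fmean(rng.choices(sample, k=len(sample))) for _ in range(n_resamples)]

def skewness(values):
    """Population skewness: the mean cubed standardized deviation."""
    m = fmean(values)
    s = pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])

# Hypothetical data: 29 ordinary scores and one extreme score.
data_rng = random.Random(7)
clean = [data_rng.gauss(100, 5) for _ in range(29)]
contaminated = clean + [200.0]

skew_contaminated = skewness(bootstrap_means(contaminated))
skew_clean = skewness(bootstrap_means(clean))

print(f"skew of bootstrap means with extreme case:    {skew_contaminated:.2f}")
print(f"skew of bootstrap means without extreme case: {skew_clean:.2f}")
```

In this setup, the distribution with the extreme case should be clearly right-skewed, whereas the distribution from the cleaned sample should be roughly symmetric.
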
Overall, bootstrap resampling
can be a valuable tool in the statistician’s toolbox, but it
is not a panacea. It cannot fix a fatally flawed sample, and it cannot
compensate for an inappropriately small sample. But given a reasonable
sample, bootstrap resampling can do some useful things. It can
provide confidence intervals for statistics such as effect sizes, which
are difficult to obtain any other way. It can provide information about
the precision of the results, and it can give some information in
a single sample that is helpful in determining whether a solution
will replicate or not. In other words, if one performs an appropriate
bootstrap analysis of a reasonable sample, and one sees relatively
narrow confidence intervals, one can say that the solution arrived
at is more precise than it would have been if one had very broad confidence
intervals. Further, if those confidence intervals are narrow and precise,
it is likely that a similar sample will produce similar results. If
the confidence intervals are wide and sloppy, it is not likely that
a similar sample would produce similar results.
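
The link between interval width and precision can be sketched with a simple percentile bootstrap. Everything in this example is an assumption made for illustration: the population (normal, mean 0.5, SD 1), the two sample sizes, the use of the mean as the statistic, and the helper name `percentile_ci`.

```python
import random
from statistics import fmean

def percentile_ci(sample, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean of a sample."""
    rng = random.Random(seed)
    boot = sorted(
        fmean(rng.choices(sample, k=len(sample))) for _ in range(n_resamples)
    )
    alpha = (1 - level) / 2
    lo = boot[int(alpha * (n_resamples - 1))]
    hi = boot[int((1 - alpha) * (n_resamples - 1))]
    return lo, hi

# Two hypothetical samples from the same population, differing only in size.
pop_rng = random.Random(3)
small = [pop_rng.gauss(0.5, 1) for _ in range(20)]
large = [pop_rng.gauss(0.5, 1) for _ in range(400)]

small_ci = percentile_ci(small)
large_ci = percentile_ci(large)
small_width = small_ci[1] - small_ci[0]
large_width = large_ci[1] - large_ci[0]

print(f"n = 20:  CI = ({small_ci[0]:.2f}, {small_ci[1]:.2f}), width = {small_width:.2f}")
print(f"n = 400: CI = ({large_ci[0]:.2f}, {large_ci[1]:.2f}), width = {large_width:.2f}")
```

The interval from the larger sample should be considerably narrower, which is the sense in which a narrow interval suggests that a similar sample would produce a similar estimate.
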