Many authors have asserted that an alpha of 0.70 or 0.80
represents “adequate” and “good” reliability,
respectively (e.g., Nunnally & Bernstein, 1994). Remember, an
alpha of 0.70 is associated with 30% error variance in a scale, and
an alpha of 0.80 with 20%. We do not believe that these
standards represent “good” measurement, particularly
when more than one variable in an analysis is measured at this level of quality.
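This correspondence follows from the classical test theory decomposition of observed score variance, a standard result we restate here as a quick check (our notation, not the chapter’s own derivation):

```latex
\sigma^2_X = \sigma^2_T + \sigma^2_E, \qquad
\rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_X}, \qquad
\text{so} \quad \frac{\sigma^2_E}{\sigma^2_X} = 1 - \rho_{XX'} .
```

Treating alpha as an estimate of the reliability coefficient, an alpha of 0.70 leaves 1 - 0.70 = 30% of the observed variance as error; this is the arithmetic behind the figures above.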
Let us look at an example
from multiple regression to demonstrate our point. When each independent
variable is added to a regression equation, the effects of less-than-perfect
reliability on the strength of the relationships become more
complex, and the results of the analysis become more questionable. A single independent
variable with less-than-perfect reliability fails to claim all of the
variance it should explain, and each subsequently entered variable that
correlates with it can claim part of what is left over. The explained variance
is thus apportioned incorrectly among the independent variables,
misestimating the true population effects. In essence, low reliability
in one variable can lead to substantial overestimation of the effect
of another, related variable. The more independent variables with low
reliability are added to the equation, the greater the likelihood
that the variance accounted for will be misapportioned. Ultimately,
some effects can end up masked (creating Type II errors) while other
effects in the same analysis are inappropriately inflated, potentially
leading to Type I errors of inference (Osborne, 2013). Thus, one thesis
of this chapter is that better measurement is preferable to poorer
measurement.
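To make the mechanism concrete, here is a minimal simulation sketch (our illustration, not drawn from Osborne, 2013; the coefficients, the 0.5 correlation between predictors, and the 0.70 reliability are all hypothetical choices):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000  # large n so sampling error is negligible

# True scores: T1 and T2 correlate 0.5; y depends on both (0.6 and 0.2).
t1 = rng.standard_normal(n)
t2 = 0.5 * t1 + np.sqrt(1 - 0.5**2) * rng.standard_normal(n)
y = 0.6 * t1 + 0.2 * t2 + rng.standard_normal(n)

def slopes(x1, x2):
    """OLS slopes for y ~ 1 + x1 + x2."""
    X = np.column_stack([np.ones(n), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1], beta[2]

# With perfect measurement, the true coefficients are recovered.
print(slopes(t1, t2))  # approximately (0.60, 0.20)

# Now measure T1 with reliability 0.70, i.e., 30% of the observed
# variance in x1 is error: var(T1) / var(x1) = 0.70.
rel = 0.70
x1 = t1 + np.sqrt((1 - rel) / rel) * rng.standard_normal(n)

# The slope for the unreliable x1 is attenuated, while the slope for
# the perfectly measured t2 is inflated above its true value of 0.20.
print(slopes(x1, t2))  # approximately (0.38, 0.31)
```

In this sketch, the unreliable predictor’s effect shrinks from 0.60 to about 0.38, and the correlated, perfectly measured predictor absorbs part of the difference, rising from 0.20 to about 0.31. An analyst taking those coefficients at face value would commit exactly the misapportionment described above.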
So what is “good”
enough? Unfortunately, we do not think there is an easy answer to
this question. Specific cutoff values indicating adequate or good
reliability, such as 0.70 or 0.80, are easy to use, but what do they
really mean? What is the practical difference between an alpha of
0.79 and 0.80? Thus, while we believe we should aim for higher alphas,
we do not have specific advice for what alphas should be obtained.
We can only say that higher is better, and there are probably diminishing
returns after one exceeds 0.90 (which still represents about 10% error
variance in the measurement). In addition, what constitutes “good
enough” depends on the purpose of the data and the method
of analysis. Using the data to choose children for an educational
program or select employees for promotion is different from evaluating
correlations between constructs for a dissertation. Furthermore, modern
measurement models (e.g., Rasch or other IRT models) and modern analysis techniques
(e.g., structural equation modeling) can help researchers build stronger
and better scales. Thus, we strongly recommend that researchers aim
for the best possible measurement that is reasonably attainable. We
further recommend that researchers be transparent in reporting and
interpreting reliability estimates in the context of their research,
and not rely on specific cutoff values to tell them whether their
measurement is “good”.
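For readers who would rather inspect their own estimates than lean on cutoffs, coefficient alpha is simple to compute directly from an item-response matrix. The following is a minimal sketch (our code, with a hypothetical function name), not a substitute for a full psychometric package:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)   # per-item sample variances
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of scale totals
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
```

A higher value from this function means a smaller estimated proportion of error variance, but, as argued above, the estimate should be reported and interpreted in context rather than simply checked against a threshold.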
In practice, some researchers
do strive for better estimates of reliability. A review of articles
published in the educational psychology literature in 1969 and 1999 found
average (reported) alphas of 0.86 and 0.83, respectively (Osborne, 2008). These estimates
are pretty good, but they reflect only the 26% of articles (in this
study of modern, high-quality journals) that reported this basic data
quality information. So while a quarter of the researchers did report
alphas that tended to be high, three-quarters did not even acknowledge
the importance of such indicators. Unfortunately, failure to report
reliability, and reliance on low alphas as evidence of adequate reliability,
are not difficult to find among peer-reviewed journal
articles across disciplines. Poor measurement can have profound (and
often unpredictable) effects on outcomes, and thus more researchers
need to pay attention to this issue.