What Is Cronbach’s Alpha (And What Is It Not)?

Cronbach’s alpha (Cronbach, 1951) is one of the most widely reported indicators of scale reliability in the social sciences. It has some conveniences over other measures of reliability, and it has some drawbacks as well. There are also many misconceptions about the appropriate use of alpha. In this section we will review the strengths, weaknesses, uses, and misuses of alpha. However, let us start by reviewing the original goal for alpha.
Prior to Cronbach’s seminal work in this area, the reliability of a scale in a particular sample[1] was evaluated through methods such as test-retest correlations. This type of reliability is still discussed today in psychometrics textbooks, but it has serious drawbacks. These include the difficulty of convening the same group of individuals to retake an instrument, memory effects, and attenuation due to real change between administrations. Another serious drawback is particular to constructs (e.g., mood, content knowledge) that are expected to change over time. Thus, as Cronbach himself put it, test-retest reliability is generally best considered an index of stability rather than of reliability per se.
The split-half reliability estimate was also developed early in the 20th century. To perform this evaluation, the items are divided into two groups (most commonly, even- and odd-numbered items) and each half is scored separately. The two scores are then correlated as a proxy for an immediate test-retest correlation. This, too, has drawbacks: the effective number of items is halved, there is some doubt as to whether the two groups of items are parallel, and different splits of the items can yield different coefficients. The Spearman-Brown correction was developed to correct for the halving of test length and to yield a coefficient intended to approximate the test-retest coefficient. As Cronbach (1951) pointed out, this coefficient is best characterized as an indicator of equivalence between two forms, much as today we also talk about parallel forms.
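As a concrete sketch of the mechanics, the following Python function (ours, for illustration; it assumes a NumPy array with respondents in rows and items in columns, not data from any particular package) computes an odd-even split-half correlation and applies the Spearman-Brown correction:

```python
import numpy as np

def split_half_reliability(items: np.ndarray) -> float:
    """Odd-even split-half reliability with the Spearman-Brown correction.

    items: 2-D array, rows = respondents, columns = items.
    """
    odd = items[:, 0::2].sum(axis=1)   # total score on the 1st, 3rd, 5th, ... items
    even = items[:, 1::2].sum(axis=1)  # total score on the 2nd, 4th, 6th, ... items
    r = np.corrcoef(odd, even)[0, 1]   # correlation between the two half-tests
    return 2 * r / (1 + r)             # Spearman-Brown step-up to full test length
```

Running this with a different split (say, a random partition of the items) will generally return a different value, which is exactly the instability that motivated alpha.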
The Kuder-Richardson Formula 20 (KR-20) and Cronbach’s alpha were developed to address some of the concerns over other forms of reliability, particularly split-half reliability. KR-20 was developed first, as a method for computing reliability for items scored dichotomously, as either “0” or “1”, as they often are on academic tests or personality inventories (such as the Geriatric Depression Scale that we use as an example earlier in the book). Afterward, the mechanics behind KR-20 were further refined to create alpha, a measure of reliability that is more general than KR-20 and applicable to items with dichotomous or continuous scales. Both measures yield the same estimate from dichotomous data, but only alpha can estimate reliability for a non-dichotomous scale. Thus, alpha has emerged as the most general and preferred indicator of reliability in modern statistical methodology. Others have further extended the use of alpha through the development of a test of whether alpha is the same across two samples (Feldt, 1980) and methods for estimating confidence intervals for alpha (see Barnette, 2005).[2]
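The relationship between the two coefficients is easy to verify numerically. Here is a minimal sketch in Python (our own helper functions, not from any statistics package; population variances are used in both so that the two formulas line up exactly):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """alpha = (k/(k-1)) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0)         # variance of each item (population form)
    total_var = items.sum(axis=1).var()   # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def kr20(items: np.ndarray) -> float:
    """KR-20: same structure, but a 0/1 item's variance is p * (1 - p)."""
    k = items.shape[1]
    p = items.mean(axis=0)                # proportion of respondents scoring "1"
    total_var = items.sum(axis=1).var()
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)

# On dichotomous data the two coincide, as the text notes:
rng = np.random.default_rng(1)
X = (rng.random((500, 10)) < 0.6).astype(float)  # fake 0/1 item responses
assert np.isclose(cronbach_alpha(X), kr20(X))
```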
The correct interpretation of alpha. Cronbach (1951) himself wrote and provided proofs for several assertions about alpha:
  • Alpha is n/(n − 1) times the ratio of inter-item covariance to total variance (written out in symbols after this list); in other words, it is a direct assessment of the proportion of variance in the measure that is not error (unexplained) variance.
  • The average of all possible split-half coefficients for a given test.[3]
  • The coefficient of equivalence between two tests composed of items randomly sampled (without replacement) from a universe of items with the same mean covariance as the test or scale in question.
  • A lower-bound estimate of the coefficient of precision (accuracy of the test with these particular items) and coefficient of equivalency (simultaneous administration of two tests with matching items).
  • The proportion (lower-bound) of the test variance that is due to all common factors among the items.
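The first of these assertions can be written out explicitly. In our notation (not Cronbach’s original symbols), with n items:

\[
\alpha = \frac{n}{n-1}\left(1 - \frac{\sum_{i=1}^{n}\sigma_i^2}{\sigma_T^2}\right) = \frac{n}{n-1}\cdot\frac{\sum_{i \neq j}\sigma_{ij}}{\sigma_T^2},
\]

where \( \sigma_i^2 \) is the variance of item i, \( \sigma_{ij} \) is the covariance between items i and j, and \( \sigma_T^2 \) is the variance of the total score. The two forms are algebraically identical because \( \sigma_T^2 = \sum_i \sigma_i^2 + \sum_{i \neq j}\sigma_{ij} \).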
As Nunnally & Bernstein (1994, p. 235) distill from all this, alpha is an expected correlation between one test and an alternative form of the test containing the same number of items. The square root of alpha is also, as they point out, the correlation between the score on a scale and errorless “true scores.” Let us unpack this for a moment.
This means that if one has an alpha of 0.80 for a scale, it should be interpreted as the expected correlation between that scale and another scale sampled from the same domain of items, and with the same covariance and number of items. The square root of 0.80 is 0.89, which represents an estimate of the correlation between that score and the “true scores” for that construct. As you probably know, the square of a correlation is an estimate of shared variance, so squaring this number leads us back to the proportion of “true score” in the measurement: .80. Finally, we can subtract our “true score” variance from our total variance to get an estimate of our proportion of error variance in the measurement, in this case 20%. Thus, a scale with an alpha of .80 includes approximately 20% error.
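The arithmetic in the preceding paragraph is simple enough to verify directly; a throwaway check in Python:

```python
import math

alpha = 0.80
r_true = math.sqrt(alpha)  # correlation with errorless true scores: about 0.89
shared = r_true ** 2       # squaring returns the shared ("true score") variance: 0.80
error = 1 - shared         # remaining proportion of error variance: 0.20
```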
What alpha is not. Alpha is not a measure of unidimensionality (an indicator that a scale is measuring a single construct rather than multiple related constructs) as is often thought (Cortina, 1993; Schmitt, 1996). Unidimensionality is an important assumption of alpha, in that scales that are multidimensional will cause alpha to be under-estimated if not assessed separately for each dimension, but high values for alpha are not necessarily indicators of unidimensionality (e.g., Cortina, 1993; Schmitt, 1996).
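A quick simulation makes the point. In the sketch below (ours, with arbitrary loadings and sample size), a ten-item “scale” is built from two completely uncorrelated factors, yet alpha still comes out around .80:

```python
import numpy as np

def cronbach_alpha(items):
    k = items.shape[1]
    return (k / (k - 1)) * (1 - items.var(axis=0).sum() / items.sum(axis=1).var())

rng = np.random.default_rng(42)
n, loading = 5000, 0.8
f1 = rng.standard_normal(n)  # factor 1
f2 = rng.standard_normal(n)  # factor 2, independent of factor 1

def make_items(factor, k):
    """k items loading on one factor, with noise filling out unit item variance."""
    noise = rng.standard_normal((n, k)) * np.sqrt(1 - loading ** 2)
    return loading * factor[:, None] + noise

items = np.hstack([make_items(f1, 5), make_items(f2, 5)])  # two-dimensional "scale"
print(cronbach_alpha(items))  # roughly 0.80 despite two orthogonal dimensions
```

The within-factor covariances alone are enough to push alpha up; nothing in the coefficient “knows” that half of the item pairs are essentially uncorrelated.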
Also, as we mentioned before, alpha is not a characteristic of the instrument, but rather it is a characteristic of the sample in which the instrument was used. A biased, unrepresentative, or small sample could produce a very different estimate than a large, representative sample. Furthermore, the estimates from one large, supposedly representative sample can differ from another, and the results in one population can most certainly differ from another. This is why we place such an emphasis on replication. Replication is necessary to support the reliability of an instrument. In addition, the reliability of an established instrument must be re-established when using the instrument with a new population.
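To make this sample-dependence concrete, one can bootstrap: resample respondents with replacement and recompute alpha each time. The spread of the resampled estimates shows how much the coefficient can move from sample to sample; this generic percentile interval is one rough alternative to the dedicated interval methods reviewed by Barnette (2005). A sketch, again with our own helper functions:

```python
import numpy as np

def cronbach_alpha(items):
    k = items.shape[1]
    return (k / (k - 1)) * (1 - items.var(axis=0).sum() / items.sum(axis=1).var())

def bootstrap_alpha_ci(items, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for alpha, resampling respondents."""
    rng = np.random.default_rng(seed)
    n = items.shape[0]
    alphas = [cronbach_alpha(items[rng.integers(0, n, size=n)])
              for _ in range(n_boot)]
    tail = (1 - level) / 2 * 100
    return np.percentile(alphas, [tail, 100 - tail])
```

A wide interval is itself a warning that a single published alpha may not travel well to a new sample, let alone a new population.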