Correlation
analysis quantifies the linear association between two continuous
variables. The correlation coefficient measures this linear association
on a scale from -1 to 1 and expresses two aspects of the relationship.
The sign of the correlation coefficient tells us whether there is
a direct or inverse relationship between the two variables. The correlation
coefficient for a direct relationship will have a positive sign while
the correlation coefficient for an inverse relationship will have
a negative sign. A correlation coefficient of 0 indicates that there
is no linear relationship between the two variables. The correlation coefficient
also describes the strength of the relationship. A larger correlation
coefficient, in absolute value, indicates a stronger relationship,
while a relatively smaller correlation coefficient indicates a weaker
relationship. Examining the correlation coefficient helps the analyst
describe the direction and strength of the relationship between two
continuous variables. Correlation between two variables does not, however,
imply a causal relationship. For example, population size does not
cause smoking-related health care costs. Smoking itself is driven by factors
such as youthful experimentation, stress, or family history; the magnitude
of state health care costs is merely associated with state population.
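To make the sign and strength interpretation concrete, the following minimal Python sketch computes the Pearson correlation coefficient, the usual measure of linear association, as the sum of cross-products of deviations from the means divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and all data values are made up for illustration; this is not JMP code and does not use the book's data set.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [a - mean_x for a in x]          # deviations from the mean of x
    dev_y = [b - mean_y for b in y]          # deviations from the mean of y
    cross = sum(a * b for a, b in zip(dev_x, dev_y))
    ss_x = sum(a * a for a in dev_x)
    ss_y = sum(b * b for b in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / math.sqrt(ss_x * ss_y)

# Made-up values, for illustration only.
x = [1, 2, 3, 4, 5]
direct = [2, 4, 6, 8, 10]    # rises exactly in step with x
inverse = [10, 8, 6, 4, 2]   # falls exactly in step with x
noisy = [2, 1, 4, 3, 5]      # rises with x, but not perfectly

r_direct = pearson_r(x, direct)
r_inverse = pearson_r(x, inverse)
r_noisy = pearson_r(x, noisy)

print(f"direct:  {r_direct:.2f}")   # 1.00
print(f"inverse: {r_inverse:.2f}")  # -1.00
print(f"noisy:   {r_noisy:.2f}")    # 0.80
```

The first two results show the sign convention for direct and inverse relationships. The third shows that a weaker linear pattern yields a coefficient smaller in absolute value.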
Correlation coefficients can be calculated using the
JMP Multivariate Methods platform. To view a matrix of correlation
coefficients, select Analyze > Multivariate Methods > Multivariate
and enter the desired continuous variables in the Y, Columns field.
Figure 5.12, Correlation Matrix and Scatterplot Matrix, shows the JMP
output for the pairwise correlations between total health care expense,
cessation expense, and population.
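For readers who want to see what a matrix of pairwise correlations contains, the sketch below builds one in plain Python. It is only an illustration of the idea and not a reproduction of the JMP platform or its output. The helper names (`pearson_r`, `correlation_matrix`), the column names, and the five rows of values are all hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

def correlation_matrix(columns):
    """Return {name: {name: r}} for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names}
            for a in names}

# Hypothetical values for five imaginary states; NOT the book's data set.
columns = {
    "total_expense": [2, 4, 5, 8, 11],
    "cessation_expense": [3, 1, 4, 2, 5],
    "population": [1, 2, 3, 4, 5],
}

matrix = correlation_matrix(columns)

# Print the matrix with one row and one column per variable.
names = list(columns)
print(" " * 18 + "".join(f"{n:>18}" for n in names))
for a in names:
    print(f"{a:<18}" + "".join(f"{matrix[a][b]:>18.4f}" for b in names))
```

The matrix is symmetric with ones on the diagonal, because each variable is perfectly correlated with itself and the correlation of A with B equals that of B with A.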
Both total health care
expense and cessation expense are directly associated with population
size. Total health care expense and population size have a correlation
coefficient of 0.9761 and cessation expense and population size have
a correlation coefficient of 0.4088. Total health care expense is more strongly
correlated with population size than cessation expense is. This makes sense, as a larger population
will have more smokers and hence more smoking-related health care
expenditures. Cessation expenditures are likely influenced by factors
besides population size such as state public health priorities and
policies.
The scatterplot matrix shows the relationships
visually and allows outliers to be identified. The data points on
the scatterplot for total health care expense and population size
are more tightly clustered than those on the scatterplot for cessation expense
and population size. Cessation expense and total health care expense
have a modest correlation of 0.5274. In the associated scatterplot
matrix, California, Texas, and New York, three large-population states,
appear to be outliers.
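One way to gauge how much a few very large observations drive a correlation coefficient is to recompute it with and without them. The sketch below does this on made-up (population, expense) pairs; the numbers are hypothetical and are not the values for California, Texas, New York, or any other state in the data set. Whether such points should actually be set aside is a judgment for the analyst.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

def r_for(pairs):
    """Correlation between the first and second elements of (x, y) pairs."""
    xs, ys = zip(*pairs)
    return pearson_r(xs, ys)

# Hypothetical (population, expense) pairs; NOT the book's data.
# The last three points are far larger than the rest.
points = [(1, 3), (2, 1), (3, 4), (4, 2), (5, 5), (6, 3),
          (20, 22), (25, 24), (30, 33)]

r_all = r_for(points)                  # every point included
r_without_large = r_for(points[:-3])   # three largest points dropped

print(f"all points:          r = {r_all:.3f}")
print(f"without large three: r = {r_without_large:.3f}")
```

In this made-up example the coefficient is much higher with the three large points included, which illustrates why the scatterplot matrix is worth inspecting alongside the correlation matrix.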
Other correlations can
be similarly calculated from the remaining continuous variables given
in the data set, such as land area, state gross domestic product, and
state cigarette taxes. For example, there is a direct and modest
correlation of 0.491 between state cigarette tax and adjusted total
health care expense. The correlation between state cigarette tax
and total health care expense is quite close to zero at 0.088.