THE BASIC IDEA OF CORRELATION

The basic idea of correlation is how sensitive the value of one entity or metric is to the change in another value of an entity or metric. Let's take a very simple, tangible example as a starting point. There is usually an observable correlation between a person's income level and their education. Census data in the United States typically shows that more educated people earn more money. Look at the data set plotted in Figure 3.1.

The first thing to realize is that a scatter plot best shows correlation visually, between two sets of variables. Here we can see that there is a positive correlation, that is, one where as the value of one variable increases there is an increase in the value of another variable. The correlation is actually quite strong in this case (mainly because I made up the data set). When we work through Model Builder 3.1 we will see that there is a .87 correlation between education level and income in this example. Another way of thinking about what this means is to use extreme cases. If the correlation is equal to 1, the two variables are completely linked, and for every unit of increase in one of the variables there will be a proportional increase in the other variable. Conversely, if the correlation is equal to 0, it means that for every unit of increase in one of the variables there should be no expectation of proportional movement in the other variable.

FIGURE 3.1 Income levels are shown on the y-axis and show a distinct positive correlation with education levels on the x-axis.

image

The formula for correlation (usually represented by r) is shown in equation 3.1:

image

In this equation, σx represents the standard deviation of the X values and μx represents the mean of the X values (careful: μ has different meanings elsewhere in this text and in finance: it is often used to represent a “drift” return for assets).

It is crucial to keep in mind that correlation measures only the linear relationship between two variables. Quadratic and other “power” relationships between variables may give results indicating no correlation between data sets. When beginning initial research on the relationship between different variables, it is a good idea to graph the scatter-plots and inspect them to see if the graph “looks like” a quadratic relationship. If so, performing regressions of one variable against the root of another may be appropriate.

MODEL BUILDER 3.1: Basic Correlation Calculations

  1. This Model Builder examines the fundamental correlation calculations using mathematics on the sheet and prebuilt Excel functions. It will be useful since the correlation calculations here are the underpinnings for correlation calculated using matrix mathematics and eventually in the corporate default simulation. The first step is to create a new workbook and save it as “MB_3_User.xls”.
  2. Next we need to import data from the complete Model Builder file located on the website. Open MB_3_Complete.xls and copy the data from B3:D33. Paste the values of this data over the same range on your newly created sheet.
  3. To understand the underlying meaning of the correlation coefficient calculation we will first calculate the coefficient using basic math functions. The first part of this calculation is measuring how each of the samples, for each variable, deviate from the mean of each variable's sample data set. This can be done by entering the following equation in E4:

    =(C4-AVERAGE($C$4:$C$33))*(D4-AVERAGE($D$4:$D$33))

    Copy and paste this over the range E4:E33. Also, we should label E3 “Numerator” since the prior calculation is part of a larger calculation.

  4. Next we should calculate the deviations and square them for each variable. Enter the following equation in F4 and copy and paste it over the range F4:F33:

    =(C4-AVERAGE($C$4:$C$33))^2

  5. Similarly we should do the same for the other variable. Enter the following equation in G4 and copy and paste it over the range G4:G33:

    =(D4-AVERAGE($D$4:$D$33))^2

    Also, label both columns with “Denominator 1” in F3 and “Denominator 2” in G3. So far the sheet should look like Figure 3.2.

    FIGURE 3.2 The sheet should develop calculations for each sample.

    image

    FIGURE 3.3 The correlation coefficient can be calculated in multiple ways.

    image

  6. Sum each of the Numerator, Denominator 1, and Denominator 2 columns in E35, F35, and G35, respectively, using the SUM function.
  7. The denominators need to be multiplied and the square root taken of that product. To do this, enter the following equation in E36:

    =SQRT(F35*G35)

  8. Finally, the correlation coefficient is calculated by dividing the numerator by the equation calculated in step 7. Enter the following equation in E38 to get this result:

    =E35/E36

  9. Alternatively we can do this calculation using the CORREL function in D38:

    =CORREL(C4:C33,D4:D33)

    The final calculations should look the same as Figure 3.3.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset