The intuition behind linear regression

When applying linear regression to a set of data, we are making the following assumption: the relationship between one or more explanatory variables and the response variable is known and linear. There are two points to consider:

Known: We are assuming the existence of some kind of law that determines the level of y given the level of x. We are usually also implying that the level of x directly causes the level of y. We know from our discussion of linear correlation that this is not necessarily true, and that further evidence is needed before assuming causality.
Linear: The relationship between the explanatory variables and the response is assumed to be representable as a linear combination of the explanatory variables plus an error term.
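Written as a formula, the linear assumption says that the response is a weighted sum of the explanatory variables plus an error term. The notation below (p explanatory variables x_1 to x_p, coefficients β, error ε) is the conventional one and is not taken from this text:

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
```

Here β_0 is the intercept, each β_j is the coefficient of the explanatory variable x_j, and ε collects whatever the explanatory variables do not account for. With a single explanatory variable this reduces to a straight line, y = β_0 + β_1 x + ε.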

A real-world example of this kind of relationship comes straight from physics: the equation of uniformly accelerated straight-line motion. For a body starting from rest, it states a linear and fixed relationship between time and speed, given a constant acceleration. We have, therefore:

$$v = a \cdot t$$

where v is the speed (m/s), a is the acceleration (m/s²), and t is the time elapsed (s).

If we fix the acceleration at, say, 4 m/s² and evaluate the speed at three different moments (0, 2, and 3 seconds), the linear nature of this relationship becomes apparent:

At moment 0, the speed is 0 × 4 = 0 m/s
At moment 2, the speed is 2 × 4 = 8 m/s
At moment 3, the speed is 3 × 4 = 12 m/s
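The same three computations can be reproduced with a minimal Python sketch, assuming motion from rest so that v = a · t; the names ACCELERATION, MOMENTS, and speed are illustrative and not part of the original text.

```python
# Uniformly accelerated motion starting from rest: v = a * t
ACCELERATION = 4.0   # m/s², the value used in the text
MOMENTS = [0, 2, 3]  # seconds


def speed(t: float, a: float = ACCELERATION) -> float:
    """Speed (m/s) after t seconds of constant acceleration a, starting at rest."""
    return a * t


speeds = [speed(t) for t in MOMENTS]
for t, v in zip(MOMENTS, speeds):
    print(f"t = {t} s -> v = {v:g} m/s")
```

Each step of 1 second adds exactly ACCELERATION m/s to the speed, which is what makes the points line up on a straight line.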

Let's jot down a plot:

If we look more closely at the equation, we see that its most relevant component is the acceleration coefficient: this is the number that defines how steep the line is. You can check this by changing 4 to 7 and repeating the same computation and plot: doesn't the line get steeper?
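The experiment suggested above can be sketched in Python, again assuming motion from rest (v = a · t). It recomputes the speeds for a = 4 and a = 7 m/s² and measures each line's steepness as the change in speed divided by the change in time; the helper names speeds_for and rise_over_run are hypothetical.

```python
MOMENTS = [0, 2, 3]  # seconds, as in the text


def speeds_for(acceleration: float) -> list:
    """Speeds (m/s) at each moment for motion from rest with the given acceleration."""
    return [acceleration * t for t in MOMENTS]


def rise_over_run(ts, vs) -> float:
    """Steepness of the line through the first and last points: change in v over change in t."""
    return (vs[-1] - vs[0]) / (ts[-1] - ts[0])


slopes = {}
for a in (4.0, 7.0):
    vs = speeds_for(a)
    slopes[a] = rise_over_run(MOMENTS, vs)
    print(f"a = {a:g} m/s²: speeds = {vs}, steepness = {slopes[a]:g}")
```

Because the points are collinear, any pair of them gives the same ratio, and that ratio equals the acceleration itself.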

More formally, that component is called the slope, and estimating it is the core task of linear regression models.
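To make "estimating the slope" concrete, here is a minimal sketch of ordinary least squares for a single explanatory variable, using the closed-form estimates slope = Sxy / Sxx and intercept = mean(y) minus slope × mean(x). The helper name fit_line is hypothetical, and the noisy measurements are synthetic, generated from the same v = 4t law with an assumed noise level purely for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x (one explanatory variable)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Exact points from the example: the fit recovers slope 4 and intercept 0
exact_slope, exact_intercept = fit_line([0, 2, 3], [0, 8, 12])

# Hypothetical noisy measurements of v = 4t (Gaussian noise, standard deviation 1 m/s)
random.seed(0)
times = [k / 2 for k in range(21)]  # 0.0 to 10.0 s in 0.5 s steps
measured = [4 * t + random.gauss(0, 1.0) for t in times]
noisy_slope, noisy_intercept = fit_line(times, measured)

print(f"exact fit: slope = {exact_slope:.3f}, intercept = {exact_intercept:.3f}")
print(f"noisy fit: slope = {noisy_slope:.3f}, intercept = {noisy_intercept:.3f}")
```

With noise-free points the estimate equals the true acceleration; with noisy measurements it lands close to 4 rather than exactly on it, which is the situation linear regression is designed for.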
