We are interested in predicting the profits of a business for the year 2018 given its profits for the previous years:
Year | Profit in USD
2011 | 40k
2012 | 43k
2013 | 45k
2014 | 50k
2015 | 54k
2016 | 57k
2017 | 59k
2018 | ?
Analysis:
In this example, the profit increases every year, so we can model it as a growing function of time, measured in years. The differences in profit between consecutive years are: 3k, 2k, 5k, 4k, 3k, and 2k USD. These differences show no clear dependence on time, and the variation between them is relatively low. Therefore, we may try to predict the profit for the coming years by performing a linear regression. We express the profit in terms of the year in a linear equation, also called a trend line:
profit=a*year+b
We can find the constants a and b with linear regression.
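To make the reasoning above concrete, the following minimal sketch computes the year-over-year differences and the least-squares estimates of a and b directly from the standard closed-form formulas. The variable names (yearly_change, a, b) are illustrative choices and are not part of the original script.

```r
# Data from the table above; profit is in thousands of USD.
year   <- c(2011, 2012, 2013, 2014, 2015, 2016, 2017)
profit <- c(40, 43, 45, 50, 54, 57, 59)

# Differences between consecutive years.
yearly_change <- diff(profit)

# Closed-form least-squares estimates for profit = a*year + b.
a <- sum((year - mean(year)) * (profit - mean(profit))) /
  sum((year - mean(year))^2)
b <- mean(profit) - a * mean(year)

print(yearly_change)
print(c(a = a, b = b))  # approximately a = 3.357, b = -6711.571
```

These values should agree with the coefficients that lm reports for the same data.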
Input:
We store the data from the table above in an R data frame with the columns year and profit (profit in thousands of USD) and fit a linear model to it with lm.
# source_code/7/profit_year.r
business_profits = data.frame(
  year = c(2011,2012,2013,2014,2015,2016,2017),
  profit = c(40,43,45,50,54,57,59)
)
model = lm(profit ~ year, data = business_profits)
print(model)
Output:
$ Rscript profit_year.r
Call:
lm(formula = profit ~ year, data = business_profits)
Coefficients:
(Intercept) year
-6711.571 3.357
Visualization:
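A plot of the data together with the fitted trend line could be produced roughly as follows. This is only a sketch using base R graphics; the output file name profit_trend.pdf, the axis labels, and the line color are assumptions and do not come from the original script.

```r
business_profits <- data.frame(
  year = c(2011, 2012, 2013, 2014, 2015, 2016, 2017),
  profit = c(40, 43, 45, 50, 54, 57, 59)  # thousands of USD
)
model <- lm(profit ~ year, data = business_profits)

# Write the plot to a PDF file (hypothetical file name).
pdf("profit_trend.pdf")
plot(business_profits$year, business_profits$profit,
     xlab = "Year", ylab = "Profit (thousands of USD)",
     main = "Business profit with linear trend line")
abline(model, col = "red")  # draw the fitted line profit = a*year + b
dev.off()
```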
Conclusion:
Therefore, the trend line equation for the profit of the company is: profit=3.357*year-6711.571.
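Substituting 2018 into this equation gives profit=3.357*2018-6711.571=62.855, that is, a predicted profit of 62.855k USD or 62,855 USD for the year 2018. Note that this uses the coefficients as rounded in the output above; with the unrounded coefficients the prediction is about 63.143k USD, because the small rounding error in the slope is multiplied by the large year value.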
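Rather than substituting into the rounded equation by hand, the prediction can also be taken from the fitted model itself, which keeps the full precision of the coefficients. The sketch below uses R's predict function; the variable name prediction_2018 is an illustrative choice.

```r
business_profits <- data.frame(
  year = c(2011, 2012, 2013, 2014, 2015, 2016, 2017),
  profit = c(40, 43, 45, 50, 54, 57, 59)  # thousands of USD
)
model <- lm(profit ~ year, data = business_profits)

# Predict the profit for 2018 from the unrounded coefficients.
prediction_2018 <- predict(model, newdata = data.frame(year = 2018))
print(unname(prediction_2018))  # approximately 63.14 (thousand USD)
```

The same call accepts several years at once, for example data.frame(year = 2018:2020), if a longer forecast is wanted.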
This example was simple: we were able to make a prediction just by fitting a linear trend line to the data. In the next example, we will look at data subject to both trends and seasonality.