Univariate time series forecasting

In this task, the objective is to produce a univariate forecast of the surface temperature, choosing between a Holt linear trend model and an ARIMA model. As in earlier learning tasks, we will train the models and measure their predictive accuracy on an out-of-time test set. The following code creates the temperature subset and then the train and test sets, starting after WWII:

    > temp <- climate[, 2]                              # temperature column
    > train <- window(temp, start = 1946, end = 2003)   # training years
    > test <- window(temp, start = 2004)                # out-of-time test set
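To make the out-of-time split concrete, the sketch below applies the same window() calls to a hypothetical annual series. It assumes the climate data are annual and end in 2013, so that the held-out period contains 10 years, matching the h = 10 used for the forecasts; `temp_demo` and its values are made up for illustration.

```r
# Hypothetical stand-in for climate[, 2]: an annual series assumed to span 1919-2013
set.seed(1)
temp_demo <- ts(cumsum(rnorm(95, mean = 0.01, sd = 0.1)),
                start = 1919, frequency = 1)

# Out-of-time split: fit on 1946-2003, hold out 2004 onward
train_demo <- window(temp_demo, start = 1946, end = 2003)
test_demo  <- window(temp_demo, start = 2004)

length(train_demo)  # 58 training years
length(test_demo)   # 10 held-out years
```

Slicing by calendar time rather than by random sampling keeps every test observation later than every training observation, which is what makes the evaluation out-of-time.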

To build our smoothing model, we will use the holt() function found in the forecast package. We will build two models, one with and one without a damped trend. In this function, we need to specify the time series, the number of forecast periods as h = ..., the method for selecting the initial state values (either "optimal" or "simple"), and whether we want a damped trend. With "optimal", the algorithm estimates the initial values along with the smoothing parameters, while "simple" calculates the starting values from the first few observations. The forecast package also offers the ets() function, which selects the model form and estimates all of its parameters automatically. In our case, however, let's stick with holt() so that we can compare methods. Let's try the Holt model without a damped trend, as follows:

    > library(forecast)   # provides holt(), auto.arima(), and forecast()
    > fit.holt <- holt(train, h = 10, initial = "optimal")
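For intuition about what holt() estimates, here is a minimal sketch of Holt's linear trend recursions in base R, including the damping parameter phi that damped = TRUE adds (phi = 1 gives the undamped method). It is not the forecast package's implementation: alpha, beta, and phi are fixed by hand rather than optimized, beta here is the component-form trend smoothing parameter (its scale may differ from the beta reported by holt()), and the starting values (first observation for the level, first difference for the trend) are an assumed simple initialization. The function name holt_sketch is hypothetical.

```r
# Holt's linear trend method in component form, with optional damping phi.
# alpha = level smoothing, beta = trend smoothing, phi = damping (1 = none).
# Parameters are fixed by hand here; holt() would estimate them.
holt_sketch <- function(y, h, alpha, beta, phi = 1) {
  y <- as.numeric(y)
  level <- y[1]          # assumed simple start: first observation
  trend <- y[2] - y[1]   # and first difference
  for (t in 2:length(y)) {
    prev_level <- level
    level <- alpha * y[t] + (1 - alpha) * (prev_level + phi * trend)
    trend <- beta * (level - prev_level) + (1 - beta) * phi * trend
  }
  # h-step forecasts: level + (phi + phi^2 + ... + phi^k) * trend, k = 1..h
  level + cumsum(phi^(1:h)) * trend
}

# A perfectly linear series is extrapolated exactly when phi = 1
holt_sketch(1:10, h = 3, alpha = 0.5, beta = 0.2)   # 11 12 13
# With phi < 1 the forecast increments shrink, flattening the trend
diff(holt_sketch(1:10, h = 5, alpha = 0.5, beta = 0.2, phi = 0.8))
```

Setting phi = 1 corresponds to the undamped model fitted as fit.holt; the shrinking increments with phi < 1 illustrate the kind of damping that fit.holtd applies, although holt() estimates its own phi.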

Plot the forecast and overlay the test set to see how well the model performs out of sample:

    > plot(forecast(fit.holt))
    > lines(test, type = "o")

The output of the preceding code is as follows:

Looking at the plot, the forecast shows a slight linear uptrend. Let's see what happens when we include a damped trend:

    > fit.holtd <- holt(train, h = 10, initial = "optimal", damped = TRUE)
    > plot(forecast(fit.holtd), main = "Holt Damped")
    > lines(test, type = "o")

The output of the preceding code is as follows: 

Lastly for the univariate analysis, we develop an ARIMA model with auto.arima(), which is also from the forecast package. You can specify many options in the function, or you can simply pass in your time series and it will search for the best ARIMA fit (by default, the model with the lowest AICc, found with a stepwise search):

    > fit.arima <- auto.arima(train)
    > summary(fit.arima)
    Series: train
    ARIMA(0,1,1) with drift

    Coefficients:
              ma1   drift
          -0.6949  0.0094
    s.e.   0.1041  0.0047

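The abbreviated output shows that the selected model is ARIMA(0,1,1) with drift: no autoregressive terms (p = 0), one order of differencing (d = 1), and one moving-average term (q = 1). The drift term is equivalent to an intercept in the differenced series, which corresponds to a linear trend in the original series. We can examine the plot of its performance on the test data in the same fashion as before: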

    > plot(forecast(fit.arima, h = 10))
    > lines(test, type = "o")

The output of the preceding code is as follows: 

This forecast is very similar to the one from the Holt method without a damped trend. We can score each model with the mean absolute percentage error (MAPE) and select the one with the lowest error, using the following code:

    > mapeHOLT <- mean(abs((test - fit.holt$mean) / test))
    > mapeHOLT
    [1] 0.105813
    > mapeHOLTD <- mean(abs((test - fit.holtd$mean) / test))
    > mapeHOLTD
    [1] 0.2220256
    > mapeARIMA <- mean(abs((test - forecast(fit.arima, h = 10)$mean) / test))
    > mapeARIMA
    [1] 0.1034813
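The three MAPE calculations repeat the same expression, so a small helper keeps them consistent. The mape() function below is a hypothetical helper written for illustration; it returns MAPE as a fraction, matching the values above (0.105813 is about 10.6%). Because MAPE divides by the actual values, it becomes unstable or infinite when actuals are at or near zero, which can happen with temperature anomalies.

```r
# Mean absolute percentage error, returned as a fraction (0.10 means 10%)
mape <- function(actual, predicted) {
  actual <- as.numeric(actual)
  predicted <- as.numeric(predicted)
  stopifnot(length(actual) == length(predicted))
  mean(abs((actual - predicted) / actual))
}

mape(c(100, 200, 400), c(110, 180, 400))  # (0.10 + 0.10 + 0) / 3
mape(c(0, 1), c(0.1, 1))                  # Inf: an actual value of zero
```

Assuming the test set and the forecasts cover the same 10 years, mape(test, fit.holt$mean) should reproduce mapeHOLT. The forecast package's accuracy() function also reports MAPE, expressed as a percentage rather than a fraction.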

The ARIMA(0,1,1) model has a slightly lower forecast error than the Holt method without damping (a MAPE of about 0.103 versus 0.106), and the damped trend model clearly performed worst (about 0.222).
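The closeness of ARIMA(0,1,1) with drift and the undamped Holt method becomes clear if we write out the ARIMA point forecasts. The model is y[t] - y[t-1] = drift + e[t] + theta * e[t-1], so with future errors set to zero the h-step forecast is y[T] + theta * e[T] + h * drift: after the first step, each forecast adds the drift, giving a straight line just as an undamped linear trend does. The sketch below uses the ma1 and drift estimates from the summary output; the last observation and last residual are hypothetical values, and the function name is made up for illustration.

```r
# Point forecasts for ARIMA(0,1,1) with drift:
#   y[t] - y[t-1] = drift + e[t] + theta * e[t-1]
# With future errors set to zero, y_hat[T + h] = y[T] + theta * e[T] + h * drift.
arima011_drift_forecast <- function(y_last, e_last, theta, drift, h) {
  y_last + theta * e_last + (1:h) * drift
}

# theta (ma1) and drift come from summary(fit.arima);
# y_last and e_last are hypothetical values for the final training year
fc <- arima011_drift_forecast(y_last = 0.40, e_last = 0.05,
                              theta = -0.6949, drift = 0.0094, h = 10)
diff(fc)   # each step adds the drift, so the forecast is a straight line
```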

With the statistical and visual evidence, it seems that the best choice for a univariate forecast model is the ARIMA model. Interestingly, in the first edition using annual data, the Holt method with a damped trend had the best accuracy.
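auto.arima() made the model choice automatically. As a rough illustration of that idea, and not of auto.arima()'s actual algorithm (which by default chooses the differencing order with KPSS unit-root tests and runs a stepwise search over AICc), the sketch below fits every ARIMA(p, 1, q) with p and q at most 2 using base R's arima() and keeps the lowest AIC. Because stats::arima() does not fit a mean when d > 0, this sketch cannot select a drift term. The function grid_arima and the simulated series are hypothetical.

```r
# Exhaustive search over ARIMA(p, d, q) with d fixed, keeping the lowest AIC.
# A simplified stand-in for automatic selection, not auto.arima()'s algorithm.
grid_arima <- function(y, max_p = 2, max_q = 2, d = 1) {
  best <- NULL
  best_aic <- Inf
  for (p in 0:max_p) {
    for (q in 0:max_q) {
      fit <- tryCatch(suppressWarnings(arima(y, order = c(p, d, q))),
                      error = function(e) NULL)
      if (is.null(fit)) next          # skip orders that fail to fit
      aic <- AIC(fit)
      if (is.finite(aic) && aic < best_aic) {
        best_aic <- aic
        best <- fit
      }
    }
  }
  best
}

set.seed(42)
y_demo <- ts(cumsum(rnorm(60)))   # hypothetical random-walk series
best <- grid_arima(y_demo)
best$arma[c(1, 6, 2)]             # selected (p, d, q)
```

To refit the chosen specification with drift by hand, the forecast package's Arima() accepts an order plus include.drift = TRUE, for example Arima(train, order = c(0, 1, 1), include.drift = TRUE).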

With this, we've built a univariate forecast model for the surface temperature anomalies, and we will now move on to the next task: examining whether CO2 levels cause these anomalies.
