Data understanding and preparation

Two packages are required for this effort, so ensure they are installed on your system:

    > library(forecast)
    > library(tseries)
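If you are unsure whether both packages are present, a quick check along the following lines can help. This is a minimal sketch rather than part of the original workflow; the variable names `pkgs`, `installed`, and `to_install` are illustrative, and the install call is left commented out so that nothing is downloaded without your say-so.

```r
# Packages this section relies on
pkgs <- c("forecast", "tseries")

# requireNamespace() returns TRUE/FALSE without attaching the package
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install <- pkgs[!installed]

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```

Once `to_install` comes back empty, the two library() calls above should load without error.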

Let's start out with plots of the two time series:

    > autoplot(climate)

The output of the preceding command is as follows:

It appears that CO2 levels really started to increase after World War II, and that temperature anomalies began a rapid rise in the mid-1970s. There do not appear to be any obvious outliers, and variation over time appears constant. Using cor(), we can see that the two series are highly correlated, as follows:

    > cor(climate)
               CO2      Temp
    CO2  1.0000000 0.8404215
    Temp 0.8404215 1.0000000

As discussed earlier, this is nothing to jump for joy about, as it proves absolutely nothing. We will look for structure by plotting the ACF and PACF for both series:

    > autoplot(acf(climate[, 2], plot = FALSE), main = "Temp ACF")

The output of the preceding code snippet is as follows:

This code gives us the PACF plot for temperature:

    > autoplot(pacf(climate[, 2], plot = FALSE), main = "Temp PACF")

The output of the preceding code snippet is as follows:

This code gives us the ACF plot for CO2:

    > autoplot(acf(climate[, 1], plot = FALSE), main = "CO2 ACF")

The output of the preceding code snippet is as follows:

This code gives us the PACF plot for CO2:

    > autoplot(pacf(climate[, 1], plot = FALSE), main = "CO2 PACF")

The output of the preceding code snippet is as follows:

With the slowly decaying ACF patterns and rapidly decaying PACF patterns, we can assume that both series are autoregressive, although Temp appears to have some significant MA terms. Next, let's have a look at the Cross Correlation Function (CCF). Note that we put our x before our y in the function:

    > ccf(climate[, 1], climate[, 2], main = "CCF")

The CCF shows us the correlation between temperature and lags of CO2. If the negative lags of the x variable have a high correlation, we can say that x leads y. If the positive lags of x have a high correlation, we say that x lags y. Here, we can see that CO2 is both a leading and a lagging variable. For our analysis, it is encouraging that we see the former, but odd that we see the latter. We will see during the VAR and Granger causality analysis whether this matters.
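The sign convention is easy to get backwards, so it can help to check it on data where the answer is known. The following sketch uses simulated series (not the climate data) in which y is built as a noisy copy of x delayed by two periods, so x leads y by construction. The names `x`, `y`, `cc`, and `peak_lag` are illustrative.

```r
set.seed(42)
n <- 200
x <- rnorm(n)

# y at time t is x at time t - 2 plus a little noise, so x leads y by 2
y <- c(rnorm(2), x[1:(n - 2)]) + rnorm(n, sd = 0.2)

# x goes first, y second, as in the call on the climate data
cc <- ccf(x, y, lag.max = 10, plot = FALSE)

# Lag at which the absolute cross-correlation is largest
peak_lag <- cc$lag[which.max(abs(cc$acf))]
peak_lag
```

Because ccf(x, y) at lag k estimates the correlation between x[t + k] and y[t], the peak should appear at lag -2, which matches the rule that strong correlation at negative lags means x leads y.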

Additionally, we need to test whether the data is stationary. We can check this with the Augmented Dickey-Fuller (ADF) test, available in the tseries package through the adf.test() function, as follows:

    > adf.test(climate[, 1])

    Augmented Dickey-Fuller Test

    data: climate[, 1]
    Dickey-Fuller = -1.1519, Lag order = 4, p-value = 0.9101
    alternative hypothesis: stationary

    > adf.test(climate[, 2])

    Augmented Dickey-Fuller Test

    data: climate[, 2]
    Dickey-Fuller = -1.8106, Lag order = 4, p-value = 0.6546
    alternative hypothesis: stationary

For both series, the p-values are insignificant, so we cannot reject the null hypothesis of a unit root; we therefore treat both series as non-stationary.
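To get a feel for how the test reads, it can be useful to run it on series whose behavior is known in advance. The sketch below is an illustration on simulated data, not part of the climate analysis: `white_noise` is stationary by construction, while `random_walk` has a unit root. It assumes the tseries package is installed; the variable names are illustrative.

```r
library(tseries)  # assumed installed; provides adf.test()

set.seed(123)
n <- 200
white_noise <- rnorm(n)          # stationary by construction
random_walk <- cumsum(rnorm(n))  # unit root by construction

# adf.test() warns when the p-value falls outside its lookup table,
# so the warnings are suppressed here to keep the output readable
p_noise <- suppressWarnings(adf.test(white_noise))$p.value
p_walk  <- suppressWarnings(adf.test(random_walk))$p.value

round(c(noise = p_noise, walk = p_walk), 3)
```

A small p-value for the white noise series means the unit-root null is rejected. A large p-value, as reported for both climate series above, means the test found no evidence against a unit root.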

Having explored the data, let's begin the modeling process, starting with the application of univariate techniques to the temperature anomalies.
