Data exploration

Let's start out with a plot of the time series using base R:

> plot(climate_ts, main = "CO2 and Temperature Deviation")

The output of the preceding command is as follows:

It appears that CO2 levels really started to increase after World War II and that temperature anomalies rose rapidly from the mid-1970s. There don't appear to be any obvious outliers, and the variation over time appears constant. Using the standard procedure, we can see that the two series are highly correlated, as follows:

> cor(climate_temp)
           CO2      Temp
CO2  1.0000000 0.8404215
Temp 0.8404215 1.0000000

As discussed earlier, this is nothing to jump for joy about, as it proves absolutely nothing. We'll look for structure by plotting the ACF and PACF for both series, starting with the ACF for temperature:

> forecast::autoplot(acf(climate_ts[, 2], plot = FALSE), main = "Temperature ACF")

The output of the preceding code snippet is as follows:

This code gives us the PACF plot for temperature:

> forecast::autoplot(pacf(climate_ts[, 2], plot = FALSE), main = "Temperature PACF")

The output of the preceding code snippet is as follows:

This code gives us the ACF plot for CO2:

> forecast::autoplot(acf(climate_ts[, 1], plot = FALSE), main = "CO2 ACF")

The output of the preceding code snippet is as follows:

This code gives us the PACF plot for CO2:

> forecast::autoplot(pacf(climate_ts[, 1], plot = FALSE), main = "CO2 PACF")

The output of the preceding code snippet is as follows:

With slowly decaying ACF patterns and rapidly decaying PACF patterns, we can assume that both series are autoregressive, although Temp appears to have some significant MA terms. Next, let's have a look at the Cross-Correlation Function (CCF). Note that we put our x (CO2) before our y (temperature) in the function:

> forecast::autoplot(ccf(climate_ts[, 1], climate_ts[, 2], plot = FALSE), main = "CCF")

The output of the preceding code is as follows:

The CCF shows us the correlation between the temperature and lags of CO2. In R, the value at lag k of ccf(x, y) estimates the correlation between x at time t + k and y at time t. If the negative lags of the x variable have a high correlation, we can say that x leads y. If the positive lags of x have a high correlation, we say that x lags y. Here, we can see that CO2 is both a leading and a lagging variable. For our analysis, it's encouraging that we see the former, but odd that we see the latter. We'll see during the VAR and Granger causality analysis whether this matters.
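The sign convention is easy to get backwards, so here is a minimal sketch on simulated data (not the climate series). The variable names and the three-step delay are assumptions chosen for illustration: y is built as a noisy copy of x from three steps earlier, so x leads y and the strongest cross-correlation should sit at a negative lag.

```r
# Simulated illustration of the ccf() lag convention (hypothetical data).
set.seed(123)
n <- 300
base_series <- rnorm(n + 3)

x <- base_series[4:(n + 3)]               # x[t]
y <- base_series[1:n] + rnorm(n, sd = 0.5) # y[t] = x[t - 3] + noise, so x leads y

# ccf(x, y) at lag k estimates cor(x[t + k], y[t])
cc <- ccf(x, y, lag.max = 10, plot = FALSE)

# Lag with the largest absolute cross-correlation
peak_lag <- cc$lag[which.max(abs(cc$acf))]
peak_lag
```

Because x leads y by three steps here, the peak falls at lag -3, which matches the reading rule above: high correlation at negative lags of x means x leads y.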

Additionally, we need to test whether the data is stationary. We can check this with the Augmented Dickey-Fuller (ADF) test available in the tseries package, using the adf.test() function, as follows:

> tseries::adf.test(climate_ts[, 1])

Augmented Dickey-Fuller Test

data: climate_ts[, 1]
Dickey-Fuller = -1.1519, Lag order = 4, p-value = 0.9101
alternative hypothesis: stationary

> tseries::adf.test(climate_ts[, 2])

Augmented Dickey-Fuller Test

data: climate_ts[, 2]
Dickey-Fuller = -1.8106, Lag order = 4, p-value = 0.6546
alternative hypothesis: stationary

For both series, the p-values are insignificant, so we cannot reject the null hypothesis of a unit root. We therefore have no evidence that either series is stationary and will treat both as non-stationary.
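To make the direction of the test concrete, the following is a rough sketch on simulated data rather than the climate series. It assumes the tseries package is installed; the two series and their names are hypothetical. A random walk has a unit root, so the test would typically fail to reject the null, whereas white noise is stationary and should yield a small p-value.

```r
# Simulated illustration of how to read adf.test() (hypothetical data).
# Requires the tseries package, as used in the text above.
set.seed(42)
n <- 200

random_walk <- cumsum(rnorm(n))  # unit-root process, non-stationary
white_noise <- rnorm(n)          # stationary process

# adf.test() warns when the p-value falls outside its tabulated range,
# so the warnings are suppressed here to keep the output clean.
rw_test <- suppressWarnings(tseries::adf.test(random_walk))
wn_test <- suppressWarnings(tseries::adf.test(white_noise))

# Null hypothesis: unit root (non-stationary); small p-value => reject it
c(random_walk = rw_test$p.value, white_noise = wn_test$p.value)
```

Note that a large p-value is not proof of non-stationarity; it only means the test found no evidence against a unit root.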

Having explored the data, let's begin the modeling process, starting with the application of univariate techniques to the temperature anomalies.
