Univariate time series analysis

We'll focus on two methods to analyze and forecast a single time series: exponential smoothing and Autoregressive Integrated Moving Average (ARIMA) models. We'll start by looking at exponential smoothing models.

Like moving average models, exponential smoothing models apply weights to past observations. But unlike moving average models, the more recent the observation, the more weight it receives relative to earlier ones. There are three possible smoothing parameters to estimate: an overall smoothing parameter, a trend smoothing parameter, and a seasonal smoothing parameter. If no trend or seasonality is present, the corresponding parameters are dropped.

The smoothing parameter produces a forecast with the following equation:

At = αYt + (1 - α)At-1

In this equation, Yt is the value at time t, At is the smoothed value (the one-step-ahead forecast), and alpha (α) is the smoothing parameter. Expanding the recursion shows how the weights decay exponentially: At = αYt + α(1 - α)Yt-1 + α(1 - α)²Yt-2 + .... Algorithms optimize alpha (and the other parameters) by minimizing the errors, using the Sum of Squared Error (SSE) or maximum likelihood.
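
To see this in action, the forecast package implements simple exponential smoothing as ses(); the following is a minimal sketch (the choice of R's built-in nhtemp annual temperature series is mine, not the text's):

> library(forecast)

> ses_fit <- ses(nhtemp, h = 5)  # fit simple exponential smoothing and forecast 5 periods

> ses_fit$model  # displays the estimated alpha

> autoplot(ses_fit)  # plot the series, fit, and forecast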

The forecast equation, along with the trend and seasonality equations, if applicable, will be as follows:

  • The forecast, where A is the preceding smoothing equation and h is the number of forecast periods: Yt+h = At + hTt
  • The trend equation: Tt = β(At - At-1) + (1 - β)Tt-1
  • The seasonality, where m is the number of seasonal periods: St = γ(Yt - At-1 - Tt-1) + (1 - γ)St-m

These equations together are referred to as the Holt-Winters method. The forecast equation is additive in nature, with a linear trend. The method also allows the inclusion of a damped trend and multiplicative seasonality, where the seasonal effect increases or decreases proportionally over time. With these models, you don't have to worry about the assumption of stationarity as you do in an ARIMA model. Stationarity means the time series has a constant mean, variance, and autocorrelation structure across all time periods. Having said this, it's still important to understand ARIMA models, as there will be situations where they have the best performance.
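
As a quick, hedged illustration of these options (my example, using the forecast package's hw() function and R's built-in AirPassengers monthly data, neither of which is prescribed by the text):

> library(forecast)

> hw_fit <- hw(AirPassengers, seasonal = "multiplicative", damped = TRUE, h = 12)  # Holt-Winters with a damped trend and multiplicative seasonality

> autoplot(hw_fit)  # plot the fit and 12-month forecast with prediction intervals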

Starting with the autoregressive model, the value of Y at time t is a linear function of the prior values of Y. The formula for an autoregressive lag-1 model, AR(1), is Yt = constant + Φ(Yt-1) + Et. The critical assumptions for the model are as follows:

  • Et denotes the errors, which are identically and independently distributed with a mean of zero and constant variance
  • The errors are independent of the prior values of Y (Yt-1, Yt-2, and so on)
  • The series Yt, Yt-1, ..., Yt-n is stationary, which means that the absolute value of Φ is less than one

With a stationary time series, you can examine the autocorrelation function (ACF). The ACF of a stationary series gives the correlations between Yt and Yt-h for h = 1, 2, ..., n. Let's use R to create an AR(1) series and plot it:

> install.packages("forecast")

> set.seed(1966)  # make the simulation reproducible

> ar1 <- arima.sim(list(order = c(1, 0, 0), ar = 0.5), n = 200)  # simulate 200 observations of an AR(1) with phi = 0.5

> forecast::autoplot(ar1, main = "AR1")

The following is the output of the preceding command:

[Figure: time series plot of the simulated series, titled "AR1"]

Now, let's examine the ACF:

> forecast::autoplot(acf(ar1, plot = F), main = "ACF of simulated AR1")

The output of the preceding command is as follows:

[Figure: ACF plot of the simulated AR(1) series]

The ACF plot shows the correlations decreasing exponentially as the lag increases. The dotted blue lines indicate the confidence bands: any bar extending above the upper band or below the lower band is considered a significant correlation.

In addition to the ACF, we should also examine the partial autocorrelation function (PACF). The PACF is a conditional correlation, which means that the correlation between Yt and Yt-h is conditional on the observations that come between the two. One way to intuitively understand this is to think of a linear regression model and its coefficients. Let's assume that you have Y = B0 + B1X1 versus Y = B0 + B1X1 + B2X2. The relationship of X1 to Y in the first model is linear, with a coefficient, but in the second model that coefficient will be different, because the relationship between Y and X2 is now being accounted for as well. Note that, in the following PACF plot, the partial autocorrelation value at lag-1 is identical to the autocorrelation value at lag-1, as there are no intervening observations to condition on:

> forecast::autoplot(pacf(ar1, plot = F), main = "PACF of simulated AR1")

The following is the output of the preceding command:

[Figure: PACF plot of the simulated AR(1) series]

From the appearance of the preceding time series plot, we can safely assume that the series is stationary. We'll look at a couple of statistical tests in the practical exercise to ensure that the data is stationary but, on occasion, the eyeball test is sufficient. If the data isn't stationary, then it's possible to detrend it by differencing. This is the Integrated (I) in ARIMA. After differencing, the new series is ΔYt = Yt - Yt-1. One should expect a first-order difference to achieve stationarity but, on some occasions, a second-order difference may be necessary. An ARIMA model with AR(1) and I(1) would be annotated as (1,1,0).
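
Before moving on, here's a brief sketch of checking and taking differences (my example; ndiffs() is from the forecast package, applied here to our simulated series):

> forecast::ndiffs(ar1)  # estimates the number of differences needed; 0 here, as the series is stationary

> ar1_diff <- diff(ar1)  # first-order differencing, had it been needed

> forecast::autoplot(ar1_diff, main = "Differenced AR1")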

The MA stands for Moving Average. This isn't a simple moving average, such as the 50-day moving average of a stock price; rather, it's a coefficient applied to the prior errors. The errors are, of course, identically and independently distributed with a mean of zero and constant variance. The formula for an MA(1) model is Yt = constant + Et + ΘEt-1. As we did with the AR(1) model, we can build an MA(1) in R, as follows:

> set.seed(123)  # make the simulation reproducible

> ma1 <- arima.sim(list(order = c(0, 0, 1), ma = -0.5), n = 200)  # simulate 200 observations of an MA(1) with theta = -0.5

> forecast::autoplot(ma1, main = "MA1")

The following is the output of the preceding command:

[Figure: time series plot of the simulated series, titled "MA1"]

The ACF and PACF plots are a bit different from those of the AR(1) model. Note that there are some rules of thumb for determining whether a model has AR and/or MA terms when looking at these plots. They can be a bit subjective, so I'll leave it to you to learn those heuristics and instead trust R to identify the proper model. In the following plots, we'll see a significant correlation at lag-1 and two significant partial correlations at lag-1 and lag-2:

> forecast::autoplot(acf(ma1, plot = F), main = "ACF of simulated MA1")

The output of the preceding command is as follows:

[Figure: ACF plot of the simulated MA(1) series]

The preceding figure is the ACF plot, and now, we'll see the PACF plot:

> forecast::autoplot(pacf(ma1, plot = F), main = "PACF of simulated MA1")

The output of the preceding command is as follows:

[Figure: PACF plot of the simulated MA(1) series]

With ARIMA models, it's possible to incorporate seasonality in the autoregressive, integrated, and moving average terms. The non-seasonal ARIMA model notation is commonly (p,d,q). With seasonal ARIMA, assuming that the data is monthly, the notation becomes (p,d,q) x (P,D,Q)12, with the 12 in the notation taking the monthly seasonality into account. In the packages that we'll use, R can automatically identify whether seasonality should be included; if so, the optimal terms will be included as well.
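
As a hedged sketch of this automatic identification (my example, using the forecast package's auto.arima() function on R's built-in AirPassengers monthly data, neither of which is prescribed by the text):

> fit <- forecast::auto.arima(AirPassengers)  # searches over non-seasonal and seasonal terms

> summary(fit)  # reports the selected (p,d,q)(P,D,Q)[12] model

> forecast::autoplot(forecast::forecast(fit, h = 12))  # plot a 12-month forecast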
