Chapter 16
IN THIS CHAPTER
Understanding time series
Moving averages
Smoothing things out
Forecasting with just one (?) click
In many fields (science, medicine, business), it’s often necessary to take measurements over successive intervals of time. When you have this kind of data, you have yourself a time series. This chapter tells you about time series and how to use Excel to understand them and use them to make forecasts.
Managers often have to base their decisions on time series — like sales figures — and the numbers in a time series typically show numerous ups and downs.
Here’s an example. The (totally fictional) FarDrate Timepiece Corporation markets the beautifully designed MeesKyte watch, and they gather the quarterly national sales figures. Figure 16-1 shows a spreadsheet and a graph of these sales figures from 2011–2015.
Take a look at the graph. The peaks and valleys seem pretty regular. Sales peak in the last quarter of every year — perhaps people buy these watches as holiday gifts. Sales also peak somewhat (but not as much) in the middle of each year — maybe parents are buying MeesKyte watches as graduation presents. This pattern repeats from year to year, and it’s called the seasonal component of the time series.
We also see sales moving generally in an upward direction. That might result from an expanding economy, or perhaps increasing awareness of the brand (or both). This is called the trend component of the time series. In this example, the trend is linear, but that’s not always the case in a time series.
And we see the peaks and valleys varying around the upward trend. This is called the cyclic component of the time series.
Finally, sporadic nonrecurring influences can affect a time series. This is known as the irregular component of a time series.
A mean that takes all the peaks and valleys into account might obscure the big picture of the overall trend.
One way to smooth out the bumps and still see the big picture is to calculate a moving average. This is an average calculated from the most recent scores in the time series. It moves because you keep calculating it over the time series. As you add a score to the front end, you delete one from the back end.
So we have the MeesKyte sales figures for 20 quarters, and we decide to keep a moving average for the most recent five quarters. Start with the average from quarters 1–5 of those 20 quarters (Mar 11 through Mar 12). Then average the prices from quarters 2–6 (Apr 11 through Apr 12). Next, average quarters 3–7, and so on, until you average the final 5 quarters of the time series.
Excel provides two ways of calculating the moving average. One is quick and dirty, (it’s a trendline option on the chart), and the other is a data analysis tool.
Here’s how to use the trendline option:
Enter your data into a spreadsheet.
I entered the data into Columns A and B, rows 1-21. Row 1 contains headings.
Select the data and insert a line chart.
On the Insert tab, in the Charts area, select Line Chart with Markers.
In the Format Trendline panel, select Moving Average and change the period.
I changed it to 5 for this example.
Figure 16-2 shows the Format Trendline panel and the moving average line in the chart. The beauty of this technique is that it enables you to experiment with different periods (also known as intervals) for the moving average and immediately see how each one looks on the chart. Unfortunately, this technique does not show you the numerical values of the moving averages. To calculate those values, you use …
The Moving Average tool charts the moving average and presents the numerical values, too. Unlike with the trendline technique, however, you can’t experiment with different periods on the fly.
The steps are:
With the data entered into the spreadsheet, select Data | Data Analysis and select Moving Average from Analysis Tools.
This opens the Moving Average dialog box.
In the Input Range box, enter the data array.
For this example, that’s B1 through B21. I don’t include the column with the dates.
Fill in the remaining boxes.
I select the Labels in First Row check box, and I enter 5 in the Interval box. (The word Interval corresponds to the word Period in the trendline technique.) Then I enter cells C2:C22 in the Output Range box and click the Chart Output check box and the Standard Errors check box. Figure 16-3 shows the Moving Average dialog box after all these entries.
Click OK.
This puts the moving averages in Column C and the standard errors in Column D, and then creates a chart of the data and the moving averages. Figure 16-4 shows all this.
Ignore the ugly-looking #N/A symbols. Each number in Column C is a moving average — which, as I mention earlier, is a forecast of the number of sales on the basis of the preceding five quarters.
Each number in Column D is a standard error. In this context, a standard error is the square root of the average of the squared differences between the sales and the forecast for the previous five quarters. So the first standard error in cell D10 is
The graph (stretched out from its original appearance and with a reformatted vertical axis) shows the moving average in the series labeled Forecast. Sometimes the forecast matches up with the data, sometimes it doesn’t. As the figure shows, the moving average smoothes out the peaks and the valleys in the sales data.
In general, how many scores should you include in the interval? That’s up to you. Include too many and you risk obsolete data influencing your results. Include too few and you risk missing something important.
Exponential smoothing is similar to a moving average. It’s a technique for forecasting based on prior data. In contrast with the moving average, which works just with a sequence of actual values, exponential smoothing takes its previous prediction into account.
Exponential smoothing operates according to a damping factor — a number between zero and one. With α representing the damping factor, the formula is
In terms of sales figures from the preceding example, y’t represents the predicted sales at a time: t. If t is the current quarter, t-1 is the immediately preceding quarter. So yt-1 is the preceding quarter’s actual sales and y’t-1 is the preceding quarter’s predicted sales. The sequence of predictions begins with the first predicted value as the observed value from the immediately preceding quarter.
A larger damping factor gives more weight to the preceding quarter’s prediction. A smaller damping factor gives greater weight to the preceding quarter’s actual value. A damping factor of 0.5 weighs each one equally.
Figure 16-5 shows the dialog box for the Exponential Smoothing data analysis tool. It’s similar to the Moving Average tool, except for the Damping Factor box.
I applied exponential smoothing to the data from the preceding example. I did this three times with 0.1, 0.5, and 0.9 as the damping factors. Figure 16-6 shows the graphic output for each result.
The highest damping factor, 0.9, results in the flattest sequence of forecasts. The lowest, 0.1, forecasts the most pronounced set of peaks and valleys. How should you set the damping factor? Like the interval in the moving average, that’s up to you. Your experience and the specific area of application are the determining factors.
You might have noticed from Figure 16-6, however, that the predictions of exponential smoothing don’t appear to be all that accurate for this example.
What to do? Read on… .
New to Excel 2016 (Windows only) is a capability that enables you to select a time series and with a mouse-click (several, actually) receive a set of extrapolated forecasts along with confidence intervals for each forecast.
The new forecasting capability uses Excel’s new (Windows only) FORECAST
functions:
FORECAST.ETS
creates a forecast based on triple exponential smoothing. (See the earlier sidebar “Tripling the fun — exponentially.”)FORECAST.ETS.CONFINT
returns a confidence interval for a forecast value.FORECAST.ETS.STAT
returns values of statistics associated with the ETS forecast.FORECAST.ETS.SEASONALITY
determines the length of the seasonal pattern within the data.I could tell you about each one, but I don’t think you’ll ever use any of them individually.
Let’s forecast!
Here are the steps:
Enter the data, with dates in one column.
As per our example, the data are in Columns A and B.
On the Data tab, in the Forecast area, select Forecast Sheet.
This opens the Create Forecast Worksheet dialog box shown in Figure 16-7. As you can see, Excel is already hard at work figuring things out for you. The blue line is the data, the bold orange line is the forecasts, and the lighter orange lines are the 95% confidence limits for the forecasts.
In the Create Forecast Worksheet dialog box, click the Options arrow in the bottom left corner.
This expands the dialog box, as Figure 16-8 shows.
Make necessary modifications in the expanded dialog box.
To get an idea about forecast accuracy, I move the date in the Forecast Start box back to March 31, 2015. This way, I can match some of the forecasts with data.
The Seasonality area shows 2 in the grayed-out box next to Set Manually. This means that FORECAST.ETS.SEASONALITY
has detected a pattern that repeats every two quarters. As the FarDrate CEO, though, I believe that the pattern repeats every four quarters, so I select the Set Manually radio button and change the 2 to 4. The chart in the dialog box changes immediately to show way more accurate forecasts and narrower 95% confidence limits.
I select the check box next to Include Forecast Statistics. I have the option of changing confidence intervals and some other things, but what I’ve done is sufficient.
Click Create.
This opens the created Forecast Worksheet, shown in Figure 16-9.
Along with the chart, the Forecast Worksheet shows the forecast values in Column C (calculated by FORECAST.ETS
), and the confidence limits in Columns D and E (calculated by FORECAST.ETS.CONFINT
).
The Forecast Worksheet also provides the Forecast Statistics in Columns G and H (courtesy of FORECAST.ETS.STAT
), rows 1-8, as Figure 16-10 shows.
Alpha, Beta, and Gamma are the values I refer to in the sidebar. The other four are measures of how well the forecasts fit the data. The first two are a bit involved, so I’ll let them slide.
The third one, MAE, is the mean absolute error — the average of the absolute value of the differences between data and forecasts. The last one, RMSE, is root mean squared error, which is the average of the squared deviations of the data from the forecasts. You can think of the standard error of estimate in regression (refer to Chapter 14) as a root mean square error adjusted for degrees of freedom.