In this tutorial, we will learn how to resample time series with Pandas.
We will download the daily price time series data for AAPL and resample it to monthly data by computing the mean. We will accomplish this by creating a Pandas DataFrame and calling its resample method.
Before we can create a Pandas DataFrame, we need to create a DatetimeIndex object to pass to the DataFrame constructor. Create the index from the downloaded quotes data as follows:
dt_idx = pandas.DatetimeIndex(quotes.date)
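To try this step without downloading anything, the following minimal sketch builds a DatetimeIndex from a short hand-written list. The dates list is made up for illustration and only stands in for quotes.date.

```python
import datetime

import pandas

# Hypothetical stand-in for quotes.date: three consecutive trading days.
dates = [
    datetime.datetime(2011, 1, 3),
    datetime.datetime(2011, 1, 4),
    datetime.datetime(2011, 1, 5),
]

# Same constructor call as in the recipe, applied to the stand-in list.
dt_idx = pandas.DatetimeIndex(dates)

print(dt_idx)
# Each element of the index is a timestamp with the usual date fields.
print(dt_idx[0].year, dt_idx[0].month, dt_idx[0].day)
```

The index behaves like an ordered collection of timestamps, which is what lets the data frame be grouped by calendar period later on.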
Once we have the date-time index, we can use it together with the close prices to create a data frame:
df = pandas.DataFrame(quotes.close, index=dt_idx, columns=[symbol])
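As a rough illustration of what this constructor call produces, the sketch below uses a small made-up array in place of quotes.close. The prices are invented and are not real market data.

```python
import numpy
import pandas

symbol = "AAPL"

# Made-up closing prices standing in for quotes.close (not real data).
close = numpy.array([10.0, 11.0, 12.0])
dt_idx = pandas.DatetimeIndex(["2011-01-03", "2011-01-04", "2011-01-05"])

# A one-dimensional array becomes a single column named after the symbol,
# with one row per date in the index.
df = pandas.DataFrame(close, index=dt_idx, columns=[symbol])

print(df)
# Rows can now be looked up by date rather than by position.
print(df.loc[pandas.Timestamp("2011-01-04"), symbol])
```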
Resample the time series to monthly frequency, by computing the mean:
resampled = df.resample('M', how=numpy.mean)
print resampled
The resampled time series, as shown in the following, has one value for each month:
                  AAPL
2011-01-31  336.932500
2011-02-28  349.680526
2011-03-31  346.005652
2011-04-30  338.960000
2011-05-31  340.324286
2011-06-30  329.664545
2011-07-31  370.647000
2011-08-31  375.151304
2011-09-30  390.816190
2011-10-31  395.532381
2011-11-30  383.170476
2011-12-31  391.251429
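The monthly averaging can be checked on data where the answer is obvious. The sketch below is not the recipe's code: it assumes a newer Pandas release, in which the how keyword is no longer accepted and the aggregation is selected by calling a method on the resampler. It also passes a month-end offset object instead of a frequency string, and the input values are synthetic.

```python
import numpy
import pandas

# Synthetic daily data: every day in January is 1.0, every day in February is 3.0.
dt_idx = pandas.date_range("2011-01-01", "2011-02-28", freq="D")
values = numpy.where(dt_idx.month == 1, 1.0, 3.0)
df = pandas.DataFrame(values, index=dt_idx, columns=["AAPL"])

# Group the rows into month-end bins, then take the mean of each bin.
resampled = df.resample(pandas.offsets.MonthEnd()).mean()

print(resampled)
```

Because each month holds a constant value, the two monthly means should equal those constants, and each row is labeled with the last calendar day of its month, as in the output above.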
Use the DataFrame plot method to plot the data:
df.plot()
resampled.plot()
show()
The plot for the original time series is as follows:
The resampled data has fewer data points, so the resulting plot, as shown in the following image, is choppier:
The complete resampling code is as follows:
import pandas
from matplotlib.pyplot import show, legend
from datetime import datetime
from matplotlib import finance
import numpy

# Download AAPL data for 2011 to 2012
start = datetime(2011, 01, 01)
end = datetime(2012, 01, 01)

symbol = "AAPL"
quotes = finance.quotes_historical_yahoo(symbol, start, end, asobject=True)

# Create date time index
dt_idx = pandas.DatetimeIndex(quotes.date)

# Create data frame
df = pandas.DataFrame(quotes.close, index=dt_idx, columns=[symbol])

# Resample with monthly frequency
resampled = df.resample('M', how=numpy.mean)
print resampled

# Plot
df.plot()
resampled.plot()
show()
We created a date-time index from a list of dates and times. This index was then used to create a Pandas data frame. We then resampled our time series data. The resampling frequency is given by a single character:
D for daily
M for monthly
A for annual
The how parameter of the resample method indicates how the data in each resampling period is aggregated. This defaults to calculating the mean.
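To illustrate how the frequency and the aggregation are chosen independently, here is a small sketch on synthetic data. It assumes a newer Pandas release where the aggregation is a method on the resampler rather than the how argument, and it uses a year-end offset object in place of the single-character annual code. The column name value is arbitrary.

```python
import numpy
import pandas

# Synthetic daily values 1, 2, 3, ... covering all of 2011 and 2012.
dt_idx = pandas.date_range("2011-01-01", "2012-12-31", freq="D")
df = pandas.DataFrame(
    numpy.arange(1.0, len(dt_idx) + 1), index=dt_idx, columns=["value"]
)

# Same annual bins, two different aggregations.
annual_mean = df.resample(pandas.offsets.YearEnd()).mean()
annual_max = df.resample(pandas.offsets.YearEnd()).max()

print(annual_mean)
print(annual_max)
```

Each result has one row per year. Swapping the offset changes the bin size, while swapping the method changes what is computed inside each bin.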