Resampling time series data

In this tutorial, we will learn how to resample time series with Pandas.

How to do it...

We will download the daily price time series data for AAPL, and resample it to monthly data by computing the mean. We will accomplish this by creating a Pandas DataFrame, and calling its resample method.

  1. Creating a date-time index.

    Before we can create a Pandas DataFrame, we need to create a DatetimeIndex object to pass to the DataFrame constructor. Create the index from the downloaded quotes data as follows:

    dt_idx = pandas.DatetimeIndex(quotes.date)

  2. Creating the data frame.

    Once we have the date-time index, we can use it together with the close prices to create a data frame:

    df = pandas.DataFrame(quotes.close, index=dt_idx, columns=[symbol])

  3. Resample.

    Resample the time series to monthly frequency, by computing the mean:

    resampled = df.resample('M', how=numpy.mean)
    print resampled 

    The resampled time series, shown below, has one value for each month:

                      AAPL
    2011-01-31  336.932500
    2011-02-28  349.680526
    2011-03-31  346.005652
    2011-04-30  338.960000
    2011-05-31  340.324286
    2011-06-30  329.664545
    2011-07-31  370.647000
    2011-08-31  375.151304
    2011-09-30  390.816190
    2011-10-31  395.532381
    2011-11-30  383.170476
    2011-12-31  391.251429
    
  4. Plot.

    Use the DataFrame plot method to plot the data:

    df.plot()
    resampled.plot()
    show()

    The plot for the original time series is as follows:

    [Figure: plot of the original daily AAPL close prices]

    The resampled data has fewer data points, and therefore the resulting plot, as shown in the following image, is choppier:

    [Figure: plot of the monthly resampled AAPL close prices]
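
To make the point about the number of data points concrete, here is a minimal sketch that counts the rows before and after resampling. It assumes Python 3 and a recent Pandas release, and it uses synthetic business-day prices (a random walk, not real AAPL quotes) in place of the download. It also uses the 'MS' (month start) alias instead of 'M', on the assumption that the month-end alias may be spelled differently depending on the installed Pandas version:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded data
# (assumption: a random walk, not real AAPL prices)
rng = np.random.default_rng(42)
idx = pd.date_range("2011-01-01", "2011-12-31", freq="B")  # business days only
prices = 340.0 + rng.normal(0.0, 2.0, len(idx)).cumsum()
df = pd.DataFrame({"AAPL": prices}, index=idx)

# One mean value per calendar month, labeled by the first day of the month
resampled = df.resample("MS").mean()

print(len(df), "daily points")
print(len(resampled), "monthly points")
```

Each monthly value summarizes roughly twenty daily values, which is why the resampled plot has so few points to connect.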

The complete resampling code is as follows:

import pandas
from matplotlib.pyplot import show, legend
from datetime import datetime
from matplotlib import finance
import numpy

# Download AAPL data for 2011 to 2012
start = datetime(2011, 01, 01)
end = datetime(2012, 01, 01)

symbol = "AAPL"
quotes = finance.quotes_historical_yahoo(symbol, start, end, asobject=True)

# Create date time index
dt_idx = pandas.DatetimeIndex(quotes.date)

# Create data frame
df = pandas.DataFrame(quotes.close, index=dt_idx, columns=[symbol])

# Resample with monthly frequency
resampled = df.resample('M', how=numpy.mean)
print resampled 
 
# Plot
df.plot()
resampled.plot()
show()
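
The listing above targets Python 2 and the library versions of its time. As a rough sketch of how the same steps might look under Python 3 with a recent Pandas release, the following replaces the download with hypothetical stand-ins for quotes.date and quotes.close (the dates and closes variables are assumptions made for illustration, and the prices are synthetic). The aggregation is written as a method call on the resampler, and 'MS' (month start) is used as the frequency, so the labels fall on the first day of each month rather than the last:

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

symbol = "AAPL"

# Hypothetical stand-ins for quotes.date and quotes.close (synthetic values)
start = datetime(2011, 1, 1)
dates = [start + timedelta(days=i) for i in range(365)]
closes = np.linspace(330.0, 400.0, len(dates))

# Create date time index
dt_idx = pd.DatetimeIndex(dates)

# Create data frame
df = pd.DataFrame(closes, index=dt_idx, columns=[symbol])

# Resample with monthly frequency, taking the mean of each month
resampled = df.resample("MS").mean()
print(resampled)
```

Plotting is left out of this sketch; df.plot() and resampled.plot() should work as in the original listing once Matplotlib is installed.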

How it works...

We created a date-time index from a list of dates and times. This index was then used to create a Pandas data frame. We then resampled our time series data. The resampling frequency is given by a single character:

  • D for daily
  • M for monthly
  • A for annual
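
The effect of the frequency code can be sketched by resampling the same series at several frequencies and counting the resulting rows. This example assumes Python 3 and a recent Pandas release, and uses synthetic data. It uses the start-anchored aliases 'MS' (month start) and 'YS' (year start) next to 'D', on the assumption that the end-anchored spellings may vary between Pandas versions:

```python
import numpy as np
import pandas as pd

# One synthetic value per calendar day of 2011 (not real prices)
idx = pd.date_range("2011-01-01", "2011-12-31", freq="D")
df = pd.DataFrame({"AAPL": np.linspace(330.0, 400.0, len(idx))}, index=idx)

# Number of rows produced by each resampling frequency
counts = {}
for freq in ("D", "MS", "YS"):
    counts[freq] = len(df.resample(freq).mean())

print(counts)
```

A coarser frequency groups more of the original values into each row, so the row count drops from one per day to one per month to one per year.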

The how parameter of the resample method indicates how the values within each period are aggregated. It defaults to the mean.
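
As a hedged illustration of choosing the aggregation, the sketch below computes the monthly mean, maximum, and last value of the same series. It assumes a recent Pandas release, where the aggregation is selected by calling a method on the object returned by resample (roughly what how=numpy.mean expresses in the code above); the data is synthetic and steadily increasing, and 'MS' is the month-start alias:

```python
import numpy as np
import pandas as pd

# Synthetic, steadily increasing daily values for 2011 (not real prices)
idx = pd.date_range("2011-01-01", "2011-12-31", freq="D")
df = pd.DataFrame({"AAPL": np.linspace(330.0, 400.0, len(idx))}, index=idx)

# Group the rows by month once, then apply different aggregations
resampler = df.resample("MS")
monthly_mean = resampler.mean()  # average of each month
monthly_max = resampler.max()    # highest value in each month
monthly_last = resampler.last()  # final value in each month

summary = pd.DataFrame({
    "mean": monthly_mean["AAPL"],
    "max": monthly_max["AAPL"],
    "last": monthly_last["AAPL"],
})
print(summary)
```

Because the synthetic series only rises, the maximum and the last value coincide in every month, while the mean sits below both.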
