Loading data as pandas objects from statsmodels

Statsmodels ships with quite a few sample datasets. The complete list can be found at https://github.com/statsmodels/statsmodels/tree/master/statsmodels/datasets.

In this tutorial, we will concentrate on the copper dataset, which contains information about copper prices, world consumption, and other variables.

Getting ready

Before we start, we might need to install patsy. The easiest way to find out whether this is necessary is to run the code. If you get errors related to patsy, execute either of the following two commands:

sudo easy_install patsy
pip install --upgrade patsy
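Instead of waiting for an error from the recipe, you can check up front whether patsy is importable. The following is a minimal sketch that uses only the standard library; the helper name `is_installed` is our own, hypothetical choice.

```python
import importlib.util


def is_installed(module_name):
    """Hypothetical helper: return True if module_name can be imported here."""
    # find_spec returns None when no top-level module of that name is found
    return importlib.util.find_spec(module_name) is not None


if not is_installed("patsy"):
    print("patsy is missing; install it with: pip install --upgrade patsy")
```

This only checks that the package can be found; it does not verify its version.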

How to do it...

In this section, we will see how to load a dataset from statsmodels as a pandas DataFrame or Series object.

  1. Loading the data.

    The function we need to call is load_pandas. Load the data as follows:

    data = statsmodels.api.datasets.copper.load_pandas()

    This loads the data into a Dataset object, which contains pandas objects.

  2. Fitting the data.

    The Dataset object has an exog attribute, which, when loaded as a pandas object, becomes a DataFrame with multiple columns. It also has an endog attribute, which in our case contains the values for world copper consumption.

    Perform an ordinary least squares calculation by creating an OLS object, and calling its fit method as follows:

    x, y = data.exog, data.endog

    fit = statsmodels.api.OLS(y, x).fit()
    print("Fit params", fit.params)

    This should print the result of the fitting procedure, as follows:

    Fit params COPPERPRICE         14.222028
    INCOMEINDEX       1693.166242
    ALUMPRICE          -60.638117
    INVENTORYINDEX    2515.374903
    TIME               183.193035
    
  3. Summarize.

    The results of the OLS fit can be summarized by the summary method as follows:

    print(fit.summary())

    This prints a table of regression results, including the coefficient estimates, their standard errors, and goodness-of-fit statistics such as R-squared.
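The coefficients in fit.params are ordinary least squares estimates: the vector beta that minimizes ||y - X beta||^2, which solves the normal equations X^T X beta = X^T y. Because exog contains no column of ones, the model in this recipe has no intercept; statsmodels OLS does not add one automatically (statsmodels.api.add_constant does that). The sketch below reproduces the computation with NumPy's least-squares solver on small, invented, noise-free data. The column names only mimic the copper dataset; the numbers are not the real data.

```python
import numpy as np
import pandas as pd

# Invented, noise-free data: y is an exact linear combination of the columns,
# so least squares should recover the true coefficients
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.uniform(1.0, 10.0, size=(20, 3)),
                 columns=["COPPERPRICE", "ALUMPRICE", "TIME"])
true_beta = np.array([2.0, -1.5, 0.5])
y = pd.Series(x.to_numpy() @ true_beta, name="WORLDCONSUMPTION")

# Solve min ||y - X beta||^2; no intercept column, as in the recipe
beta, *_ = np.linalg.lstsq(x.to_numpy(), y.to_numpy(), rcond=None)

# Label the coefficients by column name, similar to fit.params
params = pd.Series(beta, index=x.columns)
print(params)
```

With real, noisy data such as the copper dataset the estimates will not match any "true" values exactly; the point here is only the shape of the calculation.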

The complete code to load the copper dataset and fit it is as follows:

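import statsmodels.api

# See https://github.com/statsmodels/statsmodels/tree/master/statsmodels/datasets
# Load the copper dataset as pandas objects
data = statsmodels.api.datasets.copper.load_pandas()

# exog: explanatory variables (DataFrame); endog: world copper consumption (Series)
x, y = data.exog, data.endog

# Ordinary least squares fit; no intercept column is added automatically
fit = statsmodels.api.OLS(y, x).fit()
print("Fit params", fit.params)
print()
print("Summary")
print()
print(fit.summary())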

How it works...

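The data in the Dataset class of statsmodels follows a special format. Among other attributes, this class has endog, which holds the endogenous (dependent) variable, and exog, which holds the exogenous (explanatory) variables. Each dataset module also provides a load function, which loads the data as NumPy arrays; we used load_pandas instead, which loads the data as pandas objects. We then performed an OLS fit, which gives us a linear statistical model of world copper consumption as a function of copper price, the income index, aluminum price, the inventory index, and time.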
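The practical difference between the two loaders is whether the labels travel with the numbers. The sketch below illustrates the endog/exog layout with a hypothetical helper, `split_endog_exog`, applied to a tiny invented DataFrame. It is not statsmodels' own code; it only shows what the pandas version keeps and what plain NumPy arrays drop.

```python
import numpy as np
import pandas as pd


def split_endog_exog(df, endog_name):
    """Hypothetical helper: split a table into endog (Series) and exog (DataFrame)."""
    endog = df[endog_name]                # dependent variable, keeps its name
    exog = df.drop(columns=[endog_name])  # explanatory variables, keep column labels
    return endog, exog


# Invented round numbers; the column names only mimic the copper dataset
df = pd.DataFrame({
    "WORLDCONSUMPTION": [100.0, 110.0, 120.0],
    "COPPERPRICE": [20.0, 21.0, 22.0],
    "TIME": [1.0, 2.0, 3.0],
})

endog, exog = split_endog_exog(df, "WORLDCONSUMPTION")
print(type(endog).__name__, endog.name)
print(type(exog).__name__, list(exog.columns))

# The NumPy equivalent holds the same numbers but no names or column labels
endog_arr, exog_arr = endog.to_numpy(), exog.to_numpy()
print(type(exog_arr).__name__, exog_arr.shape)
```

Keeping the labels is what lets fit.params come back indexed by variable name (COPPERPRICE, TIME, and so on) rather than by position.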
