Advanced uses of pandas for data analysis

In this section we will consider some advanced pandas use cases.

Hierarchical indexing

Hierarchical indexing provides us with a way to work with higher dimensional data in a lower dimension by structuring the data object into multiple index levels on an axis:

>>> s8 = pd.Series(np.random.rand(8), index=[['a','a','b','b','c','c', 'd','d'], [0, 1, 0, 1, 0,1, 0, 1, ]])
>>> s8
a  0    0.721652
   1    0.297784
b  0    0.271995
   1    0.125342
c  0    0.444074
   1    0.948363
d  0    0.197565
   1    0.883776
dtype: float64

In the preceding example, we have a Series object that has two index levels. The object can be rearranged into a DataFrame using the unstack function. In an inverse situation, the stack function can be used:

>>> s8.unstack()
          0         1
a  0.549211  0.420874
b  0.051516  0.715021
c  0.503072  0.720772
d  0.373037  0.207026

We can also create a DataFrame to have a hierarchical index in both axes:

>>> df = pd.DataFrame(np.random.rand(12).reshape(4,3),
                      index=[['a', 'a', 'b', 'b'],
                               [0, 1, 0, 1]],
                      columns=[['x', 'x', 'y'], [0, 1, 0]])
>>> df
            x                   y
            0         1         0
a 0  0.636893  0.729521  0.747230
  1  0.749002  0.323388  0.259496
b 0  0.214046  0.926961  0.679686
0.013258  0.416101  0.626927
>>> df.index
MultiIndex(levels=[['a', 'b'], [0, 1]],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]])
>>> df.columns
MultiIndex(levels=[['x', 'y'], [0, 1]],
           labels=[[0, 0, 1], [0, 1, 0]])

The methods for getting or setting values or subsets of the data objects with multiple index levels are similar to those of the nonhierarchical case:

>>> df['x']
            0         1
a 0  0.636893  0.729521
  1  0.749002  0.323388
b 0  0.214046  0.926961
0.013258  0.416101
>>> df[[0]]
            x
            0
a 0  0.636893
  1  0.749002
b 0  0.214046
0.013258
>>> df.ix['a', 'x']
          0         1
0  0.636893  0.729521
0.749002  0.323388
>>> df.ix['a','x'].ix[1]
0    0.749002
1    0.323388
Name: 1, dtype: float64

After grouping data into multiple index levels, we can also use most of the descriptive and statistics functions that have a level option, which can be used to specify the level we want to process:

>>> df.std(level=1)
          x                   y
          0         1         0
0  0.298998  0.139611  0.047761
0.520250  0.065558  0.259813
>>> df.std(level=0)
          x                   y
          0         1         0
a  0.079273  0.287180  0.344880
b  0.141979  0.361232  0.037306
Hierarchical indexing

The Panel data

The Panel is another data structure for three-dimensional data in pandas. However, it is less frequently used than the Series or the DataFrame. You can think of a Panel as a table of DataFrame objects. We can create a Panel object from a 3D ndarray or a dictionary of DataFrame objects:

# create a Panel from 3D ndarray
>>> panel = pd.Panel(np.random.rand(2, 4, 5),
                     items = ['item1', 'item2'])
>>> panel
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 5 (minor_axis)
Items axis: item1 to item2
Major_axis axis: 0 to 3
Minor_axis axis: 0 to 4

>>> df1 = pd.DataFrame(np.arange(12).reshape(4, 3), 
                       columns=['a','b','c'])
>>> df1
   a   b   c
0  0   1   2
1  3   4   5
2  6   7   8
9  10  11
>>> df2 = pd.DataFrame(np.arange(9).reshape(3, 3), 
                       columns=['a','b','c'])
>>> df2
   a  b  c
0  0  1  2
1  3  4  5
6  7  8
# create another Panel from a dict of DataFrame objects
>>> panel2 = pd.Panel({'item1': df1, 'item2': df2})
>>> panel2
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 3 (minor_axis)
Items axis: item1 to item2
Major_axis axis: 0 to 3
Minor_axis axis: a to c

Each item in a Panel is a DataFrame. We can select an item, by item name:

>>> panel2['item1']
   a   b   c
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11

Alternatively, if we want to select data via an axis or data position, we can use the ix method, like on Series or DataFrame:

>>> panel2.ix[:, 1:3, ['b', 'c']]
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 3 (major_axis) x 2 (minor_axis)
Items axis: item1 to item2
Major_axis axis: 1 to 3
Minor_axis axis: b to c
>>> panel2.ix[:, 2, :]
   item1  item2
a      6      6
b      7      7
c      8      8
The Panel data
The Panel data
The Panel data
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset