- Read in the college dataset, and compute a few basic summary statistics on the undergraduate population and SAT math scores, grouped by state and religious affiliation:
>>> import pandas as pd
>>> college = pd.read_csv('data/college.csv')
>>> cg = (college.groupby(['STABBR', 'RELAFFIL'])[['UGDS', 'SATMTMID']]
...              .agg(['count', 'min', 'max'])
...              .head(6))
- Notice that both index levels have names, which are the original column names used for grouping. The column levels, on the other hand, have no names. Use the rename_axis method to name them:
>>> cg = cg.rename_axis(['AGG_COLS', 'AGG_FUNCS'], axis='columns')
>>> cg
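The effect of rename_axis can be checked without the college file. The following is a minimal sketch on a small made-up DataFrame; the `toy` data and the `agg` variable are hypothetical stand-ins for the real dataset and for `cg`.

```python
import pandas as pd

# Hypothetical toy data standing in for data/college.csv
toy = pd.DataFrame({
    'STABBR':   ['AK', 'AK', 'AK', 'AL', 'AL', 'AL'],
    'RELAFFIL': [0, 0, 1, 0, 1, 1],
    'UGDS':     [100.0, 250.0, 40.0, 900.0, 60.0, 75.0],
    'SATMTMID': [480.0, 510.0, 495.0, 530.0, 500.0, 470.0],
})

# Aggregating with a list of functions produces two unnamed column levels
agg = toy.groupby(['STABBR', 'RELAFFIL'])[['UGDS', 'SATMTMID']].agg(['min', 'max'])
names_before = list(agg.columns.names)

# Name the column levels; the index level names are left as they were
agg = agg.rename_axis(['AGG_COLS', 'AGG_FUNCS'], axis='columns')
names_after = list(agg.columns.names)

print(names_before)
print(names_after)
print(list(agg.index.names))
```

Only the column level names change; the data and the index are untouched.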
- Now that each axis level has a name, reshaping is a breeze. Use the stack method to move the AGG_FUNCS column level to an index level:
>>> cg.stack('AGG_FUNCS').head()
- By default, stacking places the stacked column level in the innermost position of the index. Use the swaplevel method to switch the placement of the levels:
>>> cg.stack('AGG_FUNCS').swaplevel('AGG_FUNCS', 'STABBR',
...                                 axis='index').head()
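To see where the levels end up after stacking and swapping, it may help to inspect the index names directly. This is a sketch on made-up data; `toy`, `agg`, `stacked`, and `swapped` are hypothetical names, not part of the recipe.

```python
import pandas as pd

# Hypothetical toy data standing in for data/college.csv
toy = pd.DataFrame({
    'STABBR':   ['AK', 'AK', 'AK', 'AL', 'AL', 'AL'],
    'RELAFFIL': [0, 0, 1, 0, 1, 1],
    'UGDS':     [100.0, 250.0, 40.0, 900.0, 60.0, 75.0],
    'SATMTMID': [480.0, 510.0, 495.0, 530.0, 500.0, 470.0],
})
agg = (toy.groupby(['STABBR', 'RELAFFIL'])[['UGDS', 'SATMTMID']]
          .agg(['min', 'max'])
          .rename_axis(['AGG_COLS', 'AGG_FUNCS'], axis='columns'))

# stack moves AGG_FUNCS from the columns to the innermost index level
stacked = agg.stack('AGG_FUNCS')

# swaplevel exchanges the positions of two index levels, referenced by name
swapped = stacked.swaplevel('AGG_FUNCS', 'STABBR', axis='index')

print(list(stacked.index.names))
print(list(swapped.index.names))
```

Referring to levels by name rather than by integer position keeps the call readable even after the level order has changed.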
- We can continue to make use of the axis level names by sorting levels with the sort_index method:
>>> (cg.stack('AGG_FUNCS')
...    .swaplevel('AGG_FUNCS', 'STABBR', axis='index')
...    .sort_index(level='RELAFFIL', axis='index')
...    .sort_index(level='AGG_COLS', axis='columns')
...    .head(6))
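The sorting step can be verified on a small example. The sketch below uses made-up data (`toy`, `agg`, and `reshaped` are hypothetical names) and checks that the rows come out ordered by RELAFFIL and the columns by their AGG_COLS labels.

```python
import pandas as pd

# Hypothetical toy data standing in for data/college.csv
toy = pd.DataFrame({
    'STABBR':   ['AK', 'AK', 'AK', 'AL', 'AL', 'AL'],
    'RELAFFIL': [0, 0, 1, 0, 1, 1],
    'UGDS':     [100.0, 250.0, 40.0, 900.0, 60.0, 75.0],
    'SATMTMID': [480.0, 510.0, 495.0, 530.0, 500.0, 470.0],
})
agg = (toy.groupby(['STABBR', 'RELAFFIL'])[['UGDS', 'SATMTMID']]
          .agg(['min', 'max'])
          .rename_axis(['AGG_COLS', 'AGG_FUNCS'], axis='columns'))

# Sort rows by the RELAFFIL level, then columns by the AGG_COLS level
reshaped = (agg.stack('AGG_FUNCS')
               .swaplevel('AGG_FUNCS', 'STABBR', axis='index')
               .sort_index(level='RELAFFIL', axis='index')
               .sort_index(level='AGG_COLS', axis='columns'))

relaffil_values = reshaped.index.get_level_values('RELAFFIL')
print(relaffil_values.is_monotonic_increasing)
print(list(reshaped.columns))
```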
- To completely reshape your data, you might need to stack some columns while unstacking others. Chain the two methods together in a single command:
>>> cg.stack('AGG_FUNCS').unstack(['RELAFFIL', 'STABBR'])
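Chaining stack and unstack effectively trades levels between the two axes. As a rough illustration on made-up data (the names `toy`, `agg`, and `wide` are hypothetical), the aggregation functions become the only index level while RELAFFIL and STABBR join AGG_COLS in the columns.

```python
import pandas as pd

# Hypothetical toy data standing in for data/college.csv
toy = pd.DataFrame({
    'STABBR':   ['AK', 'AK', 'AK', 'AL', 'AL', 'AL'],
    'RELAFFIL': [0, 0, 1, 0, 1, 1],
    'UGDS':     [100.0, 250.0, 40.0, 900.0, 60.0, 75.0],
    'SATMTMID': [480.0, 510.0, 495.0, 530.0, 500.0, 470.0],
})
agg = (toy.groupby(['STABBR', 'RELAFFIL'])[['UGDS', 'SATMTMID']]
          .agg(['min', 'max'])
          .rename_axis(['AGG_COLS', 'AGG_FUNCS'], axis='columns'))

# Move AGG_FUNCS into the index, then move both grouping levels into the columns
wide = agg.stack('AGG_FUNCS').unstack(['RELAFFIL', 'STABBR'])

print(list(wide.index.names))
print(list(wide.columns.names))
print(wide.shape)
```

With two aggregation functions, two aggregated columns, and four state/affiliation groups in the toy data, the result has 2 rows and 8 columns.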
- Stack all the columns at once to return a Series:
>>> cg.stack(['AGG_FUNCS', 'AGG_COLS']).head(12)
STABBR  RELAFFIL  AGG_FUNCS  AGG_COLS
AK      0         count      UGDS            7.0
                             SATMTMID        0.0
                  min        UGDS          109.0
                  max        UGDS        12865.0
        1         count      UGDS            3.0
                             SATMTMID        1.0
                  min        UGDS           27.0
                             SATMTMID      503.0
                  max        UGDS          275.0
                             SATMTMID      503.0
AL      0         count      UGDS           71.0
                             SATMTMID       13.0
dtype: float64