- Read in the college dataset with the institution name as the index, and with only the undergraduate race columns:
>>> usecol_func = lambda x: 'UGDS_' in x or x == 'INSTNM'
>>> college = pd.read_csv('data/college.csv',
index_col='INSTNM',
usecols=usecol_func)
>>> college.head()
- Use the stack method to convert each horizontal column name into a vertical index level:
>>> college_stacked = college.stack()
>>> college_stacked.head(18)
INSTNM
Alabama A & M University UGDS_WHITE 0.0333
UGDS_BLACK 0.9353
UGDS_HISP 0.0055
UGDS_ASIAN 0.0019
UGDS_AIAN 0.0024
UGDS_NHPI 0.0019
UGDS_2MOR 0.0000
UGDS_NRA 0.0059
UGDS_UNKN 0.0138
University of Alabama at Birmingham UGDS_WHITE 0.5922
UGDS_BLACK 0.2600
UGDS_HISP 0.0283
UGDS_ASIAN 0.0518
UGDS_AIAN 0.0022
UGDS_NHPI 0.0007
UGDS_2MOR 0.0368
UGDS_NRA 0.0179
UGDS_UNKN 0.0100
dtype: float64
- Invert this stacked data back to its original form with the unstack Series method:
>>> college_stacked.unstack()
- A similar sequence of operations can be done with melt followed by pivot. First, read in the data without putting the institution name in the index:
>>> college2 = pd.read_csv('data/college.csv',
usecols=usecol_func)
>>> college2.head()
- Use the melt method to transpose all the race columns into a single column:
>>> college_melted = college2.melt(id_vars='INSTNM',
var_name='Race',
value_name='Percentage')
>>> college_melted.head()
- Use the pivot method to invert this previous result:
>>> melted_inv = college_melted.pivot(index='INSTNM',
columns='Race',
values='Percentage')
>>> melted_inv.head()
- Notice that the institution names are now shuttled over into the index and are not in their original order. The column names are not in their original order. To get an exact replication of our starting DataFrame from step 4, use the .loc indexing operator to select rows and columns simultaneously and then reset the index:
>>> college2_replication = melted_inv.loc[college2['INSTNM'],
college2.columns[1:]]
.reset_index()
>>> college2.equals(college2_replication)
True