How to do it...

Read in the college dataset with the institution name as the index, and with only the undergraduate race columns:

>>> usecol_func = lambda x: 'UGDS_' in x or x == 'INSTNM'
>>> college = pd.read_csv('data/college.csv', 
                          index_col='INSTNM', 
                          usecols=usecol_func)
>>> college.head()

Use the stack method to convert each horizontal column name into a vertical index level:

>>> college_stacked = college.stack()
>>> college_stacked.head(18)
INSTNM                                         
Alabama A &amp; M University         UGDS_WHITE    0.0333
                                     UGDS_BLACK    0.9353
                                     UGDS_HISP     0.0055
                                     UGDS_ASIAN    0.0019
                                     UGDS_AIAN     0.0024
                                     UGDS_NHPI     0.0019
                                     UGDS_2MOR     0.0000
                                     UGDS_NRA      0.0059
                                     UGDS_UNKN     0.0138
University of Alabama at Birmingham  UGDS_WHITE    0.5922
                                     UGDS_BLACK    0.2600
                                     UGDS_HISP     0.0283
                                     UGDS_ASIAN    0.0518
                                     UGDS_AIAN     0.0022
                                     UGDS_NHPI     0.0007
                                     UGDS_2MOR     0.0368
                                     UGDS_NRA      0.0179
                                     UGDS_UNKN     0.0100
dtype: float64

Invert this stacked data back to its original form with the unstack Series method:

>>> college_stacked.unstack()

A similar sequence of operations can be done with melt followed by pivot. First, read in the data without putting the institution name in the index:

>>> college2 = pd.read_csv('data/college.csv', 
                          usecols=usecol_func)
>>> college2.head()

Use the melt method to transpose all the race columns into a single column:

>>> college_melted = college2.melt(id_vars='INSTNM', 
                                   var_name='Race',
                                   value_name='Percentage')
>>> college_melted.head()

Use the pivot method to invert this previous result:

>>> melted_inv = college_melted.pivot(index='INSTNM', 
                                      columns='Race',
                                      values='Percentage')
>>> melted_inv.head()

Notice that the institution names are now shuttled over into the index and are not in their original order. The column names are not in their original order. To get an exact replication of our starting DataFrame from step 4, use the .loc indexing operator to select rows and columns simultaneously and then reset the index:

>>> college2_replication = melted_inv.loc[college2['INSTNM'],
                                          college2.columns[1:]]
                                     .reset_index()
>>> college2.equals(college2_replication)
True

Table of Contents for How to do it...

Create new playlist

Sign In

Sign Up

Table of Contents for
How to do it...