How it works...

There are multiple ways to accomplish the same thing in step 1. Here, we show the versatility of the read_csv function. The usecols parameter accepts either a list of the columns that we would like to import or a function that dynamically determines them. We use an anonymous function that checks whether the column name contains UGDS_ or is equal to INSTNM. The function is passed each column name as a string and must return a boolean. Because only the selected columns are parsed, this approach can save a substantial amount of memory when a dataset has many columns.
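
A minimal sketch of a callable usecols, assuming a small in-memory CSV in place of the real dataset file. The rows, the STABBR and UGDS columns, and the values are hypothetical; only the INSTNM and UGDS_ naming pattern comes from the text.

```python
import io

import pandas as pd

# Hypothetical CSV standing in for the real file; only the INSTNM and
# UGDS_* naming pattern is taken from the text.
csv_data = io.StringIO(
    "INSTNM,STABBR,UGDS,UGDS_WHITE,UGDS_BLACK\n"
    "College A,AL,4206,0.0333,0.9353\n"
    "College B,AL,11383,0.5922,0.2600\n"
)

# The callable receives each column name as a string and returns a bool;
# only columns for which it returns True are parsed. Note that plain
# "UGDS" does not contain "UGDS_", so it is skipped.
college = pd.read_csv(
    csv_data,
    usecols=lambda col: "UGDS_" in col or col == "INSTNM",
    index_col="INSTNM",
)
print(college.columns.tolist())
```

The STABBR and UGDS columns are never materialized, which is where the memory saving comes from.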

The stack method in step 2 moves all column names into the innermost index level and returns a Series. In step 3, the unstack method inverts this operation by taking all the values in the innermost index level and converting them into column names.
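
The round trip can be seen on a tiny frame. This is a sketch with hypothetical institution names and values; the frame has no missing data, so stack and unstack behave the same across pandas versions here.

```python
import pandas as pd

# Hypothetical two-row frame with the institution name in the index.
df = pd.DataFrame(
    {"UGDS_WHITE": [0.03, 0.59], "UGDS_BLACK": [0.94, 0.26]},
    index=pd.Index(["College A", "College B"], name="INSTNM"),
)

# stack: the column names become a new innermost index level -> Series
stacked = df.stack()
print(type(stacked).__name__, stacked.index.nlevels)

# unstack: the innermost index level is turned back into column labels
unstacked = stacked.unstack()
print(unstacked.loc["College B", "UGDS_BLACK"])
```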

The result from step 3 isn't quite an exact replication of step 1. Some rows consist entirely of missing values, and by default the stack method drops these during step 2. To keep these missing values and create an exact replication, pass dropna=False to the stack method.
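
A sketch with a hypothetical all-missing row. Recent pandas releases added a new stack implementation (selected with the future_stack option) that no longer drops missing values and does not accept dropna, so this version avoids the argument altogether. Instead, it reindexes to the original labels, which restores any dropped rows and the original column order under either behavior.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose second row is entirely missing.
df = pd.DataFrame(
    {"UGDS_WHITE": [0.03, np.nan, 0.59], "UGDS_BLACK": [0.94, np.nan, 0.26]},
    index=pd.Index(["College A", "College B", "College C"], name="INSTNM"),
)

# Older stack drops the all-NaN row; the newer implementation keeps it.
# Reindexing to the original index and columns gives the same result either way.
round_trip = df.stack().unstack().reindex(index=df.index, columns=df.columns)
print(round_trip.equals(df))
```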

Step 4 reads in the same dataset as in step 1 but does not put the institution name in the index, because the melt method ignores the index and would not be able to access it. Step 5 uses the melt method to unpivot (transpose) all the race columns into a single column. It does this by leaving the value_vars parameter at its default value, None. When value_vars is not specified, every column not listed in the id_vars parameter is transposed.
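
A minimal melt sketch, assuming INSTNM is an ordinary column as in step 4. The data are hypothetical, and the var_name and value_name labels ("Race", "Percentage") are illustrative choices.

```python
import pandas as pd

# Hypothetical frame read without an index, so INSTNM is a regular column.
college = pd.DataFrame(
    {
        "INSTNM": ["College A", "College B"],
        "UGDS_WHITE": [0.03, 0.59],
        "UGDS_BLACK": [0.94, 0.26],
    }
)

# value_vars is left at None, so every column not in id_vars is melted.
melted = college.melt(id_vars="INSTNM", var_name="Race", value_name="Percentage")
print(melted)
```

Each original row becomes one row per melted column, so two institutions and two race columns produce four rows.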

Step 6 inverts the operation from step 5 with the pivot method, which accepts three parameters. Each parameter takes a single column name as a string. The column referenced by the index parameter remains vertical, and its values become the new index. The values of the column referenced by the columns parameter become the new column names. The values of the column referenced by the values parameter are placed at the intersection of their corresponding index and column labels.
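
A sketch of pivot on hypothetical long-format data shaped like melt's output. The column names Race and Percentage are illustrative, and the parameters are passed by keyword, which recent pandas versions require.

```python
import pandas as pd

# Hypothetical long-format data, shaped like the output of melt.
melted = pd.DataFrame(
    {
        "INSTNM": ["College A", "College B", "College A", "College B"],
        "Race": ["UGDS_WHITE", "UGDS_WHITE", "UGDS_BLACK", "UGDS_BLACK"],
        "Percentage": [0.03, 0.59, 0.94, 0.26],
    }
)

# index -> row labels, columns -> column labels, values -> cell contents
wide = melted.pivot(index="INSTNM", columns="Race", values="Percentage")
print(wide.loc["College A", "UGDS_BLACK"])
```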

To make an exact replication with pivot, we need to put the rows and columns back in the same order as the original. Because the institution name is in the index, we use the .loc indexing operator with the original index and columns to reorder the DataFrame.
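
A sketch of the full melt and pivot round trip on a hypothetical frame whose rows are deliberately not in alphabetical order. pivot returns its labels sorted, so selecting with the original index and columns through .loc restores the original layout.

```python
import pandas as pd

# Hypothetical original wide frame with non-alphabetical row and column order.
original = pd.DataFrame(
    {"UGDS_WHITE": [0.59, 0.03], "UGDS_BLACK": [0.26, 0.94]},
    index=pd.Index(["College B", "College A"], name="INSTNM"),
)

# melt needs INSTNM as a column, then pivot reshapes back to wide format.
melted = original.reset_index().melt(
    id_vars="INSTNM", var_name="Race", value_name="Percentage"
)
wide = melted.pivot(index="INSTNM", columns="Race", values="Percentage")

# pivot sorts its labels, so select rows and columns in the original order.
restored = wide.loc[original.index, original.columns]
restored.columns.name = None  # pivot names the columns axis "Race"
print(restored.equals(original))
```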
