How to do it...

  1. Read in the restaurant inspections dataset, and convert the Date column data type to datetime64:
>>> inspections = pd.read_csv('data/restaurant_inspections.csv',
...                           parse_dates=['Date'])
>>> inspections.head()
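If the CSV file is not at hand, you can check what parse_dates does on a few rows in the same long layout (Name, Date, Info, Value). This is a minimal sketch: the rows, the restaurant names, and the csv_text variable are hypothetical, and io.StringIO stands in for the real file path.

```python
import io

import pandas as pd

# Hypothetical rows in the same long layout as the inspections file;
# the names and values are made up for illustration.
csv_text = """Name,Date,Info,Value
Cafe A,2017-08-08,Borough,Manhattan
Cafe A,2017-08-08,Score,9
Deli B,2017-08-09,Borough,Brooklyn
Deli B,2017-08-09,Score,21
"""

# Without parse_dates, the Date column is read as plain strings
raw = pd.read_csv(io.StringIO(csv_text))

# With parse_dates, read_csv converts Date to a datetime64 column
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

print(pd.api.types.is_datetime64_any_dtype(raw['Date']))     # False
print(pd.api.types.is_datetime64_any_dtype(sample['Date']))  # True

# The .dt accessor works only on datetime-like columns
print(sample['Date'].dt.year.tolist())  # [2017, 2017, 2017, 2017]
```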
  2. This dataset has two variables, Name and Date, that are each correctly contained in a single column. The Info column, however, holds five different variables: Borough, Cuisine, Description, Grade, and Score. Let's attempt to use the pivot method to keep Name and Date as the row labels, create new columns out of all the values in the Info column, and use the Value column as their intersection:
>>> inspections.pivot(index=['Name', 'Date'],
...                   columns='Info', values='Value')
NotImplementedError: > 1 ndim Categorical are not supported at this time
  3. Unfortunately, at the time of writing, the pandas developers had not implemented this functionality. Since pandas 1.1, the pivot method does accept a list for index, so the previous line runs on newer versions. Thankfully, pandas usually offers more than one way to accomplish the same task. Let's put Name, Date, and Info into the index:
>>> inspections.set_index(['Name', 'Date', 'Info']).head(10)
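The unstack call that follows only works if each (Name, Date, Info) combination appears once in the index. A minimal sketch of checking that beforehand, using a small made-up frame in place of the real dataset (every row below is hypothetical):

```python
import pandas as pd

# Hypothetical long-format rows, made up for illustration
inspections = pd.DataFrame({
    'Name': ['Cafe A'] * 5 + ['Deli B'] * 5,
    'Date': pd.to_datetime(['2017-08-08'] * 5 + ['2017-08-09'] * 5),
    'Info': ['Borough', 'Cuisine', 'Description', 'Grade', 'Score'] * 2,
    'Value': ['Manhattan', 'Cafe', 'Clean', 'A', '9',
              'Brooklyn', 'Deli', 'Mice', 'B', '21'],
})

indexed = inspections.set_index(['Name', 'Date', 'Info'])
print(indexed.index.nlevels)    # 3
print(indexed.index.is_unique)  # True, so unstack can reshape safely

# Appending a duplicate row makes the reshape ambiguous,
# and unstack raises a ValueError instead of guessing
dup = pd.concat([inspections, inspections.head(1)])
try:
    dup.set_index(['Name', 'Date', 'Info']).unstack('Info')
    unstack_failed = False
except ValueError as err:
    unstack_failed = True
    print('unstack failed:', err)
```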
  4. Use the unstack method to pivot all the values in the Info column into new columns:
>>> (inspections.set_index(['Name', 'Date', 'Info'])
...             .unstack('Info')
...             .head())
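On a few made-up rows, you can inspect the shape that unstack produces: the leftover Value column becomes the top level of a two-level column index, and the Info values form the bottom level. A minimal sketch (the data is hypothetical):

```python
import pandas as pd

# Hypothetical long-format rows, made up for illustration
inspections = pd.DataFrame({
    'Name': ['Cafe A'] * 5 + ['Deli B'] * 5,
    'Date': pd.to_datetime(['2017-08-08'] * 5 + ['2017-08-09'] * 5),
    'Info': ['Borough', 'Cuisine', 'Description', 'Grade', 'Score'] * 2,
    'Value': ['Manhattan', 'Cafe', 'Clean', 'A', '9',
              'Brooklyn', 'Deli', 'Mice', 'B', '21'],
})

wide = inspections.set_index(['Name', 'Date', 'Info']).unstack('Info')

print(wide.shape)                # (2, 5): one row per (Name, Date) pair
print(list(wide.columns.names))  # [None, 'Info']
print(wide.columns[0])           # ('Value', 'Borough')

# A single cell is addressed by a row tuple and a column tuple
print(wide.loc[('Deli B', pd.Timestamp('2017-08-09')), ('Value', 'Score')])  # 21
```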
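  5. Move the index levels into columns with the reset_index method. Passing col_level=-1 places the Name and Date labels in the bottom column level, alongside the Info values:
>>> insp_tidy = (inspections.set_index(['Name', 'Date', 'Info'])
...                         .unstack('Info')
...                         .reset_index(col_level=-1))
>>> insp_tidy.head()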
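The col_level argument decides which column level receives the Name and Date labels, and the other level is filled with an empty string (the default of reset_index's col_fill parameter). A minimal sketch comparing the default with col_level=-1 on hypothetical rows:

```python
import pandas as pd

# Hypothetical long-format rows, made up for illustration
inspections = pd.DataFrame({
    'Name': ['Cafe A'] * 5 + ['Deli B'] * 5,
    'Date': pd.to_datetime(['2017-08-08'] * 5 + ['2017-08-09'] * 5),
    'Info': ['Borough', 'Cuisine', 'Description', 'Grade', 'Score'] * 2,
    'Value': ['Manhattan', 'Cafe', 'Clean', 'A', '9',
              'Brooklyn', 'Deli', 'Mice', 'B', '21'],
})

wide = inspections.set_index(['Name', 'Date', 'Info']).unstack('Info')

# Default col_level=0: the labels go into the top column level
top = wide.reset_index()
# col_level=-1: the labels go into the bottom level, next to the Info values
bottom = wide.reset_index(col_level=-1)

print(top.columns[:2].tolist())     # [('Name', ''), ('Date', '')]
print(bottom.columns[:2].tolist())  # [('', 'Name'), ('', 'Date')]
```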
  6. The dataset is now tidy, but some leftover pandas debris needs to be removed. Let's use the MultiIndex method droplevel to remove the top column level, and then rename the remaining column level (still named Info) to None:
>>> insp_tidy.columns = insp_tidy.columns.droplevel(0).rename(None)
>>> insp_tidy.head()
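After droplevel(0), the remaining column level still carries the name Info, which is why the extra rename(None) call is there. A minimal sketch on hypothetical rows that checks both the labels and the name:

```python
import pandas as pd

# Hypothetical long-format rows, made up for illustration
inspections = pd.DataFrame({
    'Name': ['Cafe A'] * 5 + ['Deli B'] * 5,
    'Date': pd.to_datetime(['2017-08-08'] * 5 + ['2017-08-09'] * 5),
    'Info': ['Borough', 'Cuisine', 'Description', 'Grade', 'Score'] * 2,
    'Value': ['Manhattan', 'Cafe', 'Clean', 'A', '9',
              'Brooklyn', 'Deli', 'Mice', 'B', '21'],
})

insp_tidy = (inspections.set_index(['Name', 'Date', 'Info'])
             .unstack('Info')
             .reset_index(col_level=-1))

# Dropping the top level leaves a flat Index that is still named 'Info'
flat = insp_tidy.columns.droplevel(0)
print(flat.name)  # Info

# Index.rename(None) clears that leftover name
insp_tidy.columns = flat.rename(None)
print(insp_tidy.columns.tolist())
# ['Name', 'Date', 'Borough', 'Cuisine', 'Description', 'Grade', 'Score']
print(insp_tidy.columns.name)  # None
```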
  7. The column MultiIndex created in step 4 could have been avoided by converting the one-column DataFrame (only Value remains after set_index) into a Series with the squeeze method. The following code produces the same result as the previous step:
>>> (inspections.set_index(['Name', 'Date', 'Info'])
...             .squeeze()
...             .unstack('Info')
...             .reset_index()
...             .rename_axis(None, axis='columns'))
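To confirm that the routes agree, the following sketch builds the tidy table three ways on made-up rows: the droplevel route, the squeeze route, and pivot with a list for index (which requires pandas 1.1 or later). The data and the keys variable are hypothetical; pd.testing.assert_frame_equal raises an AssertionError if two results differ.

```python
import pandas as pd

# Hypothetical long-format rows, made up for illustration
inspections = pd.DataFrame({
    'Name': ['Cafe A'] * 5 + ['Deli B'] * 5,
    'Date': pd.to_datetime(['2017-08-08'] * 5 + ['2017-08-09'] * 5),
    'Info': ['Borough', 'Cuisine', 'Description', 'Grade', 'Score'] * 2,
    'Value': ['Manhattan', 'Cafe', 'Clean', 'A', '9',
              'Brooklyn', 'Deli', 'Mice', 'B', '21'],
})
keys = ['Name', 'Date', 'Info']

# Route 1: unstack a one-column DataFrame, then flatten the column MultiIndex
via_droplevel = (inspections.set_index(keys)
                 .unstack('Info')
                 .reset_index(col_level=-1))
via_droplevel.columns = via_droplevel.columns.droplevel(0).rename(None)

# Route 2: squeeze to a Series first, so no column MultiIndex is created
via_squeeze = (inspections.set_index(keys)
               .squeeze()
               .unstack('Info')
               .reset_index()
               .rename_axis(None, axis='columns'))

# Route 3: pivot with a list for index (pandas 1.1 or later only)
via_pivot = (inspections.pivot(index=['Name', 'Date'], columns='Info',
                               values='Value')
             .reset_index()
             .rename_axis(None, axis='columns'))

pd.testing.assert_frame_equal(via_droplevel, via_squeeze)
pd.testing.assert_frame_equal(via_squeeze, via_pivot)
print(via_pivot)
```

Squeezing first keeps the column index flat from the start, which is why the squeeze route needs only rename_axis rather than droplevel.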