How to do it...

  1. Read in the college dataset and use boolean indexing to select all institutions from the state of Texas (TX):
>>> import pandas as pd
>>> college = pd.read_csv('data/college.csv')
>>> college[college['STABBR'] == 'TX'].head()

  2. To replicate this using index selection, we need to move the STABBR column into the index. We can then use label-based selection with the .loc indexer:
>>> college2 = college.set_index('STABBR')
>>> college2.loc['TX'].head()
  3. Let's compare the speed of both methods:
>>> %timeit college[college['STABBR'] == 'TX']
1.43 ms ± 53.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

>>> %timeit college2.loc['TX']
526 µs ± 6.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
  4. Boolean indexing takes nearly three times as long as index selection (1.43 ms versus 526 µs). Because setting the index is not free, let's time that operation as well:
>>> %timeit college2 = college.set_index('STABBR')
1.04 ms ± 5.37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)