How to do it...

Read in the college dataset, create a separate DataFrame with STABBR as the index, and check whether the index is sorted:

>>> import pandas as pd
>>> college = pd.read_csv('data/college.csv')
>>> college2 = college.set_index('STABBR')
>>> college2.index.is_monotonic_increasing
False

Sort the index from college2 and store it as another object:

>>> college3 = college2.sort_index()
>>> college3.index.is_monotonic_increasing
True
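
The sortedness check can be tried without the college file. The following is a minimal sketch on a small made-up DataFrame; the rows and the names toy, toy_unsorted, and toy_sorted are assumptions for illustration and are not part of the dataset. It uses is_monotonic_increasing, the attribute current pandas releases provide for this check.

```python
import pandas as pd

# Hypothetical stand-in for the college data: a few rows with a state column.
toy = pd.DataFrame({
    'STABBR': ['TX', 'CA', 'TX', 'NY', 'CA'],
    'INSTNM': ['School A', 'School B', 'School C', 'School D', 'School E'],
})

toy_unsorted = toy.set_index('STABBR')
toy_sorted = toy_unsorted.sort_index()

# The index labels are out of order until sort_index is called.
print(toy_unsorted.index.is_monotonic_increasing)  # False
print(toy_sorted.index.is_monotonic_increasing)    # True
print(list(toy_sorted.index))                      # ['CA', 'CA', 'NY', 'TX', 'TX']
```

Repeated labels do not prevent an index from counting as sorted; they only need to be adjacent and in non-decreasing order.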

Time the selection of the state of Texas (TX) from all three DataFrames:

>>> %timeit college[college['STABBR'] == 'TX']
1.43 ms ± 53.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

>>> %timeit college2.loc['TX']
526 µs ± 6.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

>>> %timeit college3.loc['TX']
183 µs ± 3.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Selection on the sorted index is nearly eight times faster than boolean selection (183 µs versus 1.43 ms), and even the unsorted index beats the boolean mask by a factor of almost three. Let's now turn to unique indexes. For this, we use the institution name as the index:

>>> college_unique = college.set_index('INSTNM')
>>> college_unique.index.is_unique
True
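
When is_unique comes back False instead, it helps to see which labels repeat. The following is a minimal sketch on made-up data; the school names, cities, and the variables toy_indexed, repeated, and deduped are hypothetical. Keeping only the first occurrence is one possible way to get a unique index, not necessarily the right choice for real data.

```python
import pandas as pd

# Hypothetical names; 'School A' appears twice on purpose.
toy = pd.DataFrame({
    'INSTNM': ['School A', 'School B', 'School A', 'School C'],
    'CITY': ['Austin', 'Boston', 'Dallas', 'Chicago'],
})
toy_indexed = toy.set_index('INSTNM')

print(toy_indexed.index.is_unique)  # False

# Index.duplicated(keep=False) flags every occurrence of a repeated label.
repeated = toy_indexed.index[toy_indexed.index.duplicated(keep=False)].unique()
print(list(repeated))  # ['School A']

# Dropping the repeats (keeping the first occurrence) gives a unique index.
deduped = toy_indexed[~toy_indexed.index.duplicated(keep='first')]
print(deduped.index.is_unique)  # True
```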

Let's select Stanford University with boolean indexing:

>>> college[college['INSTNM'] == 'Stanford University']

Let's select Stanford University with index selection:

>>> college_unique.loc['Stanford University']
CITY                  Stanford
STABBR                      CA
HBCU                         0
...
UG25ABV                 0.0401
MD_EARN_WNE_P10          86000
GRAD_DEBT_MDN_SUPP       12782
Name: Stanford University, dtype: object

Both return the same data, but as different objects: boolean indexing gives a one-row DataFrame, while .loc on a unique index gives a Series. Let's time each approach:

>>> %timeit college[college['INSTNM'] == 'Stanford University']
1.3 ms ± 56.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

>>> %timeit college_unique.loc['Stanford University']
157 µs ± 682 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
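
The %timeit magic is only available in IPython and Jupyter. The following is a rough sketch of the same comparison using the standard library's timeit module on synthetic data. The row count, the made-up state codes such as S10, the VALUE column, and the helper best_time are all assumptions for illustration. The absolute numbers will differ from those above and depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the college data (assumed: 7,500 rows, 50 made-up state codes).
rng = np.random.default_rng(0)
n = 7500
states = [f'S{i:02d}' for i in range(50)]
df = pd.DataFrame({
    'STABBR': rng.choice(states, size=n),
    'INSTNM': [f'School {i}' for i in range(n)],
    'VALUE': rng.random(n),
})

unsorted_idx = df.set_index('STABBR')
sorted_idx = unsorted_idx.sort_index()
unique_idx = df.set_index('INSTNM')


def best_time(func, number=100, repeat=3):
    """Return the best per-call time in seconds over several repeats."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


timings = {
    'boolean mask': best_time(lambda: df[df['STABBR'] == 'S10']),
    'unsorted index': best_time(lambda: unsorted_idx.loc['S10']),
    'sorted index': best_time(lambda: sorted_idx.loc['S10']),
    'unique index': best_time(lambda: unique_idx.loc['School 10']),
}

for label, seconds in timings.items():
    print(f'{label:>15}: {seconds * 1e6:8.1f} µs')

# All three state selections return the same number of rows.
n_rows = len(df[df['STABBR'] == 'S10'])
assert len(unsorted_idx.loc['S10']) == n_rows == len(sorted_idx.loc['S10'])
```

Taking the minimum over repeats is a common way to reduce noise from other processes; %timeit reports a mean and standard deviation instead, so the two are not directly comparable.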
