Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Previous Chapter

How it works...

How to do it...

Read in the flights dataset, and find the total number of flights between each origin and destination airport:

>>> flights = pd.read_csv('data/flights.csv')
>>> flights_ct = flights.groupby(['ORG_AIR', 'DEST_AIR']).size()
>>> flights_ct.head()
ORG_AIR  DEST_AIR
ATL      ABE         31
         ABQ         16
         ABY         19
         ACY          6
         AEX         40
dtype: int64

Select the total number of flights between Houston (IAH) and Atlanta (ATL) in both directions:

>>> flights_ct.loc[[('ATL', 'IAH'), ('IAH', 'ATL')]]
ORG_AIR  DEST_AIR
ATL      IAH         121
IAH      ATL         148
dtype: int64

We could simply sum these two numbers together to find the total flights between the cities but there is a more efficient and automated solution that can work for all flights. Let's independently sort the origin and destination cities for each row in alphabetical order:

>>> flights_sort = flights[['ORG_AIR', 'DEST_AIR']] 
                          .apply(sorted, axis=1)
>>> flights_sort.head()

Now that each row has been independently sorted, the column names are not correct. Let's rename them to something more generic and then again find the total number of flights between all cities:

>>> rename_dict = {'ORG_AIR':'AIR1', 'DEST_AIR':'AIR2'}
>>> flights_sort = flights_sort.rename(columns=rename_dict)
>>> flights_ct2 = flights_sort.groupby(['AIR1', 'AIR2']).size()
>>> flights_ct2.head()
AIR1  AIR2
ABE   ATL     31
      ORD     24
ABI   DFW     74
ABQ   ATL     16
      DEN     46
dtype: int64

Let's select all the flights between Atlanta and Houston and verify that it matches the sum of the values in step 2:

>>> flights_ct2.loc[('ATL', 'IAH')]
269

If we try and select flights with Houston followed by Atlanta, we get an error:

>>> flights_ct2.loc[('IAH', 'ATL')]
IndexingError: Too many indexers

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.