- Read in the flights dataset, and find the total number of flights between each origin and destination airport:
>>> flights = pd.read_csv('data/flights.csv')
>>> flights_ct = flights.groupby(['ORG_AIR', 'DEST_AIR']).size()
>>> flights_ct.head()
ORG_AIR DEST_AIR
ATL ABE 31
ABQ 16
ABY 19
ACY 6
AEX 40
dtype: int64
- Select the total number of flights between Houston (IAH) and Atlanta (ATL) in both directions:
>>> flights_ct.loc[[('ATL', 'IAH'), ('IAH', 'ATL')]]
ORG_AIR DEST_AIR
ATL IAH 121
IAH ATL 148
dtype: int64
- We could simply sum these two numbers together to find the total flights between the cities but there is a more efficient and automated solution that can work for all flights. Let's independently sort the origin and destination cities for each row in alphabetical order:
>>> flights_sort = flights[['ORG_AIR', 'DEST_AIR']]
.apply(sorted, axis=1)
>>> flights_sort.head()
- Now that each row has been independently sorted, the column names are not correct. Let's rename them to something more generic and then again find the total number of flights between all cities:
>>> rename_dict = {'ORG_AIR':'AIR1', 'DEST_AIR':'AIR2'}
>>> flights_sort = flights_sort.rename(columns=rename_dict)
>>> flights_ct2 = flights_sort.groupby(['AIR1', 'AIR2']).size()
>>> flights_ct2.head()
AIR1 AIR2
ABE ATL 31
ORD 24
ABI DFW 74
ABQ ATL 16
DEN 46
dtype: int64
- Let's select all the flights between Atlanta and Houston and verify that it matches the sum of the values in step 2:
>>> flights_ct2.loc[('ATL', 'IAH')]
269
- If we try and select flights with Houston followed by Atlanta, we get an error:
>>> flights_ct2.loc[('IAH', 'ATL')]
IndexingError: Too many indexers