There's more...

Now that we have found the longest streaks of on-time arrivals, we can easily find the opposite: the longest streaks of delayed arrivals. The following function returns two rows for each group passed to it. The first row marks the start of the streak and the second row marks its end. Each row contains the month and day on which the streak started or ended, along with the total streak length:

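>>> def max_delay_streak(df):
...     # Replace the index with a RangeIndex so labels equal row positions
...     df = df.reset_index(drop=True)
...     s = 1 - df['ON_TIME']          # invert: 1 now marks a delayed flight
...     s1 = s.cumsum()
...     # Running length of the current delayed streak (0 on on-time flights)
...     streak = (s.mul(s1).diff().where(lambda x: x < 0)
...                .ffill().add(s1, fill_value=0))
...     max_len = int(streak.max())    # cast the float max for label arithmetic
...     last_idx = streak.idxmax()     # the counter peaks on the streak's last flight
...     first_idx = last_idx - max_len + 1
...     df_return = df.loc[[first_idx, last_idx], ['MONTH', 'DAY']].copy()
...     df_return['streak'] = max_len
...     df_return.index = ['first', 'last']
...     df_return.index.name = 'type'
...     return df_return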

>>> (flights.sort_values(['MONTH', 'DAY', 'SCHED_DEP'])
...          .groupby(['AIRLINE', 'ORG_AIR'])
...          .apply(max_delay_streak)
...          .sort_values('streak', ascending=False)
...          .head(10))

Because we use the groupby apply method, each group is passed to the max_delay_streak function as its own DataFrame. Inside this function, the original index is dropped and replaced with a RangeIndex so that the first and last rows of the streak can be located with simple arithmetic. The ON_TIME column is inverted so that delayed flights are marked with 1, and then the same logic as before finds streaks of delayed flights. Since the streak counter reaches its maximum on the last flight of the longest streak, idxmax gives the index of the last row, and subtracting the streak length and adding one gives the index of the first row. Both indexes are stored as variables and then used to select the month and day on which the streak started and ended. We return our results as a DataFrame, and we label and name its index to make the final result clearer.
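To see the streak logic in isolation, here is a minimal sketch that runs without the real flights dataset. It assumes only pandas. Part 1 walks through the cumsum/diff counter on a short, invented sequence of ON_TIME flags; Part 2 applies a standalone copy of max_delay_streak to a tiny hypothetical flights DataFrame (every value in it is made up for illustration). The copy includes an int() cast of the maximum streak and a .copy() on the selected rows as small defensive choices; like the original, it assumes every group contains at least one delayed flight.

```python
import pandas as pd

# --- Part 1: the streak counter on a toy sequence ----------------------
# Hypothetical ON_TIME flags for one group, already in departure order
# (1 = on time, 0 = delayed). Values are invented for illustration.
on_time = pd.Series([1, 0, 0, 1, 0, 0, 0, 1])

s = 1 - on_time        # invert: 1 now marks a delayed flight
s1 = s.cumsum()        # running total of delayed flights
# s * s1 is 0 on on-time flights, so diff() is negative exactly on the
# first on-time flight after a delayed streak; its value is minus the
# running total reached at the end of that streak.
offset = s.mul(s1).diff().where(lambda x: x < 0).ffill()
# Adding the running total back resets the counter to 0 after each streak.
toy_streak = offset.add(s1, fill_value=0)
print(toy_streak.tolist())   # [0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 3.0, 0.0]

last_idx = toy_streak.idxmax()          # counter peaks on the last delayed flight
max_len = int(toy_streak.max())         # length of the longest streak
first_idx = last_idx - max_len + 1      # step back to the first delayed flight
print(first_idx, last_idx, max_len)     # 4 6 3


# --- Part 2: the per-group function on a hypothetical flights table ----
def max_delay_streak(df):
    df = df.reset_index(drop=True)       # labels now equal row positions
    s = 1 - df['ON_TIME']
    s1 = s.cumsum()
    streak = (s.mul(s1).diff().where(lambda x: x < 0)
               .ffill().add(s1, fill_value=0))
    max_len = int(streak.max())
    last_idx = streak.idxmax()
    first_idx = last_idx - max_len + 1
    df_return = df.loc[[first_idx, last_idx], ['MONTH', 'DAY']].copy()
    df_return['streak'] = max_len
    df_return.index = ['first', 'last']
    df_return.index.name = 'type'
    return df_return


# Hypothetical data: two airline/airport groups, all in January.
flights = pd.DataFrame({
    'AIRLINE':   ['AA'] * 6 + ['UA'] * 4,
    'ORG_AIR':   ['DFW'] * 6 + ['ORD'] * 4,
    'MONTH':     [1] * 10,
    'DAY':       [1, 1, 2, 2, 3, 3, 1, 2, 3, 4],
    'SCHED_DEP': [800, 1200, 800, 1200, 800, 1200, 900, 900, 900, 900],
    'ON_TIME':   [1, 0, 0, 0, 1, 0, 0, 1, 1, 0],
})

result = (flights.sort_values(['MONTH', 'DAY', 'SCHED_DEP'])
                 .groupby(['AIRLINE', 'ORG_AIR'])
                 .apply(max_delay_streak)
                 .sort_values('streak', ascending=False))
print(result)
```

In this toy data the AA/DFW group has three delayed flights in a row, from the 12:00 departure on day 1 through the 12:00 departure on day 2, so its 'first' and 'last' rows show DAY 1 and DAY 2 with a streak of 3. Recent pandas releases may emit a DeprecationWarning about groupby apply operating on the grouping columns; the function never reads those columns, so the result is unaffected.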

Our final results show the longest delayed streaks along with their first and last dates. Let's investigate whether we can find out why these delays happened. Inclement weather is a common reason for delayed or canceled flights. In the first row, American Airlines (AA) had a streak of 38 consecutive delayed flights from the Dallas/Fort Worth (DFW) airport, beginning February 26 and ending March 1, 2015. According to historical weather data, two inches of snow fell on February 27, 2015, which was a record for that day (http://bit.ly/2iLGsCg). This was a major weather event for DFW and caused massive problems for the entire city (http://bit.ly/2wmsHPj). Notice that DFW appears again in the third longest streak, but this time a few days earlier and for a different airline.
