There's more...

Grouping by the cuts variable supports other aggregations as well. For instance, we can find the 25th, 50th, and 75th percentile airtime for each distance grouping. Because airtime is recorded in minutes, we divide by 60 to get hours:

>>> flights.groupby(cuts)['AIR_TIME'].quantile(q=[.25, .5, .75]) \
                                     .div(60).round(2)
DIST                  
(-inf, 200.0]     0.25    0.43
                  0.50    0.50
                  0.75    0.57
(200.0, 500.0]    0.25    0.77
                  0.50    0.92
                  0.75    1.05
(500.0, 1000.0]   0.25    1.43
                  0.50    1.65
                  0.75    1.92
(1000.0, 2000.0]  0.25    2.50
                  0.50    2.93
                  0.75    3.40
(2000.0, inf]     0.25    4.30
                  0.50    4.70
                  0.75    5.03
Name: AIR_TIME, dtype: float64
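
To experiment with this pattern without the flights dataset, the following minimal sketch applies the same chain to a small made-up DataFrame. The toy frame, its values, the two-bin edges, and the names toy, toy_bins, and toy_cuts are assumptions for illustration and are not part of the recipe. observed=False is passed explicitly only to avoid a warning that some newer pandas versions emit for categorical groupers:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the flights dataset
toy = pd.DataFrame({
    'DIST': [100, 200, 300, 800, 900, 1000],
    'AIR_TIME': [30, 60, 90, 120, 180, 240],  # minutes
})

# Two illustrative bins: up to 500 miles, and everything beyond
toy_bins = [-np.inf, 500, np.inf]
toy_cuts = pd.cut(toy['DIST'], bins=toy_bins)

# Passing a list to q returns one row per (group, quantile) pair
result = (toy.groupby(toy_cuts, observed=False)['AIR_TIME']
             .quantile(q=[.25, .5, .75])
             .div(60)       # minutes to hours
             .round(2))
print(result)
```

Each toy group holds three airtimes, so with the default linear interpolation the quartiles should come out as 0.75, 1.0, and 1.25 hours for the shorter group and 2.5, 3.0, and 3.5 hours for the longer one. The quantile becomes the inner level of the resulting MultiIndex.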

We can use this information to create informative string labels and pass them to the cut function through its labels parameter. These labels replace the interval notation. We can also chain the unstack method, which pivots the inner index level into column names:

>>> labels=['Under an Hour', '1 Hour', '1-2 Hours',
            '2-4 Hours', '4+ Hours']
>>> cuts2 = pd.cut(flights['DIST'], bins=bins, labels=labels)
>>> flights.groupby(cuts2)['AIRLINE'].value_counts(normalize=True) \
                                     .round(3) \
                                     .unstack() \
                                     .style.highlight_max(axis=1)
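
The effect of labels and unstack is easier to see on a few rows. The sketch below is an illustration under assumptions: the toy data, the two bins, and the 'Short' and 'Long' labels are invented here, and the styling step is left out so that the result stays a plain DataFrame:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the flights dataset
toy = pd.DataFrame({
    'DIST': [100, 150, 180, 800, 900, 1000],
    'AIRLINE': ['AA', 'AA', 'DL', 'DL', 'DL', 'AA'],
})

# Illustrative bins and labels; labels needs one entry per bin
toy_bins = [-np.inf, 500, np.inf]
toy_labels = ['Short', 'Long']
toy_cuts = pd.cut(toy['DIST'], bins=toy_bins, labels=toy_labels)

# value_counts yields a Series indexed by (distance label, airline);
# unstack moves the airline level into the columns
shares = (toy.groupby(toy_cuts, observed=False)['AIRLINE']
             .value_counts(normalize=True)
             .round(3)
             .unstack())
print(shares)
```

The rows of shares carry the string labels rather than intervals, and each airline becomes a column, so every row reads as that distance group's split across airlines.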
