There's more...

Grouping by the cuts variable supports further analysis. For instance, we can find the 25th, 50th, and 75th percentile airtime for each distance grouping. As airtime is in minutes, we can divide by 60 to get hours:

>>> flights.groupby(cuts)['AIR_TIME'].quantile(q=[.25, .5, .75]) \
...        .div(60).round(2)
DIST
(-inf, 200.0]     0.25    0.43
                  0.50    0.50
                  0.75    0.57
(200.0, 500.0]    0.25    0.77
                  0.50    0.92
                  0.75    1.05
(500.0, 1000.0]   0.25    1.43
                  0.50    1.65
                  0.75    1.92
(1000.0, 2000.0]  0.25    2.50
                  0.50    2.93
                  0.75    3.40
(2000.0, inf]     0.25    4.30
                  0.50    4.70
                  0.75    5.03
Name: AIR_TIME, dtype: float64
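The same pattern can be tried without the flights dataset. The following is a minimal sketch on a small, made-up DataFrame: the name toy and all of its values are hypothetical, and observed=True is an added assumption that keeps only the bins that actually contain data (with two flights per bin it does not change the result here):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the flights data: two made-up flights per bin.
toy = pd.DataFrame({
    'DIST': [100, 150, 300, 450, 600, 900, 1200, 1800, 2200, 2600],
    'AIR_TIME': [24, 36, 50, 70, 90, 110, 150, 210, 270, 330],  # minutes
})

# Same style of bin edges as in the recipe, open-ended at both extremes.
bins = [-np.inf, 200, 500, 1000, 2000, np.inf]
cuts = pd.cut(toy['DIST'], bins=bins)

# Quartiles of airtime per distance bin, converted from minutes to hours.
quartiles = (toy.groupby(cuts, observed=True)['AIR_TIME']
                .quantile(q=[.25, .5, .75])
                .div(60)
                .round(2))
print(quartiles)
```

The result is a Series with a two-level index: the distance interval on the outer level and the requested quantile on the inner level, just as in the output above.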

We can use this information to create informative string labels when using the cut function. These labels replace the interval notation. We can also chain the unstack method, which pivots the inner index level into column names, and then highlight the largest value in each row:

>>> labels = ['Under an Hour', '1 Hour', '1-2 Hours',
...           '2-4 Hours', '4+ Hours']
>>> cuts2 = pd.cut(flights['DIST'], bins=bins, labels=labels)
>>> flights.groupby(cuts2)['AIRLINE'].value_counts(normalize=True) \
...        .round(3) \
...        .unstack() \
...        .style.highlight_max(axis=1)
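To see the shape of the unstacked result without the flights dataset, here is a minimal sketch on made-up data. The name toy, its values, the four bins, and the label strings are all hypothetical. The fill_value=0 and observed=True arguments are added assumptions, and the styling step is left out because it only affects how the table is displayed:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the flights data: two made-up flights per bin.
toy = pd.DataFrame({
    'DIST': [100, 150, 300, 450, 600, 900, 1200, 1800],
    'AIRLINE': ['AA', 'AA', 'AA', 'DL', 'DL', 'DL', 'AA', 'DL'],
})

bins = [-np.inf, 200, 500, 1000, np.inf]
labels = ['Short', 'Medium', 'Long', 'Very long']  # one label per bin

# String labels replace the interval notation in the resulting categories.
cuts2 = pd.cut(toy['DIST'], bins=bins, labels=labels)

# Share of flights per airline within each distance group; unstack moves
# the inner index level (the airline) into the columns.
shares = (toy.groupby(cuts2, observed=True)['AIRLINE']
             .value_counts(normalize=True)
             .round(3)
             .unstack(fill_value=0))
print(shares)
```

Each row is one labeled distance group and each column is one airline, so every row sums to 1. Passing fill_value=0 shows airlines with no flights in a group as 0 rather than NaN.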