How to do it...

Read in the flights dataset, and output the first five rows:

>>> flights = pd.read_csv('data/flights.csv')
>>> flights.head()

If we want to find the distribution of airlines over a range of distances, we need to place the values of the DIST column into discrete bins. Let's use the pandas cut function to split the data into five bins:

>>> bins = [-np.inf, 200, 500, 1000, 2000, np.inf]
>>> cuts = pd.cut(flights['DIST'], bins=bins)
>>> cuts.head()
0     (500.0, 1000.0]
1    (1000.0, 2000.0]
2     (500.0, 1000.0]
3    (1000.0, 2000.0]
4    (1000.0, 2000.0]
Name: DIST, dtype: category
Categories (5, interval[float64]): [(-inf, 200.0] < (200.0, 500.0] < (500.0, 1000.0] < (1000.0, 2000.0] < (2000.0, inf]]

An ordered categorical Series is created. To help get an idea of what happened, let's count the values of each category:

>>> cuts.value_counts()
(500.0, 1000.0]     20659
(200.0, 500.0]      15874
(1000.0, 2000.0]    14186
(2000.0, inf]        4054
(-inf, 200.0]        3719
Name: DIST, dtype: int64

The cuts Series can now be used to form groups. Pandas allows you to form groups in any way you wish. Pass the cuts Series to the groupby method and then call the value_counts method on the AIRLINE column to find the distribution for each distance group. Notice that SkyWest (OO) makes up 33% of flights less than 200 miles but only 16% of those between 200 and 500 miles:

>>> flights.groupby(cuts)['AIRLINE'].value_counts(normalize=True) 
                                    .round(3).head(15)
DIST            AIRLINE
(-inf, 200.0]   OO         0.326
                EV         0.289
                MQ         0.211
                DL         0.086
                AA         0.052
                UA         0.027
                WN         0.009
(200.0, 500.0]  WN         0.194
                DL         0.189
                OO         0.159
                EV         0.156
                MQ         0.100
                AA         0.071
                UA         0.062
                VX         0.028
Name: AIRLINE, dtype: float64

Table of Contents for How to do it...

Create new playlist

Sign In

Sign Up

Table of Contents for
How to do it...