How it works...

The syntax for the groupby method is not as straightforward as that of other methods. Let's intercept the chain of methods in step 2 by storing the result of the groupby method in its own variable:

>>> grouped = flights.groupby('AIRLINE')
>>> type(grouped)
pandas.core.groupby.DataFrameGroupBy

A completely new intermediate object is first produced with its own distinct attributes and methods. No calculations take place at this stage. Pandas merely validates the grouping columns. This groupby object has an agg method to perform aggregations. One of the ways to use this method is to pass it a dictionary mapping the aggregating column to the aggregating function, as done in step 2.
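The two stages described above can be sketched on a small stand-in DataFrame. The data here is made up for illustration, and the ARR_DELAY column name is an assumption rather than something taken from the recipe's dataset; only AIRLINE appears in the text above.

```python
import pandas as pd

# Hypothetical stand-in for the flights data; the values are made up.
flights = pd.DataFrame({
    'AIRLINE': ['AA', 'AA', 'DL', 'DL', 'DL'],
    'ARR_DELAY': [10.0, 20.0, 5.0, -5.0, 30.0],  # assumed column name
})

# Creating the groupby object only checks that the grouping column exists.
grouped = flights.groupby('AIRLINE')

# A grouping column that is not in the DataFrame is rejected immediately.
try:
    flights.groupby('NOT_A_COLUMN')
    bad_key_rejected = False
except KeyError:
    bad_key_rejected = True

# The aggregation itself runs only when agg is called.
# The dictionary maps the aggregating column to the aggregating function.
mean_delay = grouped.agg({'ARR_DELAY': 'mean'})
print(mean_delay)
```

With a dictionary, the result is a DataFrame with one row per group and one column per dictionary key.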

There are several different flavors of syntax that produce a similar result, with step 3 showing an alternative. Instead of identifying the aggregating column in the dictionary, place it inside the indexing operator just as if you were selecting it as a column from a DataFrame. The function string name is then passed as a scalar to the agg method.
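As a rough illustration of this alternative, the sketch below selects the aggregating column with the indexing operator and passes the function name as a plain string. The DataFrame is again a made-up stand-in, and ARR_DELAY is an assumed column name.

```python
import pandas as pd

# Hypothetical stand-in for the flights data; the values are made up.
flights = pd.DataFrame({
    'AIRLINE': ['AA', 'AA', 'DL', 'DL', 'DL'],
    'ARR_DELAY': [10.0, 20.0, 5.0, -5.0, 30.0],  # assumed column name
})

# Select the aggregating column first, then pass the function name as a string.
mean_delay = flights.groupby('AIRLINE')['ARR_DELAY'].agg('mean')
print(mean_delay)
```

Because a single column is selected before aggregating, the result is a Series rather than a DataFrame.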

You may pass any aggregating function to the agg method. Pandas allows you to use the string names for simplicity, but you may also pass the function object itself, as done in step 4. NumPy provides many functions that aggregate values.
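A minimal sketch of passing a function object instead of a string follows. The helper delay_range is hypothetical and defined here only for illustration, as are the data and the ARR_DELAY column name. A NumPy function such as np.mean can be passed in the same way, although some recent pandas versions may emit a warning recommending the string name instead.

```python
import pandas as pd

# Hypothetical stand-in for the flights data; the values are made up.
flights = pd.DataFrame({
    'AIRLINE': ['AA', 'AA', 'DL', 'DL', 'DL'],
    'ARR_DELAY': [10.0, 20.0, 5.0, -5.0, 30.0],  # assumed column name
})

def delay_range(delays):
    """Hypothetical aggregating function: reduce one group's values to a scalar."""
    return delays.max() - delays.min()

# Pass the function object itself (no parentheses, no quotes).
spread = flights.groupby('AIRLINE')['ARR_DELAY'].agg(delay_range)
print(spread)
```

Any callable that accepts a group's values and returns a single value can be used this way.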

Step 5 shows one last syntax flavor. When you are only applying a single aggregating function as in this example, you can often call it directly as a method on the groupby object itself without agg. Not all aggregation functions have a method equivalent but many basic ones do. The following is a list of several aggregating functions that may be passed as a string to agg or chained directly as a method to the groupby object:

min     max         mean       median    sum       count    std    var
size    describe    nunique    idxmin    idxmax
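The equivalence between the string form and the direct method call can be checked with a short sketch like the one below, again using made-up data and an assumed ARR_DELAY column.

```python
import pandas as pd

# Hypothetical stand-in for the flights data; the values are made up.
flights = pd.DataFrame({
    'AIRLINE': ['AA', 'AA', 'DL', 'DL', 'DL'],
    'ARR_DELAY': [10.0, 20.0, 5.0, -5.0, 30.0],  # assumed column name
})

grouped = flights.groupby('AIRLINE')['ARR_DELAY']

# Calling the method directly on the groupby object...
by_method = grouped.mean()

# ...gives the same result as passing the string name to agg.
by_agg = grouped.agg('mean')
print(by_method.equals(by_agg))

# size counts the rows in each group, including missing values.
group_sizes = flights.groupby('AIRLINE').size()
print(group_sizes)
```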