Unfortunately, pandas offers no direct way to pass these additional arguments when several aggregation functions are used together. For example, if you try to aggregate with both the pct_between and mean functions, you get the following exception:
>>> college.groupby(['STABBR', 'RELAFFIL'])['UGDS'] \
...     .agg(['mean', pct_between], low=100, high=1000)
TypeError: pct_between() missing 2 required positional arguments: 'low' and 'high'
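For contrast, when pct_between is the only aggregation function, the keyword arguments are forwarded to it without trouble. The sketch below is self-contained under two assumptions: pct_between is a hypothetical reconstruction (returning the fraction of values between low and high, inclusive), and the college DataFrame is a tiny made-up stand-in that only mirrors the real column names.

```python
import pandas as pd

# Hypothetical stand-in for the pct_between function defined earlier:
# the fraction of values in s that fall between low and high (inclusive).
def pct_between(s, low, high):
    return s.between(low, high).mean()

# Tiny made-up data set; column names mirror the college example.
college = pd.DataFrame({
    "STABBR": ["AK", "AK", "AK", "AL", "AL"],
    "RELAFFIL": [0, 0, 1, 0, 0],
    "UGDS": [50.0, 500.0, 2000.0, 800.0, 5000.0],
})

# With a single callable, agg passes low= and high= through to pct_between.
result = (college.groupby(["STABBR", "RELAFFIL"])["UGDS"]
          .agg(pct_between, low=100, high=1000))
print(result)
```

Here each group gets a single number, for example 0.5 for the (AK, 0) group because one of its two values lies between 100 and 1,000.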
Pandas cannot tell that the extra arguments are meant only for pct_between. To use our custom function alongside built-in functions, or even other custom functions, we can define a special kind of nested function called a closure: an inner function that remembers the variables from the enclosing function's scope even after that enclosing function has returned. A single generic closure can build all of our customized functions:
>>> def make_agg_func(func, name, *args, **kwargs):
...     def wrapper(x):
...         return func(x, *args, **kwargs)
...     wrapper.__name__ = name
...     return wrapper
>>> my_agg1 = make_agg_func(pct_between, 'pct_1_3k', low=1000, high=3000)
>>> my_agg2 = make_agg_func(pct_between, 'pct_10_30k', 10000, 30000)
>>> college.groupby(['STABBR', 'RELAFFIL'])['UGDS'] \
...     .agg(['mean', my_agg1, my_agg2]).head()
The make_agg_func function acts as a factory for customized aggregation functions. It accepts the custom aggregation function you already built (pct_between in this case), a name, and an arbitrary number of extra positional and keyword arguments (*args and **kwargs). It returns a new function, wrapper, with those extra arguments already fixed. Because wrapper takes only the grouped Series x, pandas can call it like any other aggregation function. For instance, my_agg1 finds the percentage of schools with an undergraduate population between 1,000 and 3,000, while my_agg2 does the same for 10,000 to 30,000, passing its bounds positionally rather than by keyword. The name parameter is important: it is assigned to wrapper.__name__, which pandas uses to label the aggregated column, so it must be unique each time make_agg_func is called. Without it, every function the factory returns would simply be named wrapper.
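The complete pattern can be sketched end to end as follows. As before, pct_between and the small college DataFrame are hypothetical stand-ins chosen only so the example runs on its own; make_agg_func is the closure factory from this section.

```python
import pandas as pd

# Hypothetical stand-in: fraction of values in s between low and high (inclusive).
def pct_between(s, low, high):
    return s.between(low, high).mean()

def make_agg_func(func, name, *args, **kwargs):
    # wrapper closes over func, args and kwargs from this particular call
    def wrapper(x):
        return func(x, *args, **kwargs)
    # pandas labels the output column with the function's __name__
    wrapper.__name__ = name
    return wrapper

# Tiny made-up data set mirroring the college columns.
college = pd.DataFrame({
    "STABBR": ["AK", "AK", "AK", "AL", "AL", "AL", "AL"],
    "RELAFFIL": [0, 0, 1, 0, 0, 0, 0],
    "UGDS": [500.0, 2000.0, 15000.0, 1500.0, 2500.0, 20000.0, 40000.0],
})

my_agg1 = make_agg_func(pct_between, "pct_1_3k", low=1000, high=3000)  # keyword bounds
my_agg2 = make_agg_func(pct_between, "pct_10_30k", 10000, 30000)       # positional bounds

result = (college.groupby(["STABBR", "RELAFFIL"])["UGDS"]
          .agg(["mean", my_agg1, my_agg2]))
print(result)
```

The resulting columns are mean, pct_1_3k and pct_10_30k, taken from each function's __name__; for the (AL, 0) group, for example, the mean is 16000.0 and pct_10_30k is 0.25.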