How to do it...

  1. Let's get started by grouping the state and religious affiliation columns from the college dataset, saving the result to a variable and confirming its type:
>>> college = pd.read_csv('data/college.csv')
>>> grouped = college.groupby(['STABBR', 'RELAFFIL'])
>>> type(grouped)
pandas.core.groupby.DataFrameGroupBy
  1. Use the dir function to discover all its available functionality:
>>> print([attr for attr in dir(grouped) if not attr.startswith('_')])
['CITY', 'CURROPER', 'DISTANCEONLY', 'GRAD_DEBT_MDN_SUPP', 'HBCU', 'INSTNM', 'MD_EARN_WNE_P10', 'MENONLY', 'PCTFLOAN', 'PCTPELL', 'PPTUG_EF', 'RELAFFIL', 'SATMTMID', 'SATVRMID', 'STABBR', 'UG25ABV', 'UGDS', 'UGDS_2MOR', 'UGDS_AIAN', 'UGDS_ASIAN', 'UGDS_BLACK', 'UGDS_HISP', 'UGDS_NHPI', 'UGDS_NRA', 'UGDS_UNKN', 'UGDS_WHITE', 'WOMENONLY', 'agg', 'aggregate', 'all', 'any', 'apply', 'backfill', 'bfill', 'boxplot', 'corr', 'corrwith', 'count', 'cov', 'cumcount', 'cummax', 'cummin', 'cumprod', 'cumsum', 'describe', 'diff', 'dtypes', 'expanding', 'ffill', 'fillna', 'filter', 'first', 'get_group', 'groups', 'head', 'hist', 'idxmax', 'idxmin', 'indices', 'last', 'mad', 'max', 'mean', 'median', 'min', 'ndim', 'ngroup', 'ngroups', 'nth', 'nunique', 'ohlc', 'pad', 'pct_change', 'plot', 'prod', 'quantile', 'rank', 'resample', 'rolling', 'sem', 'shift', 'size', 'skew', 'std', 'sum', 'tail', 'take', 'transform', 'tshift', 'var']
  1. Find the number of groups with the ngroups attribute:
>>> grouped.ngroups
112
  1. To find the uniquely identifying labels for each group, look in the groups attribute, which contains a dictionary of each unique group mapped to all the corresponding index labels of that group:
>>> groups = list(grouped.groups.keys())
>>> groups[:6]
[('AK', 0), ('AK', 1), ('AL', 0), ('AL', 1), ('AR', 0), ('AR', 1)]
  1. Retrieve a single group with the get_group method by passing it a tuple of an exact group label. For example, to get all the religiously affiliated schools in the state of Florida, do the following:
>>> grouped.get_group(('FL', 1)).head()
  1. You may want to take a peek at each individual group. This is possible because groupby objects are iterable:
>>> from IPython.display import display
>>> for name, group in grouped:
print(name)
display(group.head(3))
  1. You can also call the head method on your groupby object to get the first rows of each group together in a single DataFrame.
>>> grouped.head(2).head(6)
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset