One typical workflow during data exploration looks as follows:
>>> df
          date    city  value
0   2000-01-03  London      6
1   2000-01-04  London      3
2   2000-01-05  London      4
3   2000-01-03  Mexico      3
4   2000-01-04  Mexico      9
5   2000-01-05  Mexico      8
6   2000-01-03  Mumbai     12
7   2000-01-04  Mumbai      9
8   2000-01-05  Mumbai      8
9   2000-01-03   Tokyo      5
10  2000-01-04   Tokyo      5
11  2000-01-05   Tokyo      6
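To follow along, the sample frame can be rebuilt with a short sketch like the one below. The source only shows the printed frame, so this construction is an assumption. In particular, storing date as a datetime column rather than as plain strings is a choice made here.

```python
import pandas as pd

# Rebuild the sample frame shown above.
# Assumption: the date column holds datetime64 values, not strings.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05"] * 4),
        "city": ["London"] * 3 + ["Mexico"] * 3 + ["Mumbai"] * 3 + ["Tokyo"] * 3,
        "value": [6, 3, 4, 3, 9, 8, 12, 9, 8, 5, 5, 6],
    }
)
print(df)
```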
The groups attribute returns a dictionary whose keys are the unique groups and whose values are the axis labels (here, the index labels) of the rows in each group:
>>> df.groupby("city").groups
{'London': [0, 1, 2], 'Mexico': [3, 4, 5], 'Mumbai': [6, 7, 8], 'Tokyo': [9, 10, 11]}
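Because the values in the groups dictionary are index labels, they can be passed to .loc to pull a group's rows out by hand. The minimal sketch below rebuilds the sample frame (assuming datetime dates) and sums each group's values this way; the subtotals name is illustrative only.

```python
import pandas as pd

# Same sample frame as above (assumption: dates stored as datetime64).
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05"] * 4),
        "city": ["London"] * 3 + ["Mexico"] * 3 + ["Mumbai"] * 3 + ["Tokyo"] * 3,
        "value": [6, 3, 4, 3, 9, 8, 12, 9, 8, 5, 5, 6],
    }
)

grouped = df.groupby("city")

# .groups maps each unique key to the index labels of that group's rows.
subtotals = {}
for city, labels in grouped.groups.items():
    # The labels work directly with .loc, so a group can be selected by hand.
    subtotals[city] = int(df.loc[labels, "value"].sum())

print(subtotals)
```

In practice the aggregation methods on the GroupBy object do this for you; the loop only makes the meaning of the labels concrete.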
Although the result of groupby is a GroupBy object, not a DataFrame, we can still use the usual indexing notation to refer to its columns:

>>> grouped = df.groupby("city")
>>> grouped["value"].max()
city
London     6
Mexico     9
Mumbai    12
Tokyo      6
Name: value, dtype: int64
>>> grouped["value"].sum()
city
London    13
Mexico    20
Mumbai    29
Tokyo     16
Name: value, dtype: int64
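Selecting a column from a GroupBy object yields a per-column grouped object, so several reductions can also be requested in a single call with agg. The sketch below again rebuilds the sample frame (datetime dates are an assumption) and, for contrast, shows that grouping by a list of keys creates one group per distinct combination of those keys.

```python
import pandas as pd

# Same sample frame as above (assumption: dates stored as datetime64).
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05"] * 4),
        "city": ["London"] * 3 + ["Mexico"] * 3 + ["Mumbai"] * 3 + ["Tokyo"] * 3,
        "value": [6, 3, 4, 3, 9, 8, 12, 9, 8, 5, 5, 6],
    }
)

grouped = df.groupby("city")

# Selecting "value" gives a SeriesGroupBy; agg applies several reductions in one call.
summary = grouped["value"].agg(["max", "sum"])
print(summary)

# Grouping by a list of keys instead creates one group per distinct
# (city, value) combination, so the result has a two-level index.
pair_counts = df.groupby(["city", "value"]).size()
print(pair_counts)
```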
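The same sums can be computed by calling groupby directly on a Series and passing the grouping key as another Series aligned on the same index:

>>> df['value'].groupby(df['city']).sum()
city
London    13
Mexico    20
Mumbai    29
Tokyo     16
Name: value, dtype: int64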
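The Series form is handy because the key does not have to be a column of the frame: any Series aligned on the same index works. The sketch below, assuming the date column holds datetimes, checks that both spellings agree and then groups by a key derived on the fly (the weekday name), which is an illustrative choice rather than part of the original example.

```python
import pandas as pd

# Same sample frame as above (assumption: dates stored as datetime64).
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05"] * 4),
        "city": ["London"] * 3 + ["Mexico"] * 3 + ["Mumbai"] * 3 + ["Tokyo"] * 3,
        "value": [6, 3, 4, 3, 9, 8, 12, 9, 8, 5, 5, 6],
    }
)

# Both spellings group "value" by "city" and should agree.
via_series = df["value"].groupby(df["city"]).sum()
via_frame = df.groupby("city")["value"].sum()
print(via_series.equals(via_frame))

# The key can be any Series aligned on the same index, e.g. one derived
# from the date column rather than stored in the frame.
weekday = df["date"].dt.day_name()
by_weekday = df["value"].groupby(weekday).sum()
print(by_weekday)
```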