Grouping data

One typical workflow during data exploration looks as follows:

You find a criterion that you want to use to group your data. Maybe you have GDP data for every country along with the continent, and you would like to ask questions about the continents. These questions usually lead to some function application: you might want to compute the mean GDP per continent. Finally, you want to store this data for further processing in a new data structure.
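The three steps above (group, apply, store) can be sketched as follows. This is only an illustration: the country labels, continent labels, and GDP values are invented placeholders, and the names `gdp`, `by_continent`, `mean_gdp`, and `result` are chosen here rather than taken from the text.

```python
import pandas as pd

# Placeholder data: labels and GDP values are invented for illustration only
gdp = pd.DataFrame({
    "country": ["A1", "A2", "B1", "B2"],
    "continent": ["A", "A", "B", "B"],
    "gdp": [1.0, 3.0, 10.0, 20.0],
})

# Step 1: choose the grouping criterion
by_continent = gdp.groupby("continent")

# Step 2: apply a function to each group
mean_gdp = by_continent["gdp"].mean()

# Step 3: store the result in a new DataFrame for further processing
result = mean_gdp.reset_index(name="mean_gdp")
print(result)
```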

We use a simpler example here. Imagine some fictional weather data about the number of sunny hours per day and city:

>>> df
          date    city  value
0   2000-01-03  London      6
1   2000-01-04  London      3
2   2000-01-05  London      4
3   2000-01-03  Mexico      3
4   2000-01-04  Mexico      9
5   2000-01-05  Mexico      8
6   2000-01-03  Mumbai     12
7   2000-01-04  Mumbai      9
8   2000-01-05  Mumbai      8
9   2000-01-03   Tokyo      5
10  2000-01-04   Tokyo      5
11  2000-01-05   Tokyo      6
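To follow along, the table above can be rebuilt by hand. The snippet below is one possible way to do so; it assumes the dates are kept as plain strings, since the text does not say how the `date` column is stored.

```python
import pandas as pd

# Rebuild the fictional weather data shown above
df = pd.DataFrame({
    # Assumption: dates stored as strings; pd.to_datetime could convert them
    "date": ["2000-01-03", "2000-01-04", "2000-01-05"] * 4,
    "city": ["London"] * 3 + ["Mexico"] * 3 + ["Mumbai"] * 3 + ["Tokyo"] * 3,
    "value": [6, 3, 4, 3, 9, 8, 12, 9, 8, 5, 5, 6],
})
print(df)
```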

The groups attribute returns a dictionary that maps each unique group to the axis labels of the rows belonging to it:

>>> df.groupby("city").groups
{'London': [0, 1, 2],
'Mexico': [3, 4, 5],
'Mumbai': [6, 7, 8],
'Tokyo': [9, 10, 11]}
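The labels stored in groups can be used to look at a single group, and the GroupBy object can also be iterated over directly. The following is a minimal sketch, assuming `df` holds the weather data from above; the variable names `mumbai_labels` and `mumbai` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2000-01-03", "2000-01-04", "2000-01-05"] * 4,
    "city": ["London"] * 3 + ["Mexico"] * 3 + ["Mumbai"] * 3 + ["Tokyo"] * 3,
    "value": [6, 3, 4, 3, 9, 8, 12, 9, 8, 5, 5, 6],
})

grouped = df.groupby("city")

# Axis labels of the rows that belong to one group
mumbai_labels = list(grouped.groups["Mumbai"])

# get_group returns the rows of that group as a DataFrame
mumbai = grouped.get_group("Mumbai")

# Iterating yields (group name, sub-DataFrame) pairs
for city, frame in grouped:
    print(city, len(frame))
```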

Although the result of a groupby is a GroupBy object, not a DataFrame, we can use the usual indexing notation to refer to columns:

>>> grouped = df.groupby("city")
>>> grouped["value"].max()
city
London     6
Mexico     9
Mumbai    12
Tokyo      6
Name: value, dtype: int64
>>> grouped["value"].sum()
city
London    13
Mexico    20
Mumbai    29
Tokyo     16
Name: value, dtype: int64
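For the last step of the workflow, storing the result for further processing, several aggregations can be collected into one DataFrame with agg. This is a sketch under the assumption that `df` holds the weather data from above; the name `summary` is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2000-01-03", "2000-01-04", "2000-01-05"] * 4,
    "city": ["London"] * 3 + ["Mexico"] * 3 + ["Mumbai"] * 3 + ["Tokyo"] * 3,
    "value": [6, 3, 4, 3, 9, 8, 12, 9, 8, 5, 5, 6],
})

# One row per city, one column per aggregation function
summary = df.groupby("city")["value"].agg(["max", "sum", "mean"])
print(summary)
```

The result is an ordinary DataFrame indexed by city, so it can be sorted, filtered, or joined like any other.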

We see that, according to our data set, Mumbai seems to be a sunny city. An alternative, more verbose way to obtain the same sums would be:
```
>>> df['value'].groupby(df['city']).sum()
city
London    13
Mexico    20
Mumbai    29
Tokyo     16
Name: value, dtype: int64
```
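That the two spellings agree can be checked directly. The snippet below is a small sketch, assuming `df` holds the weather data from above; the names `short_form` and `verbose_form` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2000-01-03", "2000-01-04", "2000-01-05"] * 4,
    "city": ["London"] * 3 + ["Mexico"] * 3 + ["Mumbai"] * 3 + ["Tokyo"] * 3,
    "value": [6, 3, 4, 3, 9, 8, 12, 9, 8, 5, 5, 6],
})

# Group the DataFrame, then select the column
short_form = df.groupby("city")["value"].sum()

# Select the column first, then group it by another Series
verbose_form = df["value"].groupby(df["city"]).sum()

# Both produce the same Series of sums per city
print(short_form.equals(verbose_form))
```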
