Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Generating a frequency table for a single column by date

A frequency table is another way of summarizing data; it shows the number of times a value occurs. In this recipe, we will create a frequency table of casualties by date.

Getting ready

To use this recipe, you need to have MongoDB running, and to have the accidents data imported.

How to do it…

To generate a frequency table for a single column by date, begin by importing the Python libraries that we need:
```
import pandas as pd
import numpy as np
from pymongo import MongoClient
```

Next, connect to MongoDB, and run a query specifying the five fields to be retrieved from the MongoDB data:

client = MongoClient('localhost', 27017)
db = client.pythonbicookbook
collection = db.accidents
fields = {'Date':1,
          'Police_Force':1,
          'Accident_Severity':1,
          'Number_of_Vehicles':1,
          'Number_of_Casualties':1}
data = collection.find({}, fields)

Next, create a DataFrame from the results of the query:
```
accidents = pd.DataFrame(list(data))
```

Finally, use the groupby() and agg() methods of the DataFrame and show the results:

casualty_count = accidents.groupby('Date').agg({'Number_of_Casualties': np.sum})
casualty_count

How it works…

After importing the Pandas, numpy, and pymongo libraries, we create a connection to MongoDB, selecting the accidents collection from the Python Business Intelligence Cookbook database:

fields = {'Date':1,
          'Police_Force':1,
          'Accident_Severity':1,
          'Number_of_Vehicles':1,

Next we specify the five fields we want to use in our query. We are specifying fields as we do not need to retrieve all of the data in the collection in order to produce the frequency table. This makes our query run a lot faster:

data = collection.find({}, fields)
accidents = pd.DataFrame(list(data))

After that, we run our query and put the results into the data variable. We then create a pandas DataFrame from our data:

casualty_count = accidents.groupby('Date').agg({'Number_of_Casualties': np.sum})
casualty_count

Finally, we create our frequency table using the groupby() and agg() functions of Pandas. groupby(), just as in SQL, allows us to group the results by a specified field, in this case 'Date'. agg() computes the summary statistics that you provide. In this example, we are telling agg() to use the numpy sum function. The results look as shown in the following image:

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Generating a frequency table for a single column by date

Create new playlist

Sign In

Sign Up

Generating a frequency table for a single column by date

Getting ready

How to do it…

How it works…

Table of Contents for
Generating a frequency table for a single column by date