A frequency table is another way of summarizing data; it shows the number of times a value occurs. In this recipe, we will create a frequency table of casualties by date.
To use this recipe, you need to have MongoDB running, and to have the accidents data imported.
import pandas as pd import numpy as np from pymongo import MongoClient
client = MongoClient('localhost', 27017) db = client.pythonbicookbook collection = db.accidents fields = {'Date':1, 'Police_Force':1, 'Accident_Severity':1, 'Number_of_Vehicles':1, 'Number_of_Casualties':1} data = collection.find({}, fields)
accidents = pd.DataFrame(list(data))
groupby()
and agg()
methods of the DataFrame and show the results:casualty_count = accidents.groupby('Date').agg({'Number_of_Casualties': np.sum}) casualty_count
After importing the Pandas, numpy
, and pymongo
libraries, we create a connection to MongoDB, selecting the accidents
collection from the Python Business Intelligence Cookbook database:
fields = {'Date':1, 'Police_Force':1, 'Accident_Severity':1, 'Number_of_Vehicles':1,
Next we specify the five fields we want to use in our query. We are specifying fields as we do not need to retrieve all of the data in the collection in order to produce the frequency table. This makes our query run a lot faster:
data = collection.find({}, fields) accidents = pd.DataFrame(list(data))
After that, we run our query and put the results into the data variable. We then create a pandas DataFrame from our data:
casualty_count = accidents.groupby('Date').agg({'Number_of_Casualties': np.sum}) casualty_count
Finally, we create our frequency table using the groupby()
and agg()
functions of Pandas. groupby()
, just as in SQL, allows us to group the results by a specified field, in this case 'Date'. agg()
computes the summary statistics that you provide. In this example, we are telling agg()
to use the numpy
sum function. The results look as shown in the following image: