Generating a frequency table of two variables

Creating a frequency table for a single column is good; creating a frequency table for two is even better. That's what we'll do in this recipe.

Getting ready

As with the previous recipes where we retrieve data from MongoDB, you need to have MongoDB running and to have imported the accidents dataset.

How to do it…

  1. To generate a frequency table for two variables, begin by import the required libraries as follows:
    import pandas as pd
    import numpy as np
    from pymongo import MongoClient
  2. Next, connect to MongoDB and run a query specifying the five fields to be retrieved from the MongoDB data:
    client = MongoClient('localhost', 27017)
    db = client.pythonbicookbook
    collection = db.accidents
    fields = {'Date':1,
              'Police_Force':1,
              'Accident_Severity':1,
              'Number_of_Vehicles':1,
              'Number_of_Casualties':1}
    data = collection.find({}, fields)
  3. Next, create a DataFrame from the results of the query:
    accidents = pd.DataFrame(list(data))
  4. After that, use the groupby() and agg() methods of the DataFrame to get the casualty counts:
    casualty_count = accidents.groupby('Date').agg({'Number_of_Casualties': np.sum})
  5. Next, create a second DataFrame for the vehicle counts using the same methods:
    vehicle_count = accidents.groupby('Date').agg({'Number_of_Vehicles': np.sum})
  6. Finally, merge the two DataFrames together using their indices and show the results:
    casualties_and_vehicles = casualty_count.merge(vehicle_count, left_index=True, right_index=True)
    casualties_and_vehicles

How it works…

This recipe builds on the previous recipes that we've seen, including Generating a frequency table for a single column by date and Merging two datasets in Pandas. The code is exactly the same as the previous recipe; however, here we create a second DataFrame using groupby() and agg() to aggregate the vehicle counts by date.

Once we have our second DataFrame, we merge it with the first and have a two-column frequency table showing the counts of the number of casualties and number of vehicles over a period of time.

This quick visualization shows us the casualty and vehicle trends over time:

How it works…
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset