Plotting our data in a histogram as a probability distribution tells matplotlib to integrate the total area of the histogram and scale the values so that the area equals 1. Rather than showing how many values go into each bin as in the previous recipe, the height of each bar shows the probability density of finding a number in that bin.
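The scaling described above can be checked by hand with NumPy. This is a minimal sketch, assuming a small made-up array of values rather than the accident data: it compares the raw counts with the density-scaled heights and confirms that the bars enclose a total area of 1.

```python
import numpy as np

# Made-up sample values, for illustration only (not the accident data).
values = np.array([0, 0, 1, 2, 2, 2, 4, 5, 6, 6])

# Raw counts per bin, as in a plain histogram.
counts, edges = np.histogram(values, bins=3)

# density=True rescales the counts so the bars enclose a total area of 1.
density, _ = np.histogram(values, bins=3, density=True)

# The same scaling done by hand: count / (number of values * bin width).
widths = np.diff(edges)
manual = counts / (counts.sum() * widths)

# Area under the histogram: sum of height * width over all bins.
total_area = float(np.sum(density * widths))

print("counts:    ", counts)
print("density:   ", density)
print("total area:", total_area)
```

Because each bin here is 2 units wide, the density heights are smaller than the fraction of values in each bin; only the area of a bar, not its height, is a probability.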
To show matplotlib plots in IPython Notebook, we will use an IPython magic function, which starts with %:
%matplotlib inline
import pandas as pd
import numpy as np
from pymongo import MongoClient
import matplotlib as mpl
import matplotlib.pyplot as plt
client = MongoClient('localhost', 27017)
db = client.pythonbicookbook
collection = db.accidents
fields = {'Date':1, 'Police_Force':1, 'Accident_Severity':1, 'Number_of_Vehicles':1, 'Number_of_Casualties':1}
data = collection.find({}, fields)
accidents = pd.DataFrame(list(data))
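# casualty_count is not defined in this listing; it is assumed to come from the previous recipe
# density=True replaces normed=True, which was removed in matplotlib 3.1
plt.hist(casualty_count['Number_of_Casualties'], bins=30, density=True)
plt.title('Probability Distribution')
plt.xlabel('Value')
plt.ylabel('Probability')
plt.show()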
This recipe works exactly like the previous recipe with the exception of the way we create the histogram:
plt.hist(casualty_count['Number_of_Casualties'], bins=30, density=True)
With the addition of density=True (called normed=True in older matplotlib releases; that argument was removed in version 3.1), we turn the histogram into a probability distribution, and see the following plot as a result:
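If the y-axis should read as the actual probability of landing in each bin rather than a density, one option is to weight every observation by 1/N. The following is a sketch only, using synthetic Poisson-distributed numbers as a stand-in for the casualty counts (an assumption, since the real data lives in MongoDB); plt.hist accepts the same weights argument as np.histogram.

```python
import numpy as np

# Synthetic stand-in for the casualty counts (an assumption, not the real data).
rng = np.random.default_rng(0)
values = rng.poisson(lam=5, size=1000)

# Give each observation a weight of 1/N, so every bar is the fraction
# of all values that fall into its bin.
weights = np.full(values.shape, 1.0 / values.size)
probabilities, edges = np.histogram(values, bins=30, weights=weights)

# For comparison: density heights multiplied by bin width give the same numbers.
density, _ = np.histogram(values, bins=30, density=True)
from_density = density * np.diff(edges)

print("sum of per-bin probabilities:", probabilities.sum())
```

With this weighting the bar heights themselves sum to 1, whereas with the density scaling it is the bar areas that sum to 1.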