Creating a customized box plot with whiskers

Box plots help identify outliers in data and are useful for comparing distributions. As Wikipedia puts it, box-and-whisker plots are uniform in their use of the box: the bottom and top of the box are always the first and third quartiles, and the band inside the box is always the second quartile (the median).

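The lines extending from the box are the whiskers. Any data point that falls beyond the whiskers is plotted as an outlier. By default, matplotlib extends each whisker to the furthest data point that lies within 1.5 times the interquartile range of the box.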
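
To make the quartile and whisker definitions concrete, here is a minimal sketch that computes them by hand with NumPy. The sample values are hypothetical (they are not taken from the accidents data), and the 1.5 times interquartile range rule is matplotlib's default whisker setting, so the numbers would differ if a different rule were chosen.

```python
import numpy as np

# Hypothetical sample values, not taken from the accidents data set
values = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 12])

# The box: first quartile, median (second quartile), third quartile
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1  # interquartile range, the height of the box

# Default whisker rule: reach to the furthest point within 1.5 * IQR of the box
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

inside = values[(values >= lower_limit) & (values <= upper_limit)]
whisker_low, whisker_high = inside.min(), inside.max()

# Anything beyond the whisker limits is treated as an outlier
outliers = values[(values < lower_limit) | (values > upper_limit)]

print('box:', q1, median, q3)
print('whiskers:', whisker_low, whisker_high)
print('outliers:', outliers)
```

In this sample, only the value 12 falls beyond the upper whisker, so it is the single point a box plot would draw as a flier.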

How to do it…

  1. To create a customized box plot with whiskers, begin by importing all the required libraries. To show the matplotlib plots in the IPython Notebook, we use an IPython magic function, which starts with %:
    %matplotlib inline
    import pandas as pd
    import numpy as np
    from pymongo import MongoClient
    import matplotlib as mpl
    import matplotlib.pyplot as plt
  2. Next, connect to MongoDB and run a query specifying the five fields to be retrieved from the MongoDB data:
    client = MongoClient('localhost', 27017)
    db = client.pythonbicookbook
    collection = db.accidents
    fields = {'Date':1,
              'Police_Force':1,
              'Accident_Severity':1,
              'Number_of_Vehicles':1,
              'Number_of_Casualties':1}
    data = collection.find({}, fields)
  3. Next, create a DataFrame from the results of the query:
    accidents = pd.DataFrame(list(data))
  4. After that, create frequency tables for casualty and vehicle counts:
    casualty_count = accidents.groupby('Date').agg({'Number_of_Casualties': 'sum'})
    vehicle_count = accidents.groupby('Date').agg({'Number_of_Vehicles': 'sum'})
  5. Next, combine the two frequency tables into a list:
    data_to_plot = [casualty_count['Number_of_Casualties'],
                    vehicle_count['Number_of_Vehicles']]
  6. Next, create a figure instance and specify its size:
    fig = plt.figure(1, figsize=(9, 6))
  7. After that, create an axis instance:
    ax = fig.add_subplot(111)
  8. Next, create the boxplot:
    bp = ax.boxplot(data_to_plot)
  9. Customize the color and linewidth of the caps as follows:
    for cap in bp['caps']:
        cap.set(color='#7570b3', linewidth=2)
  10. Change the color and linewidth of the medians:
    for median in bp['medians']:
        median.set(color='#b2df8a', linewidth=2)
  11. Change the style of the fliers and their fill:
    for flier in bp['fliers']:
        flier.set(marker='o', color='#e7298a', alpha=0.5)
  12. Add the x-axis labels:
    ax.set_xticklabels(['Casualties', 'Vehicles'])
  13. Finally, save the figure as a PNG; because of %matplotlib inline, it is also rendered in the notebook:
    fig.savefig('fig1.png', bbox_inches='tight')

How it works…

First, we import all the required Python libraries and connect to MongoDB. After this, we run a query against MongoDB and create a new DataFrame from the result. The remaining steps are explained one at a time, with each explanation following the code it describes.

# Create a frequency table of casualty counts from the previous recipe
casualty_count = accidents.groupby('Date').agg({'Number_of_Casualties': 'sum'})
# Create a frequency table of vehicle counts
vehicle_count = accidents.groupby('Date').agg({'Number_of_Vehicles': 'sum'})

The preceding code creates the frequency tables: it groups the accidents by date and sums the casualty and vehicle counts for each day.
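
To see what the grouping produces without a MongoDB connection, here is a minimal sketch on a tiny hand-made DataFrame. The three rows and their dates are hypothetical stand-ins for the query results, and the sketch passes the aggregation by name as the string 'sum', which pandas accepts.

```python
import pandas as pd

# Hypothetical rows standing in for the MongoDB query results
accidents = pd.DataFrame({
    'Date': ['01/01/2015', '01/01/2015', '02/01/2015'],
    'Number_of_Casualties': [1, 3, 2],
    'Number_of_Vehicles': [2, 1, 4],
})

# One row per date, holding the total for that day
casualty_count = accidents.groupby('Date').agg({'Number_of_Casualties': 'sum'})
vehicle_count = accidents.groupby('Date').agg({'Number_of_Vehicles': 'sum'})

print(casualty_count)
print(vehicle_count)
```

Each result is a DataFrame indexed by Date with a single column of daily totals, which is why the recipe selects that column by name before plotting.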

# Create a list from the two frequency tables
data_to_plot = [casualty_count['Number_of_Casualties'],
                vehicle_count['Number_of_Vehicles']]

The preceding code combines the two frequency tables into a list; each entry in the list becomes one box in the plot.

fig = plt.figure(1, figsize=(9, 6))

The preceding line creates a figure instance that is 9 inches wide and 6 inches high. The figure is the chart that will ultimately be rendered in our IPython Notebook.

ax = fig.add_subplot(111)

The preceding line adds an axis instance to our figure. The axis is the region of the figure where the data points are drawn.

bp = ax.boxplot(data_to_plot)

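The preceding line of code creates the boxplot from the data. It also returns a dictionary of the artists that make up the plot, which we store in bp so that we can style them.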
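
The dictionary stored in bp is what the styling loops iterate over, so it helps to look inside it. The following is a minimal sketch with two hypothetical lists of daily totals in place of the accidents data; it selects the non-interactive Agg backend only so that it also runs outside a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt

# Hypothetical daily totals, not taken from the accidents data set
casualties = [3, 4, 4, 5, 6, 7, 20]
vehicles = [5, 6, 6, 7, 8, 9, 10]

fig, ax = plt.subplots(figsize=(9, 6))
bp = ax.boxplot([casualties, vehicles])

# Each key maps to a list of line artists; print how many there are of each
for name, artists in bp.items():
    print(name, len(artists))

# The y-values of the outliers drawn for the first box
print(bp['fliers'][0].get_ydata())
```

With two boxes there are two medians and two flier artists, but four caps and four whiskers, because each box has one at the top and one at the bottom.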

for cap in bp['caps']:
    cap.set(color='#7570b3', linewidth=2)

Here, we change the color and line width of the caps. The caps are the short lines at the ends of the whiskers.

for median in bp['medians']:
    median.set(color='#b2df8a', linewidth=2)

The preceding code changes the color and line width of the medians. The median line inside each box marks the second quartile; together with the bottom and top of the box (the first and third quartiles), it splits the data into quarters.

for flier in bp['fliers']:
    flier.set(marker='o', color='#e7298a', alpha=0.5)

Here, we change the style of the fliers and their fill. The fliers are the outliers in the data: the points plotted past the whiskers.
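
As an alternative to looping over the artists after the plot is drawn, the boxplot method also accepts the capprops, medianprops, and flierprops keyword arguments. The sketch below is one way to apply the same colors at creation time; it is not the approach this recipe uses. The data is hypothetical, and setting markerfacecolor to control the flier fill is a choice made for this sketch.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt

# Hypothetical daily totals, not taken from the accidents data set
casualties = [3, 4, 4, 5, 6, 7, 20]
vehicles = [5, 6, 6, 7, 8, 9, 10]

fig, ax = plt.subplots(figsize=(9, 6))

# Pass the styles as dictionaries instead of calling set() on each artist
bp = ax.boxplot(
    [casualties, vehicles],
    capprops=dict(color='#7570b3', linewidth=2),
    medianprops=dict(color='#b2df8a', linewidth=2),
    flierprops=dict(marker='o', markerfacecolor='#e7298a', alpha=0.5),
)
```

The loop-based approach is handy when different boxes need different styles; the keyword arguments are shorter when every box is styled the same way.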

ax.set_xticklabels(['Casualties', 'Vehicles'])

The preceding line of code puts labels on the x-axis.

fig.savefig('fig1.png', bbox_inches='tight')

Finally, we save the figure as a PNG in our working directory (the same one the IPython Notebook is in). The bbox_inches='tight' argument trims the extra whitespace around the plot. Because of the %matplotlib inline magic function, the figure is also displayed in the IPython Notebook.
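
If you would rather not write into the working directory while experimenting, the same call works with any path. The following is a minimal sketch that saves to a temporary directory; the data, the temporary directory, and the dpi value are assumptions made for illustration, not part of the recipe.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(9, 6))
ax.boxplot([[1, 2, 3, 4, 10]])  # hypothetical data

# Save into a throwaway directory instead of the working directory
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'fig1.png')
fig.savefig(out_path, bbox_inches='tight', dpi=150)

# Release the figure's memory once it has been written to disk
plt.close(fig)
print(out_path)
```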

You did it! This is definitely the most complex plot we've created yet.