Using a box-and-whisker plot

A box-and-whisker plot is a good companion to summary statistics when you want to view the statistical summary of the data at hand. It represents the quartiles of the data, along with any outliers, and so emphasizes the overall structure of the data. A box plot consists of the following features:

  • A horizontal line at the median, which indicates the location of the data
  • A box spanning the interquartile range, which measures the dispersion
  • A set of whiskers that extends vertically from the central box, above and below it, which indicates the tails of the distribution
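
The three features above can be computed by hand, which makes it easier to see what the plot draws. The following is a minimal sketch, not the plotting library's own code: the function name box_plot_summary and the sample values are made up for illustration, and it assumes the common convention (also Matplotlib's default) that whiskers reach the most extreme data points lying within 1.5 times the interquartile range of the box.

```python
import numpy as np

def box_plot_summary(values, whis=1.5):
    """Return the quantities a box plot draws for one variable (illustrative helper)."""
    values = np.asarray(values, dtype=float)
    # Quartiles: the median is the line, q1 and q3 are the box edges
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    # Fences: points outside these limits are treated as outliers
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    is_inside = (values >= lower_fence) & (values <= upper_fence)
    inside = values[is_inside]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        # Whiskers stop at the most extreme points inside the fences
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": values[~is_inside].tolist(),
    }

# Hypothetical sample with one obvious extreme value
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

For this sample, the value 30 falls outside the upper fence, so it is reported as an outlier and the upper whisker stops at 9.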

Getting ready

Let's use the box plot to look at the Iris dataset.

How to do it…

Let's begin by loading the necessary libraries. We then load the Iris dataset:

# Load Libraries
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

# Load Iris dataset
data = load_iris()
x = data['data']
plt.close('all')

Let's demonstrate how to create a box-and-whisker plot:

# Plot the box and whisker
fig = plt.figure(1)
ax = fig.add_subplot(111)
ax.boxplot(x)
ax.set_xticklabels(data['feature_names'])
plt.show()    

How it works…

The code is straightforward. We load the Iris data into x and pass it to the boxplot method of the axes object. Our x has four columns, so the resulting plot contains four boxes, one per column.

The box plot captures both the location and the variation of all four columns in a single plot.

The horizontal line inside each box (red in older Matplotlib releases, orange by default since Matplotlib 2.0) indicates the median, which is the location of the data. You can see that the sepal length has a higher median than the rest of the columns.

The box spanning the interquartile range, which measures the dispersion, can be seen for all four variables.

You can see a set of whiskers that extends vertically from each central box, above and below it, which indicates the tails of the distribution. Whiskers help you see the extreme values in the dataset.
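
To check these readings against numbers rather than by eye, you can ask Matplotlib for the statistics behind each box. The sketch below assumes the cbook.boxplot_stats helper from Matplotlib, which returns one dictionary per column with keys such as med, q1, q3, whislo, whishi, and fliers; the variable names are only illustrative.

```python
from matplotlib import cbook
from sklearn.datasets import load_iris

# Load the same Iris data used in the recipe
data = load_iris()
x = data['data']

# One dictionary of box plot statistics per column of x
stats = cbook.boxplot_stats(x)

for name, s in zip(data['feature_names'], stats):
    print(name)
    print("  median      :", s['med'])
    print("  box (q1, q3):", s['q1'], s['q3'])
    print("  whiskers    :", s['whislo'], s['whishi'])
    print("  outliers    :", list(s['fliers']))
```

The med entry corresponds to the horizontal line in each box, q1 and q3 to the box edges, whislo and whishi to the whisker ends, and fliers to the points drawn beyond the whiskers.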

There's more…

It is also interesting to see how the data is distributed across the various class labels. As we did with the scatter plots, let's draw a separate box-and-whisker plot for each class. The following code shows how to plot a box plot for each class label:

import numpy as np

y = data['target']
class_labels = data['target_names']

# One subplot per class label, laid out on a 3 x 2 grid
fig = plt.figure(2, figsize=(18, 10))
sub_plt_count = 321
for t in range(0, 3):
    ax = fig.add_subplot(sub_plt_count)
    # Select the rows that belong to class t
    y_index = np.where(y == t)[0]
    x_ = x[y_index, :]
    ax.boxplot(x_)
    ax.set_title(class_labels[t])
    ax.set_xticklabels(data['feature_names'])
    sub_plt_count += 1
plt.show()

As you can see in the following chart, we now have a box-and-whisker plot for each class label:
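
As a possible variation, the three class plots can share one y axis so that box heights are directly comparable across classes. This is a sketch of an alternative layout, not the recipe's own code: the function name boxplots_by_class is hypothetical, and it relies on plt.subplots with sharey=True instead of the add_subplot counter used above.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

def boxplots_by_class(data):
    """Draw one box plot per class on a shared y axis (illustrative helper)."""
    x, y = data['data'], data['target']
    class_labels = data['target_names']
    # One row of subplots, one column per class, all on the same y scale
    fig, axes = plt.subplots(1, len(class_labels), sharey=True, figsize=(18, 6))
    for t, ax in enumerate(axes):
        # A Boolean mask selects the rows that belong to class t
        ax.boxplot(x[y == t])
        ax.set_title(class_labels[t])
        ax.set_xticklabels(data['feature_names'])
    return fig

fig = boxplots_by_class(load_iris())
```

Call plt.show() afterwards to display the figure. With a shared axis, differences between classes show up as differences in box position rather than in axis scale.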
