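A box-and-whisker plot is a good companion to summary statistics when you want to view the statistical summary of the data at hand. It represents the quantiles of the data effectively and shows outliers, if any, which emphasizes the overall structure of the data. A box plot consists of the following features: a horizontal line indicating the median, a box spanning the interquartile range, and a set of whiskers extending from the box toward the tails of the distribution.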
Let's begin by loading the necessary libraries, followed by the Iris dataset:
# Load Libraries
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

# Load Iris dataset
data = load_iris()
x = data['data']
plt.close('all')
Let's demonstrate how to create a box-and-whisker plot:
# Plot the box and whisker
fig = plt.figure(1)
ax = fig.add_subplot(111)
ax.boxplot(x)
ax.set_xticklabels(data['feature_names'])
plt.show()
The code is straightforward. We load the Iris data into x and pass it to the boxplot method of the axes object. Because x has four columns, boxplot draws one box per column. The box plot is as follows:
The box plot captures both the location and the variation of all four columns in a single plot.
The horizontal line inside each box (red in this chart; recent Matplotlib versions draw it in orange by default) marks the median, which indicates the location of the data. You can see that sepal length has a higher median than the other columns.
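The comparison of medians can also be checked numerically. The following is a minimal sketch that reloads the Iris data so that it runs on its own; the names medians and highest are illustrative and not part of the original recipe.

```python
import numpy as np
from sklearn.datasets import load_iris

data = load_iris()
x = data['data']

# Median of each column, i.e. the value marked by the line inside each box
medians = np.median(x, axis=0)
for name, value in zip(data['feature_names'], medians):
    print(name, value)

# Name of the column with the largest median (illustrative variable)
highest = data['feature_names'][int(np.argmax(medians))]
print("Highest median:", highest)
```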
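The box spans the interquartile range, from the first quartile to the third, and measures the dispersion of each of the four variables.
A set of whiskers extends vertically from each box and indicates the tails of the distribution. By default, Matplotlib draws each whisker out to the most extreme data point that lies within 1.5 times the interquartile range of the box, and plots any points beyond that individually as outliers. The whiskers therefore help you see the extreme values in the dataset.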
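To make these features concrete, the quantities behind a single box can be computed by hand. This is a rough sketch rather than Matplotlib's own implementation: the helper box_stats and the sample values are hypothetical, and the whisker rule assumes the default factor of 1.5 times the interquartile range.

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Return the quantities a box plot draws for one column (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    # Fences: limits beyond which a point is treated as an outlier
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        # Whiskers end at the most extreme points that are still inside the fences
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": outliers,
    }

# Made-up sample with one obvious extreme value
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
stats = box_stats(sample)
print(stats)
```

For this sample the box runs from 3.25 to 7.75 with the median at 5.5, the whiskers end at 1 and 9, and the value 30 falls outside the upper fence, so it would be drawn as an individual outlier point.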
It is also interesting to see how the data is distributed across the class labels. As we did with the scatter plots, let's draw a box-and-whisker plot for each class label. The following code and chart show how to do this:
import numpy as np  # needed for np.where below

y = data['target']
class_labels = data['target_names']

fig = plt.figure(2, figsize=(18, 10))
sub_plt_count = 321  # 3 x 2 grid, starting at position 1
for t in range(0, 3):
    ax = fig.add_subplot(sub_plt_count)
    # Select the rows that belong to class t
    y_index = np.where(y == t)[0]
    x_ = x[y_index, :]
    ax.boxplot(x_)
    ax.set_title(class_labels[t])
    ax.set_xticklabels(data['feature_names'])
    sub_plt_count += 1
plt.show()
As you can see in the following chart, we now have a box-and-whisker plot for each class label:
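As a variation on the per-class plot, the three panels can be placed side by side on a shared y-axis, which makes the boxes directly comparable across classes. This is only a sketch of one possible layout, not the recipe's own code: the function name boxplots_by_class, the figure size, and the label rotation are illustrative choices.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

def boxplots_by_class(data):
    """Draw one box plot per class label on a shared y-axis (illustrative helper)."""
    x, y = data['data'], data['target']
    class_labels = data['target_names']
    fig, axes = plt.subplots(1, len(class_labels), sharey=True, figsize=(18, 6))
    for t, ax in enumerate(axes):
        # A boolean mask selects the rows that belong to class t
        ax.boxplot(x[y == t, :])
        ax.set_title(class_labels[t])
        ax.set_xticklabels(data['feature_names'], rotation=45)
    return fig, axes

fig, axes = boxplots_by_class(load_iris())
# Call plt.show() to display the figure interactively.
```

With a shared axis, a difference in box height between two panels reflects a real difference in the data rather than a difference in axis scaling.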