Making box plots to show the interquartile ranges and the outliers

We will begin by creating the data. Start by generating three normal (Gaussian) distributions with different properties, as follows:

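# Imports needed by the listings in this section
import numpy as np
import matplotlib.pyplot as plt

# Generate some Normal distributions with different properties
rands1 = np.random.normal(size=500)                    # mean 0, standard deviation 1
rands2 = np.random.normal(scale=2, size=500)           # mean 0, standard deviation 2
rands3 = np.random.normal(loc=1, scale=0.5, size=500)  # mean 1, standard deviation 0.5
gaussians = (rands1, rands2, rands3)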
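To make the roles of loc and scale concrete, here is a minimal sketch that draws the same three samples from a seeded generator and stores their sample means and standard deviations. The seed value and the samples and summary names are our own choices for illustration; they are not part of the original listing.

```python
import numpy as np

# Assumption: a fixed seed (42 is arbitrary) so the numbers are repeatable.
rng = np.random.default_rng(42)

samples = {
    "rands1": rng.normal(size=500),                    # loc=0, scale=1 (defaults)
    "rands2": rng.normal(scale=2, size=500),           # loc=0, scale=2
    "rands3": rng.normal(loc=1, scale=0.5, size=500),  # loc=1, scale=0.5
}

# The sample mean should land near loc and the sample std near scale.
summary = {name: (values.mean(), values.std()) for name, values in samples.items()}
for name, (mean, std) in summary.items():
    print(f"{name}: mean={mean:.2f}, std={std:.2f}")
```

With 500 points per sample, the printed values should sit close to, but not exactly on, the requested loc and scale.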
  1. Make some box plots out of this data by passing gaussians to plt.boxplot. In an interactive session such as a notebook, the trailing semicolon suppresses the text output of the call, so we see only the plot:
# Basic Boxplot
plt.boxplot(gaussians);

Following is the output of the preceding code:

This kind of box plot was invented at Bell Labs about fifty years ago. Each of the boxes shows the interquartile range around the median of the values; the edges of the box mark the 25th and 75th percentiles, the line inside the box marks the median, and the small markers beyond the whiskers, known as fliers, show the outliers within the dataset. From this simple plot, we can immediately see that the first two distributions have the same median but a different standard deviation and therefore a different interquartile range, while the third box has a different median and a different interquartile range.
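The quantities described above can be computed by hand. The following is a minimal sketch, assuming Matplotlib's default whisker setting of 1.5 times the interquartile range; the helper name box_stats and the small example dataset are hypothetical and only serve as illustration.

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Return the numbers a default box plot is built from (illustrative helper)."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    # Fences: points beyond these are drawn as fliers (outliers).
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    fliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        # Whiskers end at the most extreme data points inside the fences.
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "fliers": fliers,
    }

# Hypothetical data: nine ordinary values and one obvious outlier.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(stats)
```

Here the value 50 lies above the upper fence, so it is reported as a flier, and the upper whisker stops at 9, the largest value inside the fences.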

  1. We can also add labels if the default tick labels 1, 2, and 3 aren't very descriptive. We do this by passing a tuple of strings to the labels keyword. Using the labels first, second, and third, we get the following:
# Labels
plt.boxplot(gaussians, labels=("first", "second", "third"));
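To our knowledge, recent Matplotlib releases (3.9 onward) rename the labels keyword of boxplot to tick_labels and warn when the old name is used. If that matters for your installation, one version-independent sketch is to draw the boxes first and then set the tick labels on the axes. The Agg backend line and the seeded generator below are our assumptions so that the example runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability
gaussians = (
    rng.normal(size=500),
    rng.normal(scale=2, size=500),
    rng.normal(loc=1, scale=0.5, size=500),
)

fig, ax = plt.subplots()
ax.boxplot(gaussians)
# Boxes sit at x = 1, 2, 3 by default, so the three labels map onto them in order.
ax.set_xticklabels(["first", "second", "third"])
tick_text = [tick.get_text() for tick in ax.get_xticklabels()]
```

This produces the same labeled plot as the labels keyword while avoiding any dependence on which keyword name the installed version prefers.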

We will get the following output:

By default, Matplotlib picks reasonable box widths and scales them automatically. However, we can also make the boxes narrower or wider.

  1. Making a smaller width can be done as follows:
# Box widths
plt.boxplot(gaussians, widths=0.1);

Following is the output of the preceding code:

  1. A larger width can be set in the same way, for example by changing the widths value to 0.7.
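The listing for this step does not appear in the text. Assuming it mirrors the widths=0.1 call shown earlier, it would look roughly like the following sketch; the seeded data and the Agg backend line are our additions so that the example is self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability
gaussians = (
    rng.normal(size=500),
    rng.normal(scale=2, size=500),
    rng.normal(loc=1, scale=0.5, size=500),
)

# Box widths (assumed listing): every box is 0.7 data units wide
result = plt.boxplot(gaussians, widths=0.7)

# The first box is centred at x = 1, so it should span 0.65 to 1.35.
first_box_x = result["boxes"][0].get_xdata()
first_box_width = first_box_x.max() - first_box_x.min()
```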
  1. We can also pass a tuple with one width per box, depending on what we want the plot to emphasize. For example, the following code draws one thin, one wide, and one very wide box:
# Box widths
plt.boxplot(gaussians, widths=(0.1,0.5,0.7));

The preceding code gives the following output:

  1. We can also make the boxes horizontal by setting vert=False (the Boolean False, not the string 'False'), like so:
# Horizontal boxes w/ vert
plt.boxplot(gaussians, vert=False);

Following is the output of the preceding code:

  1. To customize the appearance of the outliers, we can pass sym='.', which draws the outliers as dots.
  1. We can also pass an empty string, sym='', which hides the outliers altogether.
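The listings for these last two steps are not included in the text. A minimal sketch of what they might look like follows, drawn side by side on two axes for comparison. The subplot layout, axis names, seed, and Agg backend line are our own assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability
gaussians = (
    rng.normal(size=500),
    rng.normal(scale=2, size=500),
    rng.normal(loc=1, scale=0.5, size=500),
)

fig, (ax_dots, ax_hidden) = plt.subplots(1, 2, figsize=(8, 3))

# Outlier symbols w/ sym: draw the outliers as small dots
dotted = ax_dots.boxplot(gaussians, sym=".")
ax_dots.set_title("sym='.'")

# An empty string hides the outliers altogether
hidden = ax_hidden.boxplot(gaussians, sym="")
ax_hidden.set_title("sym=''")

dot_marker = dotted["fliers"][0].get_marker()
```

For finer control, boxplot also accepts the showfliers and flierprops keywords, which can hide the outliers or set their marker, size, and color explicitly.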