Box plot and violin plot

The way strip plots and swarm plots represent data makes it difficult to compare groups. Suppose you want to find out whether the stable or the constrictive population type has the higher median BigMac index value. Can you tell from the two previous example plots?

You might be tempted to think that the constrictive group has the higher median because its maximum data point is higher, but in fact the stable group has the higher median.
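You can also check the medians numerically instead of reading them off a plot. The sketch below assumes a DataFrame laid out like `merged_df2`, with a `population type` column and a `dollar_price` column. The toy values are hypothetical. They were chosen only to reproduce the situation just described, where one group holds the highest point but the other group has the higher median.

```python
import pandas as pd

# Hypothetical toy data mimicking the layout of merged_df2 (not the book's dataset)
df = pd.DataFrame({
    "population type": ["stable"] * 5 + ["constrictive"] * 5,
    "dollar_price": [3.5, 4.0, 4.2, 4.5, 5.0,
                     2.5, 3.0, 3.3, 3.6, 6.8],
})

# Median BigMac price for each population type
medians = df.groupby("population type")["dollar_price"].median()
print(medians)

# Maximum per group: a high maximum says nothing about the median
maxima = df.groupby("population type")["dollar_price"].max()
print(maxima)
```

Here the constrictive group has the larger maximum (6.8), yet the stable group has the larger median (4.2 against 3.3).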

Could there be a better plot type for comparing distributions across categories? Here you go! Let's try a box plot:

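import matplotlib.pyplot as plt
import seaborn as sns

# Box plot (merged_df2 is assumed to be the DataFrame from the earlier examples)
ax = sns.boxplot(x="population type", y="dollar_price", data=merged_df2)
ax.set_xlabel("Population type")
ax.set_ylabel("BigMac index (US$)")

plt.show()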

The expected output:

The box spans the interquartile range (IQR), from the lower quartile to the upper quartile, and the line inside it marks the median. The whiskers extend to the most extreme data points that lie within 1.5 times the IQR below the lower quartile or above the upper quartile. Points farther out than that are deemed outliers and are drawn individually as fliers.
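To make the box plot rule concrete, the following sketch uses NumPy to compute the quartiles, the 1.5 × IQR fences, the whisker ends and the fliers by hand. The sample values are hypothetical. The calculation is our reconstruction of matplotlib's default rule, which Seaborn's box plot builds on as far as we know. It is not code taken from either library.

```python
import numpy as np

# Hypothetical sample values (not the book's dataset)
values = np.array([2.1, 2.8, 3.0, 3.2, 3.5, 3.6, 3.9, 4.1, 4.4, 7.9])

# Lower quartile, median and upper quartile (linear interpolation)
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Fences sit 1.5 * IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers reach the most extreme points still inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Anything beyond the fences is drawn as a flier
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}, IQR={iqr:.2f}")
print(f"whiskers: {whisker_low} to {whisker_high}")
print(f"outliers: {outliers}")
```

With these values, Q1 is 3.05, Q3 is 4.05 and the IQR is 1.0, so the upper fence sits at 5.55. The point 7.9 is therefore drawn as a flier, and the upper whisker stops at 4.4 rather than at the maximum.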

A violin plot combines a kernel density estimate (KDE) of the data with a box plot. Both plot types display the median and the interquartile range. A violin plot goes one step further and shows the full probability distribution estimated from the data. As a result, we can tell whether the data has more than one peak and compare the peaks' relative heights.
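The idea behind the violin's outline can be sketched with a hand-rolled Gaussian kernel density estimate: place a Gaussian bump on every observation, add the bumps up and normalize the result. The helper `gaussian_kde_sketch`, the sample values and the fixed bandwidth of 0.4 are all hypothetical. Seaborn picks the bandwidth automatically by default, so its curves will differ in detail. Scanning the estimated curve for local maxima shows where the peaks and their relative heights come from.

```python
import numpy as np


def gaussian_kde_sketch(samples, grid, bandwidth):
    """Hypothetical helper: evaluate a Gaussian KDE of `samples` at each grid point."""
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    # One Gaussian bump per sample
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average the bumps and rescale so the curve has unit area
    return kernels.sum(axis=1) / (len(samples) * bandwidth)


# Hypothetical sample with two clusters of prices
samples = np.array([2.0, 2.2, 2.4, 2.5, 2.6, 4.8, 5.0, 5.1, 5.3])
grid = np.linspace(0, 8, 801)
density = gaussian_kde_sketch(samples, grid, bandwidth=0.4)  # bandwidth is an assumption

# A peak is a grid point whose density exceeds both of its neighbours
inner = density[1:-1]
is_peak = (inner > density[:-2]) & (inner > density[2:])
peaks = grid[1:-1][is_peak]
heights = inner[is_peak]

print("peaks near:", np.round(peaks, 2))
print("peak heights:", np.round(heights, 3))
```

Here the estimate has two peaks, near 2.35 and 5.05. The left peak is taller because more observations sit in that cluster. A box plot of the same values shows only quartiles and would not reveal this two-cluster structure.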

If we change the Seaborn function call from sns.boxplot to sns.violinplot in the preceding code, the result looks like this:
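For reference, here is the box plot example with only the function swapped. The real `merged_df2` is built earlier in the book and is not shown here, so the sketch substitutes a small hypothetical DataFrame with the same two columns. With the real data, drop that definition and keep the rest.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for merged_df2 (same column names, toy values)
merged_df2 = pd.DataFrame({
    "population type": ["stable"] * 5 + ["constrictive"] * 5,
    "dollar_price": [3.5, 4.0, 4.2, 4.5, 5.0, 2.5, 3.0, 3.3, 3.6, 6.8],
})

# Violin plot: same arguments as sns.boxplot, only the function name changes
ax = sns.violinplot(x="population type", y="dollar_price", data=merged_df2)
ax.set_xlabel("Population type")
ax.set_ylabel("BigMac index (US$)")

plt.show()
```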

We can also overlay a strip plot or swarm plot on top of a box plot or violin plot to get the best of both worlds. Here is example code:

# Prepare a box plot
ax = sns.boxplot(x="population type", y="dollar_price", data=merged_df2)

# Overlay a swarm plot on top of the same axes
sns.swarmplot(x="population type", y="dollar_price", data=merged_df2, color="w", ax=ax)
ax.set_xlabel("Population type")
ax.set_ylabel("BigMac index (US$)")

plt.show()

The expected output:
