Histogram and distribution fitting in Seaborn

In the population example, the raw data was already binned into different age groups. What if the data is not binned (for example, the BigMac Index data)? Turns out, seaborn.distplot can help us to process the data into bins and show us a histogram as a result. Let's look at this example:

import seaborn as sns
import matplotlib.pyplot as plt


# Get the BigMac index in 2017
current_bigmac = bigmac_df[(bigmac_df.Date == "2017-01-31")]

# Plot the histogram
ax = sns.distplot(current_bigmac.dollar_price)
plt.show()

The seaborn.distplot function expects either pandas Series, single-dimensional numpy.array, or a Python list as input. Then, it determines the size of the bins according to the Freedman-Diaconis rule, and finally it fits a kernel density estimate (KDE) over the histogram.

KDE is a non-parametric method used to estimate the distribution of a variable. We can also supply a parametric distribution, such as beta, gamma, or normal distribution, to the fit argument.

In this example, we are going to fit the normal distribution from the scipy.stats package over the Big Mac Index dataset:

from scipy import stats

ax = sns.distplot(current_bigmac.dollar_price, kde=False, fit=stats.norm)
plt.show()
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset