We will generate random values that mimic a normal distribution and analyze the generated data with statistical functions from the scipy.stats package.
Generate random values from a normal distribution using the scipy.stats package:
generated = stats.norm.rvs(size=900)
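Because rvs() draws new values on every call, the numbers printed in this section will differ from run to run. A minimal sketch of making the draw repeatable, assuming a SciPy version whose rvs() accepts a random_state argument and a NumPy version that provides default_rng(); the seed 42 is an arbitrary choice and not part of the original example:

```python
import numpy as np
from scipy import stats

# Two generators built from the same (arbitrary) seed produce identical draws.
first = stats.norm.rvs(size=900, random_state=np.random.default_rng(42))
second = stats.norm.rvs(size=900, random_state=np.random.default_rng(42))

print("Identical draws:", np.array_equal(first, second))
```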
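Fit the generated values to a normal distribution. This gives the estimated mean and standard deviation of the dataset:
print("Mean", "Std", stats.norm.fit(generated))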
The mean and standard deviation appear as follows:
Mean Std (0.0071293257063200707, 0.95537708218972528)
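For the normal distribution the fit is a maximum likelihood estimate, so the two numbers should match the sample mean and the standard deviation computed with divisor n rather than n - 1. A short check, offered as a sketch; the seed is an assumption added for repeatability:

```python
import numpy as np
from scipy import stats

generated = stats.norm.rvs(size=900, random_state=np.random.default_rng(0))

mean, std = stats.norm.fit(generated)

# np.std defaults to ddof=0, which is the maximum likelihood estimate.
print("fit:  ", mean, std)
print("numpy:", np.mean(generated), np.std(generated))
```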
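Skewness measures how asymmetric a distribution is. Perform a skewness test, which returns a test statistic and a p-value. P-values range from 0 to 1: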
print("Skewtest", "pvalue", stats.skewtest(generated))
The result of the skewness test appears as follows:
Skewtest pvalue (-0.62120640688766893, 0.5344638245033837)
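So, if the data really came from a normal distribution, there is about a 53 percent chance of seeing skewness at least this far from zero. A p-value this large gives us no reason to reject normality. It is instructive to see what happens with other sample sizes. For 900,000 points, we get a p-value of 0.16. For 20 generated values, the p-value is 0.50. Because each sample is random, these p-values change from run to run; generating more points does not by itself push the p-value toward 1.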
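To see what a small p-value looks like, it helps to run the same test on data that is clearly not normal. The sketch below contrasts a normal sample with an exponential one; the exponential distribution and the seed are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

normal_sample = stats.norm.rvs(size=900, random_state=rng)
skewed_sample = stats.expon.rvs(size=900, random_state=rng)  # long right tail

# skewtest returns (statistic, pvalue), so the result can be unpacked.
_, p_normal = stats.skewtest(normal_sample)
_, p_skewed = stats.skewtest(skewed_sample)

print("normal sample p-value:     ", p_normal)
print("exponential sample p-value:", p_skewed)
```

A p-value far below a chosen threshold such as 0.05 is the signal to reject normality.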
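Kurtosis describes how heavy the tails of a distribution are. Perform a kurtosis test, which is set up like the skewness test:
print("Kurtosistest", "pvalue", stats.kurtosistest(generated))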
The result of the kurtosis test appears as follows:
Kurtosistest pvalue (1.3065381019536981, 0.19136963054975586)
The p-value for 900,000 values is 0.028. For 20 generated values, the p-value is 0.88.
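As a rough illustration of what the kurtosis test reacts to, the sketch below compares a normal sample with a light-tailed (uniform) and a heavy-tailed (Laplace) sample. The two comparison distributions, the sample size of 5000, and the seed are assumptions and not part of the original example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

samples = {
    "normal": stats.norm.rvs(size=5000, random_state=rng),
    "uniform (light tails)": stats.uniform.rvs(size=5000, random_state=rng),
    "laplace (heavy tails)": stats.laplace.rvs(size=5000, random_state=rng),
}

results = {}
for name, sample in samples.items():
    _, pvalue = stats.kurtosistest(sample)
    # stats.kurtosis reports excess kurtosis by default (0 for a normal distribution).
    results[name] = (stats.kurtosis(sample), pvalue)
    print(name, "excess kurtosis:", results[name][0], "p-value:", pvalue)
```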
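Perform a normality test, which combines skewness and kurtosis into a single statistic and again returns a p-value:
print("Normaltest", "pvalue", stats.normaltest(generated))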
The result of the normality test appears as follows:
Normaltest pvalue (2.09293921181506, 0.35117535059841687)
The p-value for 900,000 generated values is 0.035. For 20 generated values, the p-value is 0.79.
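The normality test builds on the two earlier tests. As far as the SciPy implementation goes, its statistic is the sum of the squared skewness and kurtosis z-scores, compared against a chi-squared distribution with two degrees of freedom. A small sketch that checks this relationship on a seeded sample (the seed is an assumption):

```python
import numpy as np
from scipy import stats

generated = stats.norm.rvs(size=900, random_state=np.random.default_rng(3))

z_skew, _ = stats.skewtest(generated)
z_kurt, _ = stats.kurtosistest(generated)
k2, pvalue = stats.normaltest(generated)

# The omnibus statistic is the sum of the two squared z-scores,
# and its p-value is the chi-squared upper tail with 2 degrees of freedom.
print("k2:", k2, "z_skew**2 + z_kurt**2:", z_skew**2 + z_kurt**2)
print("p-value:", pvalue, "chi2 tail:", stats.chi2.sf(k2, 2))
```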
Find the value at a certain percentile, here the 95th:
print("95 percentile", stats.scoreatpercentile(generated, 95))
The value at the 95th percentile appears as follows:
95 percentile 1.54048860252
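The sample value can be compared with the theoretical 95th percentile of the standard normal distribution, which is about 1.645. The sketch below does this with norm.ppf() and also shows np.percentile(), which with its default linear interpolation should give the same number as scoreatpercentile(); the seed is an assumption.

```python
import numpy as np
from scipy import stats

generated = stats.norm.rvs(size=900, random_state=np.random.default_rng(4))

empirical = stats.scoreatpercentile(generated, 95)
theoretical = stats.norm.ppf(0.95)  # inverse CDF of the standard normal

print("empirical 95th percentile:  ", empirical)
print("numpy equivalent:           ", np.percentile(generated, 95))
print("theoretical 95th percentile:", theoretical)
```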
Do the opposite of the previous step to find the percentile at 1:
print("Percentile at 1", stats.percentileofscore(generated, 1))
The percentile at 1 appears as follows:
Percentile at 1 85.5555555556
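For a standard normal distribution, the theoretical counterpart is the cumulative distribution function evaluated at 1, roughly 84.1 percent. A brief sketch, starting with a tiny dataset where the answer can be checked by hand; the seed is an assumption.

```python
import numpy as np
from scipy import stats

# Three of the four values are less than or equal to 3.
tiny = stats.percentileofscore([1, 2, 3, 4], 3)
print("tiny dataset:", tiny)

generated = stats.norm.rvs(size=900, random_state=np.random.default_rng(5))
empirical = stats.percentileofscore(generated, 1)
theoretical = 100 * stats.norm.cdf(1)

print("empirical percentile at 1:  ", empirical)
print("theoretical percentile at 1:", theoretical)
```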
Plot the generated values in a histogram using matplotlib (more information about matplotlib can be found in the previous Chapter 9, Plotting with matplotlib):
plt.hist(generated)
The histogram of the generated random values is as follows:
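A variation that can make the comparison with the normal distribution easier to judge by eye is to normalize the histogram and draw the theoretical density on top. This is only a sketch: the bin count, the transparency, and the seed are assumptions, and the original example plots raw counts instead.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

generated = stats.norm.rvs(size=900, random_state=np.random.default_rng(6))

fig, ax = plt.subplots()
# density=True rescales the bars so that their total area is 1.
heights, edges, _ = ax.hist(generated, bins=30, density=True, alpha=0.6)

# Overlay the standard normal probability density function.
x = np.linspace(edges[0], edges[-1], 200)
ax.plot(x, stats.norm.pdf(x), label="standard normal PDF")
ax.set_title("Histogram of 900 random normally distributed values")
ax.legend()
ax.grid()
```

Call plt.show() afterwards to display the figure.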
We created a dataset from a normal distribution and analyzed it with the scipy.stats module (see statistics.py):
from __future__ import print_function
from scipy import stats
import matplotlib.pyplot as plt

generated = stats.norm.rvs(size=900)
print("Mean", "Std", stats.norm.fit(generated))
print("Skewtest", "pvalue", stats.skewtest(generated))
print("Kurtosistest", "pvalue", stats.kurtosistest(generated))
print("Normaltest", "pvalue", stats.normaltest(generated))
print("95 percentile", stats.scoreatpercentile(generated, 95))
print("Percentile at 1", stats.percentileofscore(generated, 1))

plt.title('Histogram of 900 random normally distributed values')
plt.hist(generated)
plt.grid()
plt.show()