Time for action – analyzing random values

We will generate random values that mimic a normal distribution and analyze the generated data with statistical functions from the scipy.stats package. Perform the following steps to do so:

  1. Generate random values from a normal distribution using the scipy.stats package.
    generated = stats.norm.rvs(size=900)
  2. Fit the generated values to a normal distribution. The fit returns maximum likelihood estimates of the mean and standard deviation of the data set.
    print("Mean", "Std", stats.norm.fit(generated))

    The mean and standard deviation would be shown as follows:

    Mean Std (0.0071293257063200707, 0.95537708218972528)
    
  3. Skewness tells us how skewed (asymmetric) a probability distribution is. Perform a skewness test. This test returns two values: the test statistic and the p-value. The p-value is the probability of seeing a skewness at least this extreme if the data really came from a normal distribution. It is not the probability that the data is normal. P-values range from 0 to 1.
    print("Skewtest", "pvalue", stats.skewtest(generated))

    The result of the skewness test would be shown as follows:

    Skewtest pvalue (-0.62120640688766893, 0.5344638245033837)
    

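    A p-value of about 0.53 is far above the usual 0.05 significance level, so the test gives us no reason to reject the hypothesis that the skewness matches that of a normal distribution.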

  4. Kurtosis tells us how heavy the tails of a probability distribution are compared to those of a normal distribution. Perform a kurtosis test. This test is set up in a similar way to the skewness test but, of course, applies to kurtosis.
    print("Kurtosistest", "pvalue",
          stats.kurtosistest(generated))

    The result of the kurtosis test would be shown as follows:

    Kurtosistest pvalue (1.3065381019536981, 0.19136963054975586)
    
  5. A normality test checks whether a data set is consistent with the normal distribution. Perform a normality test; scipy.stats.normaltest combines the skewness and kurtosis tests into one. This test also returns two values, of which the second is the p-value.
    print("Normaltest", "pvalue", stats.normaltest(generated))

    The result of the normality test would be shown as follows:

    Normaltest pvalue (2.09293921181506, 0.35117535059841687)
    
  6. We can easily find the value at a certain percentile with SciPy.
    print("95 percentile",
          stats.scoreatpercentile(generated, 95))

    The value at the 95th percentile would be shown as follows:

    95 percentile 1.54048860252
    
  7. Do the opposite of the previous step to find the percentile rank of the value 1, that is, the percentage of generated values at or below 1.
    print("Percentile at 1",
          stats.percentileofscore(generated, 1))

    The percentile at 1 would be shown as follows:

    Percentile at 1 85.5555555556
    
  8. Plot the generated values in a histogram with Matplotlib. More information about Matplotlib can be found in the previous chapter.
    plt.hist(generated)
    plt.show()

    The following is the histogram of the generated random values:

    [Figure: histogram of the generated random values]
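
The p-values from steps 3 to 5 are easier to read next to a sample that is clearly not normal. The following is a minimal sketch, not part of statistics.py. The 0.05 significance level, the integer seeds, the exponential comparison sample, and the helper name run_tests are all assumptions made for illustration.

```python
from scipy import stats

ALPHA = 0.05  # conventional significance level; an assumption, not from the text


def run_tests(name, sample):
    """Run the three tests on sample and return their p-values by test name."""
    pvalues = {}
    tests = [("skewtest", stats.skewtest),
             ("kurtosistest", stats.kurtosistest),
             ("normaltest", stats.normaltest)]
    for label, test in tests:
        statistic, pvalue = test(sample)  # results unpack as (statistic, p-value)
        pvalues[label] = float(pvalue)
        verdict = "reject" if pvalue < ALPHA else "cannot reject"
        print("%-12s %-13s p=%.4f -> %s normality" % (name, label, pvalue, verdict))
    return pvalues


# Integer seeds make both samples reproducible.
normal_pvalues = run_tests("normal", stats.norm.rvs(size=900, random_state=42))
expon_pvalues = run_tests("exponential", stats.expon.rvs(size=900, random_state=42))
```

A small p-value is evidence against normality, while a large one only means the test found nothing to object to. The strongly skewed exponential sample should be rejected by the skewness and normality tests.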

What just happened?

We created a data set from a normal distribution and analyzed it with the scipy.stats module (see statistics.py).

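from scipy import stats
import matplotlib.pyplot as plt

# Draw 900 samples from the standard normal distribution.
generated = stats.norm.rvs(size=900)
# Estimate the mean and standard deviation.
print("Mean", "Std", stats.norm.fit(generated))
# Each test returns a (statistic, p-value) pair.
print("Skewtest", "pvalue", stats.skewtest(generated))
print("Kurtosistest", "pvalue", stats.kurtosistest(generated))
print("Normaltest", "pvalue", stats.normaltest(generated))
# Value at the 95th percentile, and percentile rank of the value 1.
print("95 percentile", stats.scoreatpercentile(generated, 95))
print("Percentile at 1", stats.percentileofscore(generated, 1))
# Plot a histogram of the samples.
plt.hist(generated)
plt.show()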
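
As a sanity check, the fitted parameters and percentiles can be compared with NumPy and with the theoretical standard normal distribution. This is a rough sketch under assumptions: the seed is arbitrary, and the comparison values come from stats.norm.ppf and stats.norm.cdf, which the listing above does not use. For a standard normal distribution, the 95th percentile is about 1.645 and about 84.13 percent of values lie at or below 1.

```python
import numpy as np
from scipy import stats

generated = stats.norm.rvs(size=900, random_state=0)  # seed chosen arbitrarily

# norm.fit returns maximum likelihood estimates: the sample mean and the
# standard deviation computed with ddof=0, which is what np.std does by default.
fit_mean, fit_std = stats.norm.fit(generated)
print("fit  ", fit_mean, fit_std)
print("numpy", np.mean(generated), np.std(generated))

# Empirical 95th percentile versus the theoretical quantile.
empirical_95 = stats.scoreatpercentile(generated, 95)
theoretical_95 = stats.norm.ppf(0.95)
print("95th percentile: empirical", empirical_95, "theoretical", theoretical_95)

# Empirical percentile rank of 1 versus the theoretical CDF, as a percentage.
empirical_rank = stats.percentileofscore(generated, 1)
theoretical_rank = 100 * stats.norm.cdf(1)
print("rank of 1: empirical", empirical_rank, "theoretical", theoretical_rank)
```

With only 900 samples the empirical values will wander around the theoretical ones, which explains results such as 1.54 and 85.56 in the steps above.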

Have a go hero – improving the data generation

Judging from the histogram in the Time for action – analyzing random values section, there is still room for improvement when it comes to generating the data. Try using NumPy or different parameters of the scipy.stats.norm.rvs function.
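
One possible starting point, offered as a sketch and not as the intended solution, draws a larger sample with NumPy and passes explicit loc and scale parameters to scipy.stats.norm.rvs. The sample size of 10000, the seed, the parameter values, and the choice of 30 bins are all arbitrary assumptions. The bin counts are computed with np.histogram so the example runs without opening a plot window.

```python
import numpy as np
from scipy import stats

SIZE = 10000  # larger than the 900 samples used earlier; an arbitrary choice
rng = np.random.default_rng(12345)  # fixed seed, chosen arbitrarily

# NumPy route: standard normal samples from a seeded generator.
numpy_sample = rng.normal(loc=0.0, scale=1.0, size=SIZE)

# SciPy route: loc is the mean and scale is the standard deviation.
scipy_sample = stats.norm.rvs(loc=5.0, scale=2.0, size=SIZE, random_state=12345)

fits = {}
for name, sample in [("numpy", numpy_sample), ("scipy", scipy_sample)]:
    fits[name] = stats.norm.fit(sample)
    counts, edges = np.histogram(sample, bins=30)  # same binning plt.hist would use
    print(name, "mean/std", fits[name], "fullest bin", counts.max())
```

Passing bins=30 to plt.hist in the same way draws the corresponding histogram, which should look smoother than the one produced from 900 samples and the default number of bins.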
