Let's say a reading competition was conducted with some adults. The data looks like the following:
[236, 239, 209, 246, 246, 245, 215, 212, 242, 241, 219, 242, 236, 211, 216, 214, 203, 223, 200, 238, 215, 227, 222, 204, 200, 208, 204, 230, 216, 204, 201, 202, 240, 209, 246, 224, 243, 247, 215, 249, 239, 211, 227, 211, 247, 235, 200, 240, 213, 213, 209, 219, 209, 222, 244, 226, 205, 230, 238, 218, 242, 238, 243, 248, 228, 243, 211, 217, 200, 237, 234, 207, 217, 211, 224, 217, 205, 233, 222, 218, 202, 205, 216, 233, 220, 218, 249, 237, 223]
Now, our hypothesis question is this: Is the average reading speed of randomly selected adults more than 212 words per minute?
We can break down the preceding concept into the following parameters:
- Population: All adults
- Parameter of interest: μ, the population mean reading speed (in words per minute)
- Null hypothesis: μ = 212
- Alternative hypothesis: μ > 212
- Significance level: α = 0.05
We now have all the required parameters, so we can run a Z-test from the statsmodels package with alternative="larger":
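import numpy as np
import statsmodels.api as sm
# Simulate 89 reading speeds between 200 and 249 words per minute
sdata = np.random.randint(200, 250, 89)
# One-sided Z-test of H0: mean = 212 against H1: mean > 212
sm.stats.ztest(sdata, value=212, alternative="larger")
The output of the preceding code is a tuple containing the test statistic and the P-value. Because sdata is generated at random, the exact numbers change from run to run; for a sample drawn this way, the statistic is large and positive and the P-value is very close to 0.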
Since the computed P-value is far below the significance level (α = 0.05), we reject the null hypothesis. That is, the sample supports the claim that the average reading speed of adults is greater than 212 words per minute.
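To connect the test to the data listed at the start of this section, here is a minimal sketch that computes the same kind of one-sided Z-test by hand, using only the Python standard library. The helper name one_sample_ztest is hypothetical. The sketch assumes the standard error is estimated from the sample standard deviation (n - 1 in the denominator), so treat it as an illustration of the calculation rather than a replacement for the library call.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Reading speeds (words per minute) listed at the start of this section
speeds = [
    236, 239, 209, 246, 246, 245, 215, 212, 242, 241,
    219, 242, 236, 211, 216, 214, 203, 223, 200, 238,
    215, 227, 222, 204, 200, 208, 204, 230, 216, 204,
    201, 202, 240, 209, 246, 224, 243, 247, 215, 249,
    239, 211, 227, 211, 247, 235, 200, 240, 213, 213,
    209, 219, 209, 222, 244, 226, 205, 230, 238, 218,
    242, 238, 243, 248, 228, 243, 211, 217, 200, 237,
    234, 207, 217, 211, 224, 217, 205, 233, 222, 218,
    202, 205, 216, 233, 220, 218, 249, 237, 223,
]


def one_sample_ztest(sample, value):
    """Hypothetical helper: test H0: mu = value against H1: mu > value."""
    n = len(sample)
    # Standard error of the mean, estimated from the sample standard deviation
    std_error = stdev(sample) / sqrt(n)
    z = (mean(sample) - value) / std_error
    # Upper-tail probability P(Z > z); cdf(-z) equals 1 - cdf(z) by symmetry
    # and avoids precision loss when z is large
    p_value = NormalDist().cdf(-z)
    return z, p_value


alpha = 0.05  # significance level
z_stat, p_value = one_sample_ztest(speeds, value=212)
print(f"z = {z_stat:.2f}, p = {p_value:.3g}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Using the listed observations instead of a freshly generated random sample keeps the result reproducible, so the decision does not change from run to run.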