The level of statistical significance of the association between the explanatory variable and the response variable

We are getting into a fairly complex topic here, but I will try to make it as clear as possible to help you grasp its main concepts.

The topic here is statistical inference and hypothesis testing. Behind these words lies a set of techniques employed to assess whether the results obtained from a statistical analysis can be considered significant, that is, whether they reflect a true structural relation or could simply have arisen by chance.

This main objective is pursued by setting a test. A test is usually composed of two hypotheses:

  • H0, or the null hypothesis, which by convention states that there is no effect (for instance, no difference or no association)
  • H1, or the alternative hypothesis, which is the complement of H0 and states that the effect exists

Once you are done defining your hypotheses, you have to define how to actually test them. This is where hypothesis testing comes to life, and it is basically constituted of:

  • A test statistic computed from the observed data
  • A p-value, that is, the probability of observing a test statistic at least as extreme as the one computed, assuming the null hypothesis is true

Let's imagine, for example, that you want to test whether the difference between two means, such as those of two different samples, is statistically significant. How do you do this? You can set up the following test:

  • H0: The two means are equal (any observed difference is due to chance)
  • H1: The two means are not equal

You then compute the t-statistic, which is the one usually employed for comparing means. You can now look your t-statistic up in a table, conveniently named the t-table, and find out the probability of obtaining a value at least that extreme under the hypothesis that the two means are equal.

Let's say, for instance, that you get a probability of 0.03: is it high or low? It is surely a low probability, but how can you tell whether it is low enough to exclude the hypothesis of a non-significant difference? By convention, setting the confidence level to 95% fixes the significance threshold at 0.05, and it is possible to say that if we find a p-value lower than 0.05 we can reject the null hypothesis in favor of the alternative one.
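The whole procedure above can be sketched in a few lines of code. The book's examples use R, but the ideas are language-agnostic, so here is a minimal, self-contained Python sketch: it computes Welch's t-statistic for the difference between two sample means and approximates the two-sided p-value with the normal distribution (a reasonable shortcut for moderately large samples; an exact calculation would use the t-distribution). The function names and the sample data are illustrative, not from the book.

```python
import math

def welch_t_statistic(sample_a, sample_b):
    """Welch's t-statistic for the difference between two sample means."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / n_a
    mean_b = sum(sample_b) / n_b
    # Unbiased sample variances
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (n_b - 1)
    # Standard error of the difference between the means
    se = math.sqrt(var_a / n_a + var_b / n_b)
    return (mean_a - mean_b) / se

def two_sided_p_value(t):
    """Two-sided p-value via the normal approximation to the t-distribution."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

# Illustrative samples (made up for this sketch)
a = [12.1, 13.4, 11.8, 12.9, 13.1, 12.5, 13.0, 12.2]
b = [14.0, 13.8, 14.5, 13.9, 14.2, 13.7, 14.1, 14.4]

t = welch_t_statistic(a, b)
p = two_sided_p_value(t)
print(f"t = {t:.2f}, p = {p:.4f}")
if p < 0.05:
    print("Reject H0: the two means differ significantly")
else:
    print("Fail to reject H0")
```

The comparison on the last lines is exactly the 0.05 threshold discussed above: a p-value below it leads us to reject H0 in favor of H1.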

This brief explanation is surely not enough for you to set up and perform hypothesis testing on your own. Nevertheless, it is enough to let you understand the output of the logistic regression estimation and most of the similar measures you will find in your model estimates. The call to the glm() function, besides estimating the coefficients of our logistic function, also tests the hypothesis of a statistically significant association between the y and each of the x variables. What we find in the Pr(>|z|) column is exactly the p-value we were talking about.

If the p-value is below 0.05, we can reject the null hypothesis of the association not being significant, while we cannot do so in the opposite case. What about the small stars on the right of this column? I am sure you have worked it out on your own: the number of stars is inversely proportional to the p-value, and therefore directly proportional to the level of statistical significance of the association. We can therefore see that previous_default, company_revenues, and the ROS index turn out to be the most significant variables.
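The star codes printed by R's summary() follow fixed thresholds: '***' for p < 0.001, '**' for p < 0.01, '*' for p < 0.05, and '.' for p < 0.1. A tiny Python sketch of that mapping (the function name is hypothetical, but the thresholds mirror R's significance codes) makes the rule explicit:

```python
def significance_stars(p_value):
    """Map a p-value to R-style significance codes:
    '***' < 0.001, '**' < 0.01, '*' < 0.05, '.' < 0.1, '' otherwise."""
    for threshold, code in [(0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")]:
        if p_value < threshold:
            return code
    return ""

for p in [0.0004, 0.008, 0.03, 0.07, 0.4]:
    print(f"p = {p:<7} {significance_stars(p)}")
```

Reading the stars is therefore just a shortcut for comparing each p-value against these conventional thresholds.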

You should note that the level of significance is in no way related to the magnitude of the coefficient, but only to the statistical significance of the association found between the y and the explanatory variable.
