Fitting quadratic and cubic models to test for linearity of log odds

Recall what we have said about the log odds and their relationship with the explanatory variables: if that relationship is non-linear, a logistic model including quadratic and cubic terms should show better performance than the purely linear model.

This is the rationale behind the test we are going to perform, the likelihood-ratio test. We compare the richer quadratic model against the simpler linear one, compute a test statistic, and obtain a p-value. This is similar to what we have already seen when testing the statistical significance of the associations between x and y.

If the p-value is lower than a given threshold, which we will set as usual to 0.05, we reject the null hypothesis that the quadratic terms contribute nothing, and therefore conclude that the log odds are non-linear and the assumption is not respected.
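To make the mechanics concrete, here is a sketch of what the likelihood-ratio test computes under the hood, on simulated data (the variable names and data here are illustrative, not our banking dataset): the statistic is twice the gain in log-likelihood from adding the quadratic term, compared against a chi-squared distribution.

```r
set.seed(42)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 * x))   # log odds truly linear in x

linear    <- glm(y ~ x,          family = "binomial")
quadratic <- glm(y ~ x + I(x^2), family = "binomial")

# Test statistic: twice the log-likelihood gain from the extra term
stat <- as.numeric(2 * (logLik(quadratic) - logLik(linear)))
# Degrees of freedom: number of extra parameters (here, one)
df <- attr(logLik(quadratic), "df") - attr(logLik(linear), "df")
# Upper-tail chi-squared probability gives the p-value
p <- pchisq(stat, df = df, lower.tail = FALSE)
p
```

Since these data were generated with log odds linear in x, the quadratic term should buy very little and the p-value should typically be large.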

But don't panic, we do not have to perform all of that by hand: we just have to run the lrtest() function.

First of all, let's train one more model besides the linear one. We want this model to also include the quadratic terms of our variables, or at least of the continuous ones, since raising a binary variable to the second power would be meaningless.
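Why is squaring a binary variable meaningless? Because 0 squared is 0 and 1 squared is 1, so the squared column is identical to the original and adds no information. A two-line check (with an illustrative 0/1 vector) confirms this:

```r
# A 0/1 dummy variable is unchanged by squaring: 0^2 = 0 and 1^2 = 1
b <- c(0, 1, 1, 0, 1)
identical(b^2, b)   # TRUE
```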

Let's do this:

logistic_quadratic <- glm(as.numeric(default_numeric) ~ . +
                            I(cost_income^2) +
                            I(ROE^2) +
                            I(employees^2) +
                            I(ROS^2) +
                            I(company_revenues^2),
                          data = training_data, family = "binomial")

Note the I() wrapper around each squared term: inside a model formula, the ^ operator denotes interaction crossing rather than arithmetic exponentiation, so writing cost_income^2 directly would not square the variable. I() tells R to evaluate the expression arithmetically.

We now want to compare the resulting model with the original linear one. Employing the lrtest() function from the lmtest package, we can easily do it: we just have to pass both the logistic_quadratic and the logistic objects to this function.

lrtest(logistic_quadratic, logistic)

As you can see, the output starts with a recap of the two compared models, followed by two lines where the Pr(>Chisq) column shows us the p-value associated with the test. This p-value does not fall below our 0.05 threshold, so the quadratic terms add no significant explanatory power: we cannot reject the simpler linear model, and we conclude that our linearity assumption is respected.
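If the lmtest package is not installed, base R's anova() with test = "Chisq" performs an equivalent likelihood-ratio comparison of two nested glm fits. A minimal sketch on simulated data (illustrative variables, not our dataset):

```r
set.seed(1)
x <- rnorm(300)
y <- rbinom(300, 1, plogis(x))

linear    <- glm(y ~ x,          family = "binomial")
quadratic <- glm(y ~ x + I(x^2), family = "binomial")

# Base-R likelihood-ratio comparison of the two nested models
comparison <- anova(linear, quadratic, test = "Chisq")
comparison$`Pr(>Chi)`[2]   # p-value for adding the quadratic term
```

The p-value in the `Pr(>Chi)` column is interpreted exactly as with lrtest(): below 0.05, the quadratic terms matter and linearity is in doubt.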
