At the moment, we have several models that attempt to predict the performance rating of the Shu army in head to head battles based on the duration and number of soldiers engaged in that battle. Yet, we do not have answers regarding which model is best and the relative contribution that each model makes above and beyond the preceding models.
We can use the process of hierarchical linear regression (HLR) to compare our models. Let us look at how HLR can be used to compare the three models that we have made thus far:
```r
> #use HLR to compare different models
> #first consider the models individually
> #simple regression model using duration to predict battle rating
> lmHeadToHeadRating_Duration_Summary
```
```r
> #multiple regression model using duration, Shu soldiers, and Wei soldiers to predict battle rating
> lmHeadToHeadRating_DurationSoldiers_Summary
```
Next, we can use the anova(object, ...) function to compare the relative contribution of each model:

```r
> #use anova(object, ...) to compare the relative contribution of multiple models
> #compare the three head to head combat models using ANOVA
> anovaHeadToHeadRatingModelComparison <- anova(lmHeadToHeadRating_Duration, lmHeadToHeadRating_DurationSoldiers, lmHeadToHeadRating_DurationSoldiersShuWeiInteraction)
```
Then we can display the anova results in the R console:

```r
> #display the anova results
> anovaHeadToHeadRatingModelComparison
```
You have the data that you need to complete a hierarchical linear regression (HLR) analysis. To be thorough, you should consider both the individual models (summaries) and the relative contribution of each model (ANOVA).
You are already familiar with interpreting model summaries. These are the best places to start when conducting an HLR analysis. You can check the summaries to see if each overall model and its coefficients are statistically significant. You should also take note of each model's R-squared value.
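Model summaries can also be inspected programmatically. The sketch below uses simulated stand-in data (the names duration, rating, and fit are illustrative, not the chapter's battle dataset) to show where the R-squared value and coefficient p-values live inside a summary object:

```r
#sketch: pulling key figures out of a model summary
#(simulated stand-in data, not the chapter's battle dataset)
set.seed(42)
duration <- runif(50, min = 1, max = 10)
rating <- 5 * duration + rnorm(50)

fit <- lm(rating ~ duration)
fitSummary <- summary(fit)

#R-squared for the overall model
fitSummary$r.squared

#p-value for the duration coefficient
fitSummary$coefficients["duration", "Pr(>|t|)"]
```

Checking these values in code, rather than reading them off the printed summary, is convenient when comparing several candidate models at once.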
Our simple regression model is statistically significant on all accounts and has a respectable R-squared value of 77%. Likewise, all of the variables in our multiple regression model, as well as the model itself, are statistically significant. The model has an R-squared value of 86%. While our interaction model is also statistically significant, with an R-squared of 87%, two of its predictor variables are not statistically significant. Although these summaries provide us with a wealth of knowledge on the individual merits of each model, it is best to make a decision after considering the results of an anova test.
Generally, analysis of variance (ANOVA) is a statistical procedure that compares the means of multiple groups and determines if they are significantly different from one another. In our case, ANOVA can be used in HLR to compare multiple regression models. Here, ANOVA determines if the coefficient(s) that each successive model brings to the overall regression equation makes a statistically significant contribution above and beyond the coefficients that preceded it.
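To ground the general idea first, here is a minimal one-way ANOVA on simulated data (the group labels and means are made up for illustration), testing whether three group means differ:

```r
#classic one-way ANOVA on simulated data: do three group means differ?
set.seed(3)
group <- factor(rep(c("a", "b", "c"), each = 30))
value <- c(rnorm(30, mean = 0), rnorm(30, mean = 0), rnorm(30, mean = 2))

oneWay <- aov(value ~ group)
summary(oneWay)  #the Pr(>F) column reports the test's p-value
```

Because group c was simulated with a higher mean than the other two, the test comes out significant here; the same F-test machinery underlies the model comparisons that follow.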
Consider the following three models:
A: Y = X1
B: Y = X1 + X2
C: Y = X1 + X2 + X3
The difference between each model is that a new predictor is added to the regression equation. Model B contributes X2 in addition to model A, whereas model C contributes X3 in addition to model B. ANOVA determines whether these successive contributions are statistically significant. For instance, if model B were found to be statistically significant through ANOVA, then including X2 in the regression model is likely to add value. Continuing, if model C were not found to be statistically significant, then including X3 in the regression model probably does not add much value and therefore should be removed. By comparing successive models in this manner, we can determine, in a statistical sense, whether or not our coefficients are adding value to the overall model. Thus, our decisions to include valuable coefficients and eliminate excess ones are informed.
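The A, B, C comparison can be sketched on simulated data. In the illustration below (not the chapter's data), x1 and x2 genuinely drive y while x3 is pure noise, so we would expect the B-versus-A comparison to be statistically significant:

```r
#sketch: hierarchical comparison of three nested models on simulated data
set.seed(7)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y <- 2 * x1 + x2 + rnorm(n)  #x3 has no real effect on y

modelA <- lm(y ~ x1)
modelB <- lm(y ~ x1 + x2)
modelC <- lm(y ~ x1 + x2 + x3)

#row 2 tests B above and beyond A; row 3 tests C above and beyond B
anova(modelA, modelB, modelC)
```

Because x2 has a real effect, the second row of the table is significant, indicating that model B adds value beyond model A; the third row assesses whether the noise predictor x3 earns its place.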
Of course, we have to be mindful of practical significance at all times. When selecting independent variables for our model, we should use our understanding of the data and the situation to select only the best predictors. Although we could, it is inappropriate to haphazardly test numerous arbitrary combinations of variables in an attempt to find the supposed best statistical model. In fact, partaking in such practice is likely to lead to a model that is both meaningless in a practical sense and incapable of predicting valid answers to the questions that motivated the use of regression modeling in the first place. Therefore, always keep in mind the practical implications of every statistical analysis.
R's anova(object, ...) function is a variable-argument function that can be used to conduct ANOVA on several objects. Each object of comparison can be entered into the function as its own argument. For example:

```r
anova(A, B, C)
```

Here we are telling R to compare three objects (A, B, and C) using ANOVA.
The anova(object, ...) function yields an ANOVA table, which details the results of the analysis. For the purposes of comparing successive models using HLR, we are only concerned with the p-values (the Pr(>F) column). The p-value beside each model indicates whether or not it is statistically significant above and beyond its preceding model. It does not, however, indicate the individual statistical significance of the model, which is why we also considered the individual model summaries.
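When working programmatically, the Pr(>F) column can be pulled straight out of the returned table. Note that its first entry is NA, because the baseline model has no preceding model to be compared against (the models and data below are illustrative stand-ins):

```r
#sketch: extracting the Pr(>F) column from an anova() comparison table
set.seed(1)
exampleData <- data.frame(x1 = rnorm(60), x2 = rnorm(60))
exampleData$y <- 3 * exampleData$x1 + rnorm(60)

m1 <- lm(y ~ x1, data = exampleData)
m2 <- lm(y ~ x1 + x2, data = exampleData)

tab <- anova(m1, m2)
pValues <- tab[["Pr(>F)"]]  #first entry is NA for the baseline model
pValues
```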
The ANOVA table from our activity indicates that our multiple regression model is statistically significant above and beyond our simple regression model. However, our interaction model does not make a statistically significant contribution above and beyond our multiple regression model. This suggests, from a statistical standpoint, that our interaction coefficient should be removed. Recall that we did not formulate a logical basis for the interaction between the number of Shu and Wei soldiers engaged in head to head combat. Without a statistical or practical reason to include the interaction coefficient, it is best removed from the model. In other words, our HLR analysis suggests that, out of the models that we analyzed, the multiple regression model is best.
a. The regression models' coefficients are statistically significant.
b. The overall regression model is statistically significant.
c. The contribution that the model makes is statistically significant.
d. The contribution that the model makes above and beyond the preceding model is statistically significant.
Using the techniques that we explored in this chapter, analyze the remaining battle methods (surround, ambush, and fire) and create regression models for each that predict the performance rating of the Shu army. Be sure to use your practical knowledge of the combat strategies to choose appropriate predictors for your regression models. Once you have found a few reasonably predictive models for each method, use HLR to compare them. Ultimately, come to a statistically and practically justifiable conclusion about the best regression model to use for each battle method. Remember to save your R workspace and console text to preserve the content that you created during this chapter.
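As a hedged starting point, the workflow for one battle method might look like the skeleton below. Every object name here (surroundData, lmSurround_Duration, and so on) is a placeholder, and the simulated data merely stands in for your actual surround, ambush, or fire records:

```r
#workflow skeleton for one battle method
#(all names and data below are placeholders, not the chapter's dataset)
set.seed(123)
surroundData <- data.frame(Duration = runif(40, min = 1, max = 10),
                           ShuSoldiers = runif(40, min = 100, max = 1000))
surroundData$Rating <- 4 * surroundData$Duration +
  0.01 * surroundData$ShuSoldiers + rnorm(40)

#step 1: fit candidate models, simplest first
lmSurround_Duration <- lm(Rating ~ Duration, data = surroundData)
lmSurround_DurationSoldiers <- lm(Rating ~ Duration + ShuSoldiers,
                                  data = surroundData)

#step 2: inspect each summary individually
summary(lmSurround_Duration)
summary(lmSurround_DurationSoldiers)

#step 3: compare the nested models with ANOVA
anova(lmSurround_Duration, lmSurround_DurationSoldiers)
```

Repeat the same fit-summarize-compare loop for each battle method, letting practical knowledge of the strategy guide which predictors you try.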