Creating scores from the components

We now need to compute the component scores for each observation. These scores indicate how each observation (soldier) relates to a component. Let's capture them in a dataframe, as we will need it for our analysis:

> pca_scores <- data.frame(round(pca_5$scores, digits = 2))

> head(pca_scores)
    PC1   PC2   PC3   PC4   PC5
1 -1.37  0.29  1.06  0.09  0.29
2 -1.19 -0.45 -0.22 -1.61  0.22
3 -0.04 -1.19 -0.45 -0.69  0.05
4  1.44 -0.96  0.43 -1.87 -0.16
5  1.37  2.07  0.26  0.15  2.05
6 -0.09  0.29 -0.96 -0.07  0.17

We now have a score on each component for each soldier. Conceptually, each score is a weighted sum: the observation's feature values are multiplied by their loadings on the component and then added together. We can now bring in the response as a column in the data:

> pca_scores$weight <- trainY
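To make the weighted-sum description of the scores concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the method behind `pca_5`: it uses base R's `prcomp()` on the built-in `mtcars` data as a stand-in, and the chosen columns are arbitrary. Scores produced by other PCA functions, especially after rotation, may be scaled or weighted differently.

```r
# Illustrative stand-in: base R prcomp() on the built-in mtcars data,
# not the pca_5 object or the soldier data used in the text.
X <- as.matrix(mtcars[, c("disp", "hp", "wt", "qsec")])

# Unrotated PCA on centered, unit-variance features
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Standardize the features, then multiply by the loading (rotation) matrix.
# Each entry is the sum over features of (standardized value * loading).
scores_manual <- scale(X) %*% pca$rotation

# Check the hand-computed scores against the ones prcomp() reports
same_scores <- isTRUE(all.equal(scores_manual, pca$x, check.attributes = FALSE))
print(same_scores)
```

The matrix product is just a compact way of writing "multiply each feature by its loading and sum" for every observation and component at once.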

With the response added, it makes sense to examine the correlations in this data:

> DataExplorer::plot_correlation(pca_scores)

The output of the preceding code is as follows:

We see that components 1 and 2 are positively correlated with weight, while the others show little relationship to it. Keep in mind that these are pairwise correlations, with each component considered on its own, so our model may tell a different story.
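If you want the numbers behind such a plot, the same pairwise correlations can be computed with base R's `cor()`. The sketch below is only an illustration: it again assumes `prcomp()` and the built-in `mtcars` data as a stand-in, with `mpg` playing the role of the response, and the name `scores_df` is hypothetical.

```r
# Illustrative stand-in data (mtcars), not the soldier data from the text.
X <- mtcars[, c("disp", "hp", "wt", "qsec")]
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Mirror the steps in the text: scores in a data frame, response as a column
scores_df <- data.frame(round(pca$x, digits = 2))
scores_df$mpg <- mtcars$mpg

# Pairwise Pearson correlation of each component with the response
cor_with_response <- cor(scores_df[, paste0("PC", 1:4)], scores_df$mpg)
print(round(cor_with_response, 2))
```

Each value looks at one component in isolation, which is why a model that includes all components together can weight them differently.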
