Now, we will look at the actual weights of the 15 women first, and then at the predicted values. The actual weights can be displayed using the following command:
women$weight
When we execute the women$weight command, this is the result that we obtain:
The predicted values can also be displayed in R:
How can we put these pieces of data together?
women$pred <- linearregressionmodel$fitted.values
This is a very simple merge. When we look inside the women variable again, this is the result:
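The steps above can be sketched end to end as follows. This is a minimal sketch that assumes linearregressionmodel was fitted with lm(weight ~ height) on R's built-in women data set; the model call itself is an assumption, as it is not shown in this section.

```r
# Assumption: the model was fitted as weight ~ height on the built-in women data set.
linearregressionmodel <- lm(weight ~ height, data = women)

# Actual weights, as recorded in the data set
print(women$weight)

# Predicted weights from the fitted model (one per row of women)
print(linearregressionmodel$fitted.values)

# Attach the predictions as a new column; fitted values are already in row order,
# so a plain assignment is enough and no key-based merge is needed.
women$pred <- linearregressionmodel$fitted.values
print(head(women))
```

Because the fitted values come back in the same order as the input rows, the assignment lines up each prediction with its original observation.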
We can see the names of the components stored in the model object by using the names command. In our example, it will appear as follows:
names(linearregressionmodel)
When we use this command, we get the following component names:
 [1] "coefficients"  "residuals"     "effects"
 [4] "rank"          "fitted.values" "assign"
 [7] "qr"            "df.residual"   "xlevels"
[10] "call"          "terms"         "model"
We can identify the relationship between height and weight by calculating the correlation. To do this, we can use Pearson's correlation coefficient, which is a measure of the linear correlation between two variables X and Y. It produces a value between +1 and -1 inclusive, where +1 is a perfect positive correlation, 0 is no linear correlation, and -1 is a perfect negative correlation. This value is known as Pearson's r.
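To make the definition concrete, the coefficient can be computed by hand and compared with R's built-in function. This is only an illustrative sketch on the built-in women data set; the names x, y, and r_manual are ours and do not appear elsewhere in the chapter.

```r
x <- women$height
y <- women$weight

# Pearson's r: the sum of products of deviations from the mean,
# divided by the square root of the product of the sums of squared deviations.
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

print(r_manual)

# The hand-computed value should agree with cor(), which uses Pearson by default
print(all.equal(r_manual, cor(x, y)))
```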
In this example, we can use the cor function to compute Pearson's correlation coefficient. In our example, it appears as follows:
rmodel <- cor(women$weight, women$height)
The rmodel object holds Pearson's r. Squaring it gives the coefficient of determination, R², which is the proportion of the variance in weight that is explained by height:
rmodel^2
We get the value of R² as follows:
0.9910098
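Note that 0.9910098 is the squared coefficient. The following sketch prints both the coefficient and its square, and checks the square against the R-squared reported for a simple linear model. It assumes the model is weight ~ height on the women data set; the names r and fit are ours.

```r
# Pearson's r between weight and height
r <- cor(women$weight, women$height)
print(r)

# The squared coefficient, which is the value printed above
print(r^2)

# For a regression with a single predictor, the squared correlation
# equals the model's R-squared.
fit <- lm(weight ~ height, data = women)  # assumed model
print(summary(fit)$r.squared)
```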
This shows a high positive correlation between height and weight. We can find out more information by using the plot command, which will provide us with four visualizations in R. The command appears as follows:
plot(linearregressionmodel)
In order to assess how well the model explains the data, R provides us with four plots, which are tabulated as follows:
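By default, plot draws the four diagnostics one after another. A common way to see them together is to split the plotting device into a 2 x 2 grid first. The sketch below assumes the model is weight ~ height on the women data set; the names fit and op are ours.

```r
fit <- lm(weight ~ height, data = women)  # assumed model

# Split the device into a 2 x 2 grid and remember the previous settings
op <- par(mfrow = c(2, 2))

# Draws the default diagnostics for an lm object: residuals vs fitted,
# normal Q-Q, scale-location, and residuals vs leverage
plot(fit)

# Restore the previous graphics settings
par(op)
```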
In this topic, let's get to work! Now that we have done some analysis and data visualisation in R, we will replicate our results using R and Tableau together.
In the screenshot, we can see the index, original height, and original weight along with the Predicted amount. In the first row, we can see that the weight was 115 pounds, and the predicted amount was 112.6 pounds.
In Tableau, the calculation is created using the Calculation Editor. An example is shown in the following screenshot:
As we have done with height, we are also going to create the calculation for weight, as seen in the following screenshot:
When these calculated fields have been created, you can create the calculated field that holds the R calculation. The following screenshot shows this field:
Once the calculated fields have been created, you can drag the fields onto the canvas.
So that we can calculate the correlation across all the data points, we need an index. Move index to the Dimensions tab by dragging it up from the Measures tab:
Create a formula for R: SCRIPT_REAL("cor(.arg1, .arg2)", ([HeightSum]), ([WeightSum]))
Drag HeightSum to Columns.
Drag WeightSum to Rows.
When we look at what the Correlation field is showing now, we can see that it isn't holding anything. How can we resolve that issue? We need to fix the calculated field holding the R formula, so that it is configured with the correct settings for the calculation.
Our correlation is computed at the table level. To ensure that every data point is included in the correlation, we specify here that the Index column is included in the calculation. Here is an example:
Once we have done all of these steps, we can see that the Correlation field is now populated with a very high correlation value.
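To see what the R side of this calculation amounts to, we can imitate it in plain R. This is only a sketch: .arg1 and .arg2 stand in for the vectors that Tableau passes to R, and the explanation for the empty field (a single aggregated value per argument) is our assumption about the view, not something confirmed in this section.

```r
# Hypothetical stand-ins for what Tableau would pass when every
# indexed row is part of the calculation
.arg1 <- women$height
.arg2 <- women$weight
with_index <- cor(.arg1, .arg2)
print(with_index)

# Assumed scenario without the index: each argument collapses to one
# aggregated value, and a correlation of single values is undefined.
without_index <- suppressWarnings(cor(sum(women$height), sum(women$weight)))
print(without_index)
```

With all 15 points passed in, cor returns a value close to 1, which is consistent with the very high correlation shown in the view.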