Comparing actual values with predicted results

First, we will look at the actual weights of the 15 women, and then at the predicted values. The actual weights are displayed with the following command:

women$weight

When we execute the women$weight command, this is the result that we obtain:

[Screenshot: output of women$weight]

The predicted values are stored in the model's fitted.values component, and R prints them in the same way:

[Screenshot: the predicted weights]
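This chunk does not show the command that fits the model. The sketch below assumes, consistent with the predicted value of 112.6 pounds reported for the first woman later in this section, that linearregressionmodel is a simple linear regression of weight on height fitted with lm(). It prints the actual weights and then the predicted ones; fitted() is R's standard accessor for the values stored in fitted.values.

```r
# Assumption: the model was fitted earlier as weight ~ height on the built-in women data
linearregressionmodel <- lm(weight ~ height, data = women)

# Actual weights (pounds)
women$weight

# Predicted weights; fitted() returns the same values as $fitted.values
fitted(linearregressionmodel)
```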

How can we put these pieces of data together?

women$pred <- linearregressionmodel$fitted.values

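This simply adds the predicted values to the women data frame as a new column called pred. Because the fitted values are stored in the same row order as the data used to fit the model, each prediction sits next to the matching actual weight. When we look inside the women variable again, this is the result: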

[Screenshot: the women data with the new pred column]
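A minimal sketch of the same step, assuming the model is the simple regression of weight on height fitted with lm(). The extra diff column is a hypothetical addition that shows how far each prediction is from the actual weight; it matches the model's residuals.

```r
# Assumption: the model was fitted earlier as weight ~ height
linearregressionmodel <- lm(weight ~ height, data = women)

# Add the predictions as a new column; fitted values follow the row order of the data
women$pred <- linearregressionmodel$fitted.values

# Hypothetical extra column: actual minus predicted weight
women$diff <- women$weight - women$pred

head(women)
```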

Investigating relationships in the data

We can list the components stored in the model object by using the names command. In our example, it appears as follows:

names(linearregressionmodel)

When we use this command, we get the following components:

[1] "coefficients"  "residuals"     "effects"      
[4] "rank"          "fitted.values" "assign"       
[7] "qr"            "df.residual"   "xlevels"      
[10] "call"          "terms"         "model"  
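Each of these components can be extracted with $, or with R's accessor functions such as coef() and residuals(). A short sketch, assuming the model is lm(weight ~ height, data = women):

```r
# Assumption: the model was fitted earlier as weight ~ height
linearregressionmodel <- lm(weight ~ height, data = women)

# Components are extracted with $, like columns of a data frame
linearregressionmodel$coefficients   # intercept and slope
linearregressionmodel$df.residual    # 15 observations minus 2 coefficients = 13

# Accessor functions return the same information
coef(linearregressionmodel)
head(residuals(linearregressionmodel))
```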

We can measure the strength of the relationship between height and weight by calculating the correlation. To do this, we can use Pearson's correlation coefficient, which measures the linear correlation between two variables X and Y. It produces a value between +1 and −1 inclusive, where +1 is a total positive correlation, 0 is no linear correlation, and −1 is a total negative correlation. This value is known as Pearson's r.
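To make the definition concrete, here is a minimal sketch that computes Pearson's r directly from its formula and compares it with the built-in cor function. The helper name pearson_r is hypothetical, introduced only for this illustration.

```r
# Pearson's r from its definition:
# r = sum((x - mean(x)) * (y - mean(y))) /
#     sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
pearson_r <- function(x, y) {
  dx <- x - mean(x)   # deviations from the mean of x
  dy <- y - mean(y)   # deviations from the mean of y
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

pearson_r(women$height, women$weight)
cor(women$height, women$weight)   # built-in; method = "pearson" is the default
```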

We can use the cor function to compute Pearson's correlation coefficient. Because height and weight are columns of the women data frame, we refer to them with the $ operator:

rmodel <- cor(women$weight, women$height)

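Squaring this value gives r², the coefficient of determination, which is the proportion of the variance in weight that is explained by height:

rmodel^2

We get the following result:

0.9910098

Note that this is r² rather than Pearson's r itself; r is its square root, approximately 0.995.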
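The distinction between r and r² can be checked directly. Assuming the model is lm(weight ~ height, data = women), the sketch below compares the squared correlation with the R-squared reported by summary(); for a regression with a single predictor, the two are equal.

```r
# Assumption: the model was fitted earlier as weight ~ height
linearregressionmodel <- lm(weight ~ height, data = women)

r <- cor(women$weight, women$height)   # Pearson's r, about 0.995
r^2                                    # coefficient of determination, about 0.991

# With one predictor, r^2 equals the model's R-squared
summary(linearregressionmodel)$r.squared
```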

This shows a very strong positive correlation between height and weight. We can find out more by passing the model to the plot command, which produces four diagnostic plots in R. The command appears as follows:

plot(linearregressionmodel)
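In an interactive session, plot() on an lm object shows the diagnostic plots one after another. A minimal sketch that places the four default plots on a single page and writes them to a PDF file, assuming the model is lm(weight ~ height, data = women); the file name is illustrative:

```r
# Assumption: the model was fitted earlier as weight ~ height
linearregressionmodel <- lm(weight ~ height, data = women)

out <- file.path(tempdir(), "women-diagnostics.pdf")
pdf(out)
par(mfrow = c(2, 2))          # 2 x 2 grid so all four plots appear together
plot(linearregressionmodel)   # default which = c(1, 2, 3, 5)
invisible(dev.off())          # close the PDF device
```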

To assess how well the model explains the data, R provides four diagnostic plots, summarized as follows:

  • Residuals versus Leverage: Shows how much influence each observation has on the regression result. Standardized residuals are plotted against leverage, with contours of Cook's distance, which measures how much the regression line would change if that observation were removed. Points with a large Cook's distance are influential and may be outliers.
  • Residuals versus Fitted: Plots each residual against its fitted value. The residuals should be scattered randomly around the horizontal line at zero, which marks a residual error of zero and makes departures from it easy to see. There should be no trend here.
  • Normal Q-Q: Shows whether the residuals are normally distributed; if they are, the points lie close to the diagonal reference line.
  • Scale-Location: Shows the square root of the absolute standardized residuals against the fitted values. There should be no trend here; a trend suggests that the spread of the residuals changes with the fitted values.
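The quantities behind these plots are also available as numbers, which can be easier to inspect than a chart. This sketch, assuming the model lm(weight ~ height, data = women), collects them with standard stats functions (fitted, residuals, rstandard, hatvalues, cooks.distance) and lists the three most influential observations; the diagnostics data frame is a hypothetical name.

```r
# Assumption: the model was fitted earlier as weight ~ height
linearregressionmodel <- lm(weight ~ height, data = women)

diagnostics <- data.frame(
  fitted   = fitted(linearregressionmodel),         # x-axis of Residuals vs Fitted
  residual = residuals(linearregressionmodel),      # y-axis of Residuals vs Fitted
  std_res  = rstandard(linearregressionmodel),      # Normal Q-Q and Residuals vs Leverage; Scale-Location uses sqrt(abs(.))
  leverage = hatvalues(linearregressionmodel),      # x-axis of Residuals vs Leverage
  cooks    = cooks.distance(linearregressionmodel)  # Cook's distance for each observation
)

# Observations with the largest Cook's distance
head(diagnostics[order(-diagnostics$cooks), ], 3)
```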

Replicating our results using R and Tableau together

Now that we have done some analysis and data visualisation in R, let's get to work and replicate our results using R and Tableau together.

[Screenshot: index, height, weight, and predicted values]

In the screenshot, we can see the index, original height, and original weight, along with the predicted weight. In the first row, the actual weight is 115 pounds and the predicted weight is 112.6 pounds.
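The predicted value for the first row follows directly from the fitted coefficients: with an intercept of about −87.52 and a slope of 3.45 pounds per inch, a height of 58 inches gives −87.52 + 3.45 × 58 ≈ 112.58 pounds. The sketch below checks this in R, assuming the model is lm(weight ~ height, data = women). The section does not show how the data reached Tableau, so the export at the end is an assumption: a CSV file with hypothetical column and file names.

```r
# Assumption: the model was fitted earlier as weight ~ height
linearregressionmodel <- lm(weight ~ height, data = women)

b <- coef(linearregressionmodel)
b[["(Intercept)"]] + b[["height"]] * 58   # first row by hand, about 112.58 pounds
predict(linearregressionmodel, newdata = data.frame(height = 58))

# Hypothetical export for Tableau: index, original values, and predictions
tableau_data <- data.frame(
  Index     = seq_len(nrow(women)),
  Height    = women$height,
  Weight    = women$weight,
  Predicted = round(fitted(linearregressionmodel), 1)
)
write.csv(tableau_data, file.path(tempdir(), "women_predicted.csv"), row.names = FALSE)
```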

In Tableau, calculations are created in the Calculation Editor. An example is shown in the following screenshot:

[Screenshot: the height calculation in the Calculation Editor]

As we did with height, we also create a calculation for weight, as shown in the following screenshot:

[Screenshot: the weight calculation in the Calculation Editor]

Once these calculated fields have been created, you can create the calculated field that holds the R calculation, as shown in the following screenshot:

[Screenshot: the calculated field holding the R calculation]

Once the calculated fields have been created, you can drag the fields onto the canvas.

To calculate the correlation across all of the data points, we need an index. Move Index to Dimensions by dragging it up from Measures in the Data pane. Then build the view:
  • Create a calculated field called Correlation that holds the R formula:
    SCRIPT_REAL("cor(.arg1, .arg2)", [HeightSum], [WeightSum])
  • Then, drag HeightSum to Columns.
  • Next, drag WeightSum to Rows.
  • To show all of the marks, add Index to Detail on the Marks card.
  • Add Correlation to Detail on the Marks card. Here is an example:
[Screenshot: the scatter plot with Index and Correlation on Detail]
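Tableau sends the values of each argument to R (through an Rserve connection) as vectors named .arg1, .arg2, and so on, and evaluates the R expression against them. The sketch below imitates that on the R side for the women data, assuming HeightSum and WeightSum hold one height and one weight per Index value; it is an illustration, not Tableau's own code.

```r
# Stand-ins for the vectors Tableau passes to R: one value per mark (Index)
.arg1 <- women$height   # plays the role of [HeightSum]
.arg2 <- women$weight   # plays the role of [WeightSum]

# The expression inside SCRIPT_REAL("cor(.arg1, .arg2)", ...)
result <- cor(.arg1, .arg2)
result   # a single number for the whole partition
```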

If we now check the Correlation field, we can see that it is empty. To resolve this, we need to change the settings of the calculated field that holds the R formula so that the calculation is computed over the right set of data points.

By default, our correlation is computed at the table level. Because Index is on Detail, each index value ends up in its own partition, so R receives only one data point at a time, which is not enough to compute a correlation. To make sure that all of the data points are included in a single calculation, we specify that it is computed using the Index column. Here is an example:

[Screenshot: the table calculation set to compute using Index]
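The empty field can be reproduced in R. As a rough analogy (this is R code, not Tableau's own mechanism), splitting the data so that each index value forms its own group mimics one partition per Index value, while using all rows at once mimics computing the calculation using Index. The names per_index and all_points are hypothetical.

```r
# One group per Index value: R sees a single point at a time
per_index <- sapply(split(women, seq_len(nrow(women))), function(d) {
  # a single point has no spread, so the correlation is missing
  suppressWarnings(cor(d$height, d$weight))
})

# All Index values together: R sees all 15 points
all_points <- cor(women$height, women$weight)

all(is.na(per_index))   # every group gives a missing value, so nothing is shown
all_points              # about 0.995
```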

Once we have completed these steps, we can see that the Correlation field is populated with a very high correlation value.

[Screenshot: the populated Correlation field]