Modeling and evaluation

The package that we will use is neuralnet. Its fitting function expects a formula of the kind we have used elsewhere, such as y ~ x1 + x2 + x3 + x4, along with a data argument such as data = df. In the past, we used y ~ . to specify all the other variables in the data as inputs; however, neuralnet does not accommodate this shorthand at the time of writing. The way around this limitation is to use the as.formula() function. After first creating an object of the variable names, we will use this object to paste the variables properly on the right-hand side of the equation:

> n = names(shuttleTrain)

> n
 [1] "stability.xstab" "error.MM"        "error.SS"       
 [4] "error.XL"        "sign.pp"         "wind.tail"      
 [7] "magn.Medium"     "magn.Out"        "magn.Strong"    
[10] "vis.yes"         "use"            

> form <- as.formula(paste("use ~", paste(n[!n %in% "use"], collapse = " + ")))

> form
use ~ stability.xstab + error.MM + error.SS + error.XL + sign.pp + wind.tail + magn.Medium + magn.Out + magn.Strong + vis.yes

Keep this function in mind for your own use as it may come in quite handy. In the neuralnet package, the function that we will use is appropriately named neuralnet(). Besides the formula, there are four other critical arguments that we will need to examine (a sketch putting them together follows the list):

  • hidden: This is the number of hidden neurons in each layer; you can specify up to three layers, and the default is a single layer with 1 neuron
  • act.fct: This is the activation function; "logistic" is the default and "tanh" is also available
  • err.fct: This is the function used to calculate the error; the default is "sse", but as we are dealing with binary outcomes, we will use "ce" for cross-entropy
  • linear.output: This is a logical argument specifying whether or not to ignore act.fct; the default is TRUE, so for our data, this will need to be FALSE
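
Spelled out explicitly, a call that uses all four of these arguments might look like the following sketch; hidden = 1 and act.fct = "logistic" are simply the package defaults written out, so this is equivalent to the shorter call we will make shortly:

> neuralnet(form, data = shuttleTrain, hidden = 1, act.fct = "logistic",
+           err.fct = "ce", linear.output = FALSE)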

You can also specify the algorithm. The default is resilient backpropagation with weight backtracking ("rprop+"), and we will use it along with the default of one hidden neuron:

> fit = neuralnet(form, data=shuttleTrain, err.fct="ce", linear.output=FALSE)

Here are the overall results:

> fit$result.matrix
                                           1
error                         0.008356573154
reached.threshold             0.008074195519
steps                       689.000000000000
Intercept.to.1layhid1        -4.285590085839
stability.xstab.to.1layhid1   1.929309791976
error.MM.to.1layhid1         -1.724487179503
error.SS.to.1layhid1         -2.634918418999
error.XL.to.1layhid1         -0.438426863298
sign.pp.to.1layhid1          -0.857597732824
wind.tail.to.1layhid1         0.095087222283
magn.Medium.to.1layhid1      -0.016988896781
magn.Out.to.1layhid1          1.861116820668
magn.Strong.to.1layhid1       0.121644784735
vis.yes.to.1layhid1           6.272628328980
Intercept.to.use             32.409085601639
1layhid.1.to.use            -67.336820525475

We can see that the error is extremely low at 0.0084. The algorithm needed 689 steps to reach the threshold, that is, the point at which the absolute partial derivatives of the error function become smaller than the threshold value (default = 0.01). The highest weight of the first neuron is vis.yes.to.1layhid1 at 6.27.
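
If you only want these summary figures rather than the full matrix, you can pull them out by row name (a quick sketch, reusing the fit object created above):

> fit$result.matrix[c("error", "reached.threshold", "steps"), ]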

You can also look at what are known as generalized weights. According to the neuralnet package authors, the generalized weight is defined as the contribution of the ith covariate to the log-odds:

The generalized weight expresses the effect of each covariate xi and thus has an analogous interpretation as the ith regression parameter in regression models. However, the generalized weight depends on all other covariates (Günther and Fritsch, 2010). The weights can be called and examined. I've abbreviated the output to the first four variables and six observations only. Note that the rows differ only by a constant scale factor: the ratio between any two covariates' weights is the same in every row, so the relative contribution of each covariate is identical across these observations:

> head(fit$generalized.weights[[1]])
            [,1]         [,2]         [,3]        [,4]    
1   -4.826708143  4.314287082  6.591985508 1.096847443
2   -4.751585064  4.247139344  6.489387581 1.079776065
6   -5.884240759  5.259548151  8.036290710 1.337166913
9  -11.346058891 10.141519613 15.495665693 2.578340208
10 -11.104906734  9.925969054 15.166316688 2.523539479
11 -10.952642060  9.789869358 14.958364085 2.488938025
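
You can check this proportionality directly. With a single hidden neuron, each observation's generalized weights amount to that observation's own scale factor multiplied by the hidden layer's input weights, so the ratio between any two covariates' generalized weights should be the same in every row. Here is a quick sketch, reusing the fit object created above:

> gw = fit$generalized.weights[[1]]
> head(gw[, 2] / gw[, 1])    # this ratio should be (nearly) constant across rows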

To visualize the neural network, simply use the plot() function:

> plot(fit)

The following is the output of the preceding command:

[Figure: the neural network produced by plot(fit)]

This plot shows the weights of the variables and intercepts. You can also examine the generalized weights in a plot. Let's look at vis.yes versus wind.tail, which has a low overall synaptic weight. Notice how vis.yes is skewed and wind.tail has an even distribution of weights, implying little predictive power:

> par(mfrow=c(1,2))

> gwplot(fit, selected.covariate = "vis.yes")

> gwplot(fit, selected.covariate = "wind.tail")

[Figure: generalized weight plots for vis.yes and wind.tail produced by gwplot()]

We now want to see how well the model performs. This is done with the compute() function, specifying the fitted model and the covariates. The syntax will be the same for predictions on both the train and test sets. compute() returns a list, and the predictions are extracted from it with $net.result:

> res = compute(fit, shuttleTrain[,1:10])

> predTrain = res$net.result

These results are in probabilities, so let's turn them into 0 or 1 and follow this up with a confusion matrix:

> predTrain = ifelse(predTrain>=0.5,1,0)

> table(predTrain, shuttleTrain$use)
         
predTrain  0  1
        0 81  0
        1  0 99
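
The confusion matrix can also be collapsed into a single accuracy figure, which is handy when comparing models (a quick check, reusing the objects created above):

> mean(predTrain == shuttleTrain$use)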

Lo and behold, the neural network model has achieved 100 percent accuracy. We will now hold our breath and see how it does on the test set:

> res2 = compute(fit, shuttleTest[,1:10])

> predTest = res2$net.result

> predTest = ifelse(predTest>=0.5,1,0)

> table(predTest, shuttleTest$use)
        
predTest  0  1
       0 29  0
       1  1 46

Only one false positive in the test set. If you wanted to identify which one this was, use the which() function to single it out, as follows:

> which(predTest==1 & shuttleTest$use==0)
[1] 62
> shuttleTest[62,]
    stability.xstab error.MM error.SS error.XL sign.pp
203               0        1        0        0       1
    wind.tail magn.Medium magn.Out magn.Strong vis.yes
203         0           0        0           1       1
    use
203   0

It is row 62 in the test set and observation 203 in the full dataset. Can we improve on this result and achieve 100 percent accuracy in the test set? To do this, we will increase the complexity by specifying three neurons in the first layer and two neurons in the second layer:

> fit2 = neuralnet(form, data=shuttleTrain, hidden=c(3,2), err.fct="ce", linear.output=FALSE)

The plot results now get quite busy and extremely hard to interpret:

> plot(fit2)

[Figure: the more complex neural network produced by plot(fit2)]

It's now time to see how this model performs on both datasets:

> res = compute(fit2, shuttleTrain[,1:10])

> predTrain = res$net.result

> predTrain = ifelse(predTrain>=0.5,1,0)

> table(predTrain, shuttleTrain$use)
         
predTrain  0  1
        0 81  0
        1  0 99

Now perform this on the test data as well:

> res2 = compute(fit2, shuttleTest[,1:10])

> predTest = res2$net.result

> predTest = ifelse(predTest>=0.5,1,0)

> table(predTest, shuttleTest$use)
        
predTest  0  1
       0 29  1
       1  1 45

However, we've now added a false negative to the mix. We added complexity, but all it did was increase the out-of-sample error, as the following confirms:

> which(predTest==1 & shuttleTest$use==0)
[1] 62

> which(predTest==0 & shuttleTest$use==1)
[1] 63

The false positive observation has not changed from the prior fit, and row 63 marks the false negative. In this case, adding complexity did not improve the test set's performance; in fact, the more complex model generalizes less well than the simpler one.
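
If you want to put the two fits side by side more formally, you can compute the test-set misclassification rate for each. Keep in mind that neuralnet starts from random initial weights, so your exact counts may differ between runs unless you call set.seed() before fitting. Here is a quick sketch, reusing the objects created above:

> pred1 = ifelse(compute(fit, shuttleTest[, 1:10])$net.result >= 0.5, 1, 0)
> pred2 = ifelse(compute(fit2, shuttleTest[, 1:10])$net.result >= 0.5, 1, 0)
> mean(pred1 != shuttleTest$use)    # misclassification rate, hidden = 1
> mean(pred2 != shuttleTest$use)    # misclassification rate, hidden = c(3, 2)

Given the confusion matrices above, the simpler network comes out ahead on this split.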
