Modeling using neural networks

Neural networks, or more specifically in this case, artificial neural networks, are a family of machine learning models whose design is loosely inspired by the workings of biological neural networks, such as our own nervous system. Neural networks have been around for a long time, but recently there has been an upsurge of interest in building highly intelligent systems using deep learning and artificial intelligence. Deep learning makes use of deep neural networks, which are essentially neural networks with a large number of hidden layers between the input and output layers. A typical neural network can be visualized with the following figure:

[Figure: a typical neural network with an input layer, hidden layer, and output layer]

From the figure, you can see that a neural network is an interconnected network of nodes, also called neurons. Each node is nothing but a mathematical function. It is beyond our scope to go into every detail of how a node is represented mathematically, but here is the gist: these mathematical functions receive one or more weighted inputs, where the weights are represented in the preceding figure as edges, and perform some computation on these inputs to give an output. Popular functions used in these nodes include the step function and the sigmoid function, which you have already seen in use in the logistic regression algorithm. Once the inputs are weighted and transformed by the function, the resulting activation is sent on to further nodes until it reaches the output layer. A collection of nodes forms a layer; in the preceding figure, we have three layers.
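To make this concrete, the following is a minimal sketch in R of how a single node might compute its output activation using the sigmoid function; the inputs, weights, and bias here are made-up values purely for illustration:

> # sigmoid activation: squashes any real-valued input into (0, 1)
> sigmoid <- function(z) { 1 / (1 + exp(-z)) }
> inputs <- c(0.5, -1.2, 2.0)    # example inputs to the node
> weights <- c(0.8, 0.3, -0.5)   # one weight per input edge
> bias <- 0.1                    # a constant offset term
> # weighted sum of inputs plus bias, passed through the activation function
> node.output <- sigmoid(sum(inputs * weights) + bias)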

So, a neural network depends on several things: its neurons or nodes and the pattern of interconnection between them; the learning process that is used to update the weights of the connections at each iteration (popularly called an epoch); and the activation functions of the nodes, which convert each node's weighted inputs into its output activation, which is passed on layer by layer until we get the output prediction.
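As a toy illustration of this layer-by-layer flow, the following sketch pushes an input vector through one hidden layer to an output node; the dimensions and random weights are made up, and sigmoid() is the function defined in the previous snippet:

> # a tiny forward pass: input (3 nodes) -> hidden (4 nodes) -> output (1 node)
> set.seed(42)
> x <- c(0.5, -1.2, 2.0)                # input layer values
> W1 <- matrix(rnorm(4 * 3), nrow=4)    # weights from input to hidden layer
> W2 <- matrix(rnorm(1 * 4), nrow=1)    # weights from hidden to output layer
> hidden.activations <- sigmoid(W1 %*% x)                  # hidden layer output
> output.prediction <- sigmoid(W2 %*% hidden.activations)  # final prediction

With these concepts in mind, we will start with loading the necessary dependencies as follows: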

> library(caret) # nn models
> library(ROCR) # evaluate models
> source("performance_plot_utils.R") # plot curves
> # data transformation
> test.feature.vars <- test.data[,-1]
> test.class.var <- test.data[,1]

We will now have to do some feature value encoding, similar to what we did during AUC optimization for SVM. To refresh your memory, you can run the following code snippet:

> transformed.train <- train.data
> transformed.test <- test.data
> for (variable in categorical.vars){
+   new.train.var <- make.names(train.data[[variable]])
+   transformed.train[[variable]] <- new.train.var
+   new.test.var <- make.names(test.data[[variable]])
+   transformed.test[[variable]] <- new.test.var
+ }
> transformed.train <- to.factors(df=transformed.train, variables=categorical.vars)
> transformed.test <- to.factors(df=transformed.test, variables=categorical.vars)
> transformed.test.feature.vars <- transformed.test[,-1]
> transformed.test.class.var <- transformed.test[,1]
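
The to.factors utility used above was introduced earlier in the chapter; in case you do not have it at hand, a minimal sketch of such a helper (assuming its only job is to convert the given columns to factors) could look like this:

> # convert the specified columns of a data frame to factors
> to.factors <- function(df, variables) {
+   for (variable in variables) {
+     df[[variable]] <- as.factor(df[[variable]])
+   }
+   return(df)
+ }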

Once we have our data ready, we will build our initial neural network model using all the features as follows:

> formula.init <- "credit.rating ~ ."
> formula.init <- as.formula(formula.init)
> nn.model <- train(formula.init, data = transformed.train, method="nnet")

The preceding code snippet might ask you to install the nnet package if you do not have it installed; just accept when prompted and it will be installed automatically before the model is built. If that fails, you can install it separately and run the code again. Remember, model building is an iterative process, so it might take some time. Once the model converges, you can view the model details using the print(nn.model) command, which will show results for several runs with different size and decay options; you will see that caret performs this hyperparameter tuning internally to try and get the best model!
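
If you want to control that tuning yourself instead of relying on caret's defaults, you can pass an explicit grid of size (number of hidden layer nodes) and decay (weight decay) values to train; the grid and cross-validation settings below are illustrative choices, not prescriptions:

> # an illustrative tuning grid and resampling scheme
> nn.grid <- expand.grid(size=c(3, 5, 7), decay=c(0, 0.1, 0.5))
> nn.control <- trainControl(method="cv", number=5)
> nn.model.tuned <- train(formula.init, data=transformed.train,
+                         method="nnet", tuneGrid=nn.grid,
+                         trControl=nn.control, trace=FALSE)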

We now perform predictions on the test data and evaluate the model performance as follows:

> nn.predictions <- predict(nn.model, 
                       transformed.test.feature.vars, type="raw")
> confusionMatrix(data=nn.predictions, 
                  reference=transformed.test.class.var, 
                  positive="X1")

You can observe from the following output that our model has an accuracy of 72%, which is quite good. It predicts bad ratings as bad reasonably often, as evident from the specificity of 48%, and, as usual, sensitivity is good at 84%.

[Output: confusion matrix and statistics for the initial neural network model]
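
If you want to see exactly where these numbers come from, you can recompute them by hand from the raw confusion matrix counts; recall that make.names encoded the classes as X0 (bad ratings) and X1 (good ratings, our positive class):

> # recompute accuracy, sensitivity, and specificity from raw counts
> cm <- table(predicted=nn.predictions, actual=transformed.test.class.var)
> TP <- cm["X1", "X1"]; TN <- cm["X0", "X0"]
> FP <- cm["X1", "X0"]; FN <- cm["X0", "X1"]
> (TP + TN) / sum(cm)  # accuracy
> TP / (TP + FN)       # sensitivity
> TN / (TN + FP)       # specificity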

We will now use the following code snippet to plot feature importance for neural network based models:

> formula.init <- "credit.rating ~ ."
> formula.init <- as.formula(formula.init)
> control <- trainControl(method="repeatedcv", number=10, repeats=2)
> model <- train(formula.init, data=transformed.train, method="nnet", 
               trControl=control)
> importance <- varImp(model, scale=FALSE)
> plot(importance)

This gives us the following plot ranking variables according to their importance:

[Figure: variables ranked by importance for the neural network model]

We select some of the most important features from the preceding plot and build our next model as follows:

> formula.new <- "credit.rating ~ account.balance + credit.purpose + savings + current.assets +
foreign.worker + previous.credit.payment.status"
> formula.new <- as.formula(formula.new)
> nn.model.new <- train(formula.new, data=transformed.train, method="nnet")

We now perform predictions on the test data and evaluate the model performance:

> nn.predictions.new <- predict(nn.model.new, 
                          transformed.test.feature.vars,  
                          type="raw")
> confusionMatrix(data=nn.predictions.new, 
                  reference=transformed.test.class.var, 
                  positive="X1")

This gives us the following confusion matrix with the various metrics of interest. We observe from the output that the accuracy has increased slightly to 73% and sensitivity has increased to 87%, at the cost of specificity, which has dropped to 43%:

[Output: confusion matrix and statistics for the new neural network model]

You can examine the hyperparameter tuning that caret performed internally, as follows:

> plot(nn.model.new, cex.lab=0.5)

The following plot shows the accuracy of the candidate models for different numbers of hidden layer nodes and weight decay values:

[Figure: model accuracy for different hidden layer sizes and weight decay values]
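
If you prefer raw numbers to the plot, the caret train object also stores the winning parameter combination and the resampled accuracy of every combination it tried, which you can inspect directly:

> nn.model.new$bestTune   # the chosen size and decay values
> nn.model.new$results    # accuracy for each parameter combination tried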

Based on our requirement that the bank should make minimal losses, we select the initial neural network model as the best model, since its accuracy is similar to that of the new model and its specificity is much higher, which is extremely important here. We now plot some performance curves for the best model as follows:

> nn.model.best <- nn.model
> nn.predictions.best <- predict(nn.model.best, transformed.test.feature.vars, type="prob")
> nn.prediction.values <- nn.predictions.best[,2]
> predictions <- prediction(nn.prediction.values, test.class.var)
> par(mfrow=c(1,2))
> plot.roc.curve(predictions, title.text="NN ROC Curve")
> plot.pr.curve(predictions, title.text="NN Precision/Recall Curve")

We observe from the following plot that the AUC is 0.74, which is quite good; the model performs a lot better than the baseline denoted in red:

[Figure: NN ROC curve (AUC = 0.74) and NN precision/recall curve]
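
To verify the AUC value shown on the plot, you can also compute it directly from the same ROCR prediction object:

> # extract the area under the ROC curve
> auc <- performance(predictions, measure="auc")
> auc@y.values[[1]]  # should come out to around 0.74 here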

This concludes our predictive modeling session and we will wrap it up with model selection and comparisons.
