Neural network regression with the Boston dataset

In this section, we will build a regression neural network for the Boston dataset. The medv value (the median home value) is predicted for the test data, using a 70:30 train-to-test split. The neuralnet function is used to model the data with a neural network:

#####################################################################
###Chapter 2 - Introduction to Neural Networks - using R ############
###Simple R program to build, train, test regression neural networks#
########################filename: Boston.r###########################
#####################################################################

library("neuralnet")
library(MASS)

set.seed(1)

data = Boston

max_data <- apply(data, 2, max)
min_data <- apply(data, 2, min)
data_scaled <- scale(data,center = min_data, scale = max_data - min_data)

index = sample(1:nrow(data),round(0.70*nrow(data)))
train_data <- as.data.frame(data_scaled[index,])
test_data <- as.data.frame(data_scaled[-index,])

n = names(data)
f = as.formula(paste("medv ~", paste(n[!n %in% "medv"], collapse = " + ")))
net_data = neuralnet(f,data=train_data,hidden=10,linear.output=T)
plot(net_data)

predict_net_test <- compute(net_data,test_data[,1:13])

predict_net_test_start <- predict_net_test$net.result*(max(data$medv)-min(data$medv))+min(data$medv)
test_start <- as.data.frame((test_data$medv)*(max(data$medv)-min(data$medv))+min(data$medv))
MSE.net_data <- sum((test_start - predict_net_test_start)^2)/nrow(test_start)

Regression_Model <- lm(medv~., data=data)
summary(Regression_Model)
test <- data[-index,]
predict_lm <- predict(Regression_Model,test)
MSE.lm <- sum((predict_lm - test$medv)^2)/nrow(test)

MSE.net_data
MSE.lm
###########################################################################

Don't worry, we will now explain the whole code in detail, line by line.

library("neuralnet")
library(MASS)

The first two lines of the code are simple, as they load the libraries we will use for the later calculations. Specifically, the neuralnet library will help us build and train the network, while the MASS library provides the Boston dataset that we previously introduced in detail.

Remember, to install a library that is not present in the initial distribution of R, you must use the install.packages function. This is the main function used to install packages. It takes a vector of package names and a destination library, downloads the packages from the repositories, and installs them.

In our case, for example, to install the neuralnet package, we should write:

install.packages("neuralnet")

Finally, it should be emphasized that this function needs to be run only once, not every time you run the code. Loading the library, by contrast, must be repeated in every session, through the following command:

library(neuralnet)

The function set.seed sets the seed of R's random number generator, which is useful for creating simulations or random objects that can be reproduced:

set.seed(1)

You have to use this function every time you want to get a reproducible random result. With the seed fixed, the random numbers are the same, and they would continue to be the same no matter how far out in the sequence we go.
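
To see the effect, here is a quick demonstration (not part of the Boston script): resetting the seed before each call makes rnorm return identical draws.

set.seed(1)
rnorm(3)   # three "random" numbers
set.seed(1)
rnorm(3)   # exactly the same three numbers again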

The following command loads the Boston dataset, which, as we anticipated, is contained in the MASS library and saves it in a given frame:

data = Boston

Use the str function to compactly display the structure of an arbitrary R object. In our case, using str(data), we obtain the following results:

> str(data)
'data.frame': 506 obs. of 14 variables:
$ crim : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
$ zn : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
$ indus : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
$ chas : int 0 0 0 0 0 0 0 0 0 0 ...
$ nox : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
$ rm : num 6.58 6.42 7.18 7 7.15 ...
$ age : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
$ dis : num 4.09 4.97 4.97 6.06 6.06 ...
$ rad : int 1 2 2 3 3 3 5 5 5 5 ...
$ tax : num 296 242 242 222 222 222 311 311 311 311 ...
$ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
$ black : num 397 397 393 395 397 ...
$ lstat : num 4.98 9.14 4.03 2.94 5.33 ...
$ medv : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...

Let's get back to parsing the code:

max_data <- apply(data, 2, max) 
min_data <- apply(data, 2, min)
data_scaled <- scale(data,center = min_data, scale = max_data - min_data)

We need this snippet of code to normalize the data.

Remember, it is good practice to normalize the data before training a neural network. Normalization removes the measurement units, allowing you to easily compare variables that are measured on different scales.

This is an extremely important step in building a neural network, as it avoids meaningless results and very difficult training processes that lead to algorithm convergence problems. You can choose different methods for scaling the data (z-score normalization, min-max scaling, and so on). For this example, we will use the min-max method (usually called feature scaling) to get all the scaled data into the range [0, 1]. The formula to achieve this is the following:

$$x_{scaled} = \frac{x - \min(x)}{\max(x) - \min(x)}$$

Before applying the chosen normalization method, you must calculate the minimum and maximum values of each column of the dataset. To do this, we use the apply function. This function returns a vector, array, or list of values obtained by applying a function to the margins of an array or matrix. Let's understand the meaning of the arguments used.

max_data <- apply(data, 2, max) 

The first argument of the apply function specifies the object to which the function is applied; in our case, the dataset named data. The second argument must contain a vector giving the subscripts over which the function will be applied; a value of 1 indicates rows and 2 indicates columns. The third argument must contain the function to be applied; in our case, the max function.

To normalize the data, we use the scale function, which is a generic function whose default method centers and/or scales the columns of a numeric matrix.
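
As a quick sanity check (not part of the original listing), you can verify that every column of the scaled matrix now spans the [0, 1] interval:

# each column of the result should report a minimum of 0 and a maximum of 1
apply(data_scaled, 2, range)

If any column reports values outside this interval, the scaling step went wrong.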

index = sample(1:nrow(data),round(0.70*nrow(data)))
train_data <- as.data.frame(data_scaled[index,])
test_data <- as.data.frame(data_scaled[-index,])

In the first line of the code just shown, the dataset is split 70:30, with the intention of using 70 percent of the data at our disposal to train the network and the remaining 30 percent to test it. In the second and third lines, the rows of the scaled matrix data_scaled are subdivided into two new dataframes, called train_data and test_data.
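
As a further check (not part of the original listing), you can verify the sizes of the two partitions; with 506 rows in total, round(0.70 * 506) yields 354 training rows, leaving 152 rows for testing:

length(index)     # 354 rows for training
nrow(test_data)   # 152 rows for testing

Note that sample draws the training row indices at random, so exactly which rows land in each partition depends on the seed set earlier.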

n = names(data)
f = as.formula(paste("medv ~", paste(n[!n %in% "medv"], collapse = " + ")))
net_data = neuralnet(f,data=train_data,hidden=10,linear.output=T)
plot(net_data)

Everything so far has only been used to prepare the data. It is now time to build the network. To do this, we first recover all the variable names using the names function. This function gets or sets the names of an object.

Next, we build the formula that describes the model, and we use the neuralnet function to build and train the network. In this case, we will create a network with a single hidden layer of 10 nodes. Finally, we plot the neural network; the plot shows the network architecture, with the fitted weights labeled on each connection.
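
Before moving on, it is worth printing f to inspect the expanded formula that neuralnet receives. The column names are pasted together because, at least in older versions of the neuralnet package, the compact shorthand medv ~ . is not accepted:

print(f)
# medv ~ crim + zn + indus + chas + nox + rm + age + dis + rad +
#     tax + ptratio + black + lstat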

Now that we have the network, what do we do? Of course, we use it to make predictions. We had set aside 30 percent of the available data to do this:

predict_net_test <- compute(net_data,test_data[,1:13])

In our case, we apply the compute function to the test_data dataset, using only the first 13 columns, which represent the input variables of the network.
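
If you are curious about the object returned by compute, a quick inspection (not part of the original listing) shows where the predictions are stored:

# compute() returns a list; the predictions are in its net.result
# element, a numeric matrix with one column per output neuron
str(predict_net_test$net.result)

These predictions are still on the normalized [0, 1] scale, so the following code converts them back to the original units: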

predict_net_test_start <- predict_net_test$net.result*(max(data$medv)-min(data$medv))+min(data$medv)
test_start <- as.data.frame((test_data$medv)*(max(data$medv)-min(data$medv))+min(data$medv))
MSE.net_data <- sum((predict_net_test_start - test_start)^2)/nrow(test_start)

But how do we know whether the predictions the network produces are accurate? We can use the Mean Squared Error (MSE) as a measure of how far our predictions are from the real data.

In this regard, it is worth remembering that we normalized the data before building the network. Now, in order to make a meaningful comparison, we need to reverse that scaling and restore the predictions and the test targets to the original units. Once the values are restored, we can calculate the MSE through the following equation:

$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
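
In R, the same quantity can be computed more idiomatically with the mean function; here is a minimal equivalent sketch using the objects defined above (test_start is a one-column data frame, so we extract the underlying vector first):

mean((test_start[[1]] - predict_net_test_start[, 1])^2)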

Well, we have calculated the MSE; now, what do we compare it to? To get an idea of the accuracy of the network's predictions, we can build a linear regression model:

Regression_Model <- lm(medv~., data=data)
summary(Regression_Model)
test <- data[-index,]
predict_lm <- predict(Regression_Model,test)
MSE.lm <- sum((predict_lm - test$medv)^2)/nrow(test)

We build a linear regression model using the lm function. This function is used to fit linear models; it can be used to carry out regression, single-stratum analysis of variance, and analysis of covariance. To produce a summary of the fitted model, we use the summary function, which returns the following results:

> summary(Regression_Model)

Call:
lm(formula = medv ~ ., data = data)

Residuals:
Min 1Q Median 3Q Max
-15.5944739 -2.7297159 -0.5180489 1.7770506 26.1992710

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 36.4594883851 5.1034588106 7.14407 0.00000000000328344 ***
crim -0.1080113578 0.0328649942 -3.28652 0.00108681 **
zn 0.0464204584 0.0137274615 3.38158 0.00077811 ***
indus 0.0205586264 0.0614956890 0.33431 0.73828807
chas 2.6867338193 0.8615797562 3.11838 0.00192503 **
nox -17.7666112283 3.8197437074 -4.65126 0.00000424564380765 ***
rm 3.8098652068 0.4179252538 9.11614 < 0.000000000000000222 ***
age 0.0006922246 0.0132097820 0.05240 0.95822931
dis -1.4755668456 0.1994547347 -7.39800 0.00000000000060135 ***
rad 0.3060494790 0.0663464403 4.61290 0.00000507052902269 ***
tax -0.0123345939 0.0037605364 -3.28001 0.00111164 **
ptratio -0.9527472317 0.1308267559 -7.28251 0.00000000000130884 ***
black 0.0093116833 0.0026859649 3.46679 0.00057286 ***
lstat -0.5247583779 0.0507152782 -10.34715 < 0.000000000000000222 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.745298 on 492 degrees of freedom
Multiple R-squared: 0.7406427, Adjusted R-squared: 0.7337897
F-statistic: 108.0767 on 13 and 492 DF, p-value: < 0.00000000000000022204

For the regression model, we also calculate the MSE on the same test rows. Finally, in order to assess the performance of the network, we compare it with the multiple linear regression model computed on the same data, as follows:

MSE.net_data
MSE.lm

The results are:

> MSE.net_data
[1] 12.0692812
> MSE.lm
[1] 26.99265692

From the analysis of the results, it is possible to note that the neural network has a lower MSE than the linear regression model.
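
As a visual follow-up (not part of the original script), you can plot predicted against actual medv values for both models; the closer the points lie to the 45-degree diagonal, the better the predictions:

# side-by-side predicted vs. actual plots for the two models
par(mfrow = c(1, 2))
plot(test_start[[1]], predict_net_test_start[, 1],
     xlab = "Actual medv", ylab = "Predicted medv",
     main = "Neural network")
abline(0, 1, lwd = 2)
plot(test$medv, predict_lm,
     xlab = "Actual medv", ylab = "Predicted medv",
     main = "Linear regression")
abline(0, 1, lwd = 2)

Given the MSE values above, the network's points should hug the diagonal more tightly than those of the linear model.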
