Data fitting with neural networks

Data fitting is the process of building a curve or mathematical function that best matches a set of previously collected points. Curve fitting covers both interpolation, where the curve must pass exactly through the data points, and smoothing, where a smooth function is built that approximates the data. The approximate curves obtained from data fitting can be used to help display data, to predict the values of a function where no data is available, and to summarize the relationship between two or more variables. The following figure shows a linear interpolation of collected data:

With neural networks, data fitting means training a network on a set of inputs in order to produce an associated set of target outputs. Once the neural network has fit the data, it forms a generalization of the input-output relationship and can be used to generate outputs for inputs it was not trained on.
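
To make the distinction between interpolation and smoothing concrete, here is a minimal sketch in base R. The sample points are hypothetical and are not taken from the book; approx performs linear interpolation, and lm fits a least-squares line that stands in for a simple smoother.

```r
# Hypothetical sample points (an assumption, not data from the book)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Points at which we want values of the fitted curve
x_new <- c(1, 2.5, 5)

# Interpolation: the piecewise-linear curve passes through every data point
interp <- approx(x, y, xout = x_new)
print(interp$y)

# Smoothing: a least-squares line approximates the points and does not
# have to pass through any of them
fit <- lm(y ~ x)
smooth <- predict(fit, newdata = data.frame(x = x_new))
print(smooth)
```

At x = 1 and x = 5 the interpolated values reproduce the collected points, while the smoothed values only come close to them.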

Vehicle fuel consumption has long been studied by major manufacturers around the world. In an era marked by oil supply problems and even greater air pollution problems, fuel consumption has become a key factor. In this example, we will build a neural network to predict the fuel consumption of vehicles from some of their characteristics.

To do this, we use the Auto dataset contained in the ISLR package, which we already used in an example in Chapter 3, Deep Learning Using Multilayer Neural Networks. The Auto dataset contains gas mileage, horsepower, and other information for 392 vehicles. It is a data frame with 392 observations on the following nine variables:

  • mpg: Miles per gallon
  • cylinders: Number of cylinders between 4 and 8
  • displacement: Engine displacement (cubic inches)
  • horsepower: Engine horsepower
  • weight: Vehicle weight (lbs)
  • acceleration: Time to accelerate from 0 to 60 mph (sec)
  • year: Model year (modulo 100)
  • origin: Origin of car, coded as a number (1 = American, 2 = European, 3 = Japanese)
  • name: Vehicle name

The following is the code that we will use in this example:

###########################################################################
########Chapter 5 - Introduction to Neural Networks - using R##############
##########R program to build, train and test neural networks###############
###########################################################################
library("neuralnet")
library("ISLR")

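# Load the Auto data frame from ISLR and open it in the data viewer
data = Auto
View(data)

# Exploratory plots: mpg against each predictor, marker shape set by origin
plot(data$weight, data$mpg, pch=data$origin,cex=2)
par(mfrow=c(2,2))
plot(data$cylinders, data$mpg, pch=data$origin,cex=1)
plot(data$displacement, data$mpg, pch=data$origin,cex=1)
plot(data$horsepower, data$mpg, pch=data$origin,cex=1)
plot(data$acceleration, data$mpg, pch=data$origin,cex=1)

# Standardize the first six (numeric) columns: subtract the mean, divide by the sd
mean_data <- apply(data[1:6], 2, mean)
sd_data <- apply(data[1:6], 2, sd)

data_scaled <- as.data.frame(scale(data[,1:6],center = mean_data, scale = sd_data))
head(data_scaled, n=20)

# Random 70/30 split into training and test sets
# (the split changes on every run unless set.seed() is called first)
index = sample(1:nrow(data),round(0.70*nrow(data)))
train_data <- as.data.frame(data_scaled[index,])
test_data <- as.data.frame(data_scaled[-index,])

# Build the formula: mpg ~ cylinders + displacement + horsepower + weight + acceleration
n = names(data_scaled)
f = as.formula(paste("mpg ~", paste(n[!n %in% "mpg"], collapse = " + ")))

# Train a network with one hidden layer of 3 neurons; linear output for regression
net = neuralnet(f,data=train_data,hidden=3,linear.output=TRUE)
plot(net)

# Predict on the test set and compute the mean squared error (standardized scale)
predict_net_test <- compute(net,test_data[,2:6])
MSE.net <- sum((test_data$mpg - predict_net_test$net.result)^2)/nrow(test_data)

# Baseline for comparison: linear regression on the same split
Lm_Mod <- lm(mpg~., data=train_data)
summary(Lm_Mod)
predict_lm <- predict(Lm_Mod,test_data)
MSE.lm <- sum((predict_lm - test_data$mpg)^2)/nrow(test_data)

# Side-by-side plots of real vs predicted mpg for both models
par(mfrow=c(1,2))
plot(test_data$mpg,predict_net_test$net.result,col='black',main='Real vs predicted for neural network',pch=18,cex=4)
abline(0,1,lwd=5)
plot(test_data$mpg,predict_lm,col='black',main='Real vs predicted for linear regression',pch=18,cex=4)
abline(0,1,lwd=5)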
###########################################################################

As usual, we will analyze the code line by line, explaining in detail the functions used and the results they produce.

library("neuralnet")
library("ISLR")

The first two lines of the initial code are used to load the libraries needed to run the analysis.

Remember that to install a library that is not present in the initial distribution of R, you must use the install.packages function. This is the main function for installing packages. It takes a vector of package names and a destination library, downloads the packages from the repositories, and installs them. It needs to be run only once per package, not every time you run the code.
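
One way to avoid reinstalling on every run is to check which packages are missing first. The following is a minimal sketch; missing_packages is a hypothetical helper introduced here for illustration, not a function from R or from the book. It only reports what is missing, so the actual install.packages call stays a deliberate, one-time step.

```r
# Packages this example relies on (both are named in the text above)
required <- c("neuralnet", "ISLR")

# Hypothetical helper: returns the subset of `pkgs` that is not installed.
# requireNamespace() returns FALSE (instead of raising an error) when a
# package cannot be found.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)

if (length(to_install) > 0) {
  # Report the one-time command to run rather than installing silently
  message("Run once: install.packages(c(",
          paste0('"', to_install, '"', collapse = ", "), "))")
} else {
  message("All required packages are already installed.")
}
```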

The neuralnet library is used to train neural networks using backpropagation, resilient backpropagation (RPROP) with or without weight backtracking, or the modified globally convergent version (GRPROP). Its neuralnet function allows flexible settings through a custom choice of error and activation functions. The calculation of generalized weights is also implemented.
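
As a rough sketch of how these settings are exposed, the following trains a small network on toy data. The algorithm, err.fct, and act.fct arguments belong to the neuralnet function. The data, the seed, and the network size are assumptions chosen only for illustration, and convergence on other data is not guaranteed.

```r
library("neuralnet")

# Fix the random start weights so the run is repeatable (seed is arbitrary)
set.seed(1)

# Hypothetical toy data (not from the book): a simple linear relationship
toy <- data.frame(x = seq(-1, 1, length.out = 40))
toy$y <- 0.5 * toy$x

net_toy <- neuralnet(
  y ~ x,
  data = toy,
  hidden = 2,              # one hidden layer with 2 neurons
  algorithm = "rprop+",    # resilient backpropagation with weight backtracking
  err.fct = "sse",         # sum of squared errors
  act.fct = "logistic",    # logistic activation in the hidden layer
  linear.output = TRUE     # no activation on the output neuron (regression)
)

print(class(net_toy))
```

Other values accepted by algorithm include "backprop" (which also needs a learningrate), "rprop-", "sag", and "slr"; the last two select the globally convergent variant.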

The ISLR library contains a set of datasets that are freely usable for our examples. They accompany the book An Introduction to Statistical Learning with Applications in R and were collected from a variety of studies and sources.

data = Auto
View(data)

This command assigns the Auto dataset, which, as we anticipated, is contained in the ISLR library, to a data frame named data. The View function then opens the data frame in a spreadsheet-style viewer (to print a compact display of the structure of an arbitrary R object in the console, use the str function instead). The following screenshot shows some of the data contained in the Auto dataset:

As you can see, the dataset consists of 392 rows and 9 columns. The rows represent 392 vehicles with model years from 1970 to 1982. The columns represent the 9 characteristics collected for each car, in order: mpg, cylinders, displacement, horsepower, weight, acceleration, year, origin, and name.
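
Because View needs an interactive session, the same facts can also be checked from the console. This is a small sketch that assumes the ISLR package is installed and uses only base R inspection functions.

```r
library("ISLR")

data = Auto

# Dimensions: number of rows (vehicles) and columns (variables)
print(dim(data))

# Column names, in the order listed above
print(names(data))

# Compact summary of each column's type and first few values
str(data)

# Model years are stored modulo 100, so this prints two-digit years
print(range(data$year))
```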
