Data partitioning

Next, we convert the data into a matrix. We also set the dimension names to NULL, which removes the row and column names; if the matrix is later turned back into a data frame, R assigns the default variable names V1, V2, V3, ..., V14:

data <- as.matrix(data)
dimnames(data) <- NULL
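The effect of dropping the dimension names can be checked on a small example. The two-column data frame below is hypothetical and only stands in for the housing data. After dimnames() is set to NULL the matrix has no column names, and as.data.frame() then supplies the default names V1, V2, and so on.

```r
# Hypothetical two-column data frame standing in for the housing data
df <- data.frame(crim = c(0.1, 0.2), medv = c(24, 21.6))

m <- as.matrix(df)        # numeric matrix with column names "crim", "medv"
dimnames(m) <- NULL       # drop row and column names

print(colnames(m))                 # NULL: the columns are now unnamed
print(names(as.data.frame(m)))     # "V1" "V2": default names from as.data.frame()
```

Keras in R expects plain numeric matrices, so the names carry no information the model needs.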

We then partition the data into training and test datasets using the following code:

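# Data partitioning
set.seed(1234)  # fix the random seed so the split is reproducible
# assign each row to group 1 (training, p = 0.7) or group 2 (test, p = 0.3)
ind <- sample(2, nrow(data), replace = TRUE, prob = c(0.7, 0.3))
training <- data[ind == 1, 1:13]        # independent variables (columns 1-13)
test <- data[ind == 2, 1:13]
trainingtarget <- data[ind == 1, 14]    # dependent variable, medv (column 14)
testtarget <- data[ind == 2, 14]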

This example uses an approximately 70:30 split: sample() assigns each row independently to group 1 (training) with probability 0.7 or to group 2 (test) with probability 0.3, so the exact proportions vary slightly. Setting the random seed to 1234 makes the split reproducible, so the same rows end up in the training and test data every time the partitioning is run, on any computer using the same R version (the default algorithm behind sample() changed in R 3.6.0). The independent variables in columns 1 to 13 are stored in training for the training data and in test for the test data. Similarly, the dependent variable medv in column 14 is stored in trainingtarget and testtarget for the corresponding rows.
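The two properties described above, reproducibility under a fixed seed and a split that is only approximately 70:30, can be checked directly. The sketch below uses a hypothetical random 506 x 14 matrix named toy_data in place of the real housing matrix (13 predictor columns plus the response in column 14), so the numbers it produces are not the book's.

```r
# Hypothetical stand-in for the housing matrix: 506 rows, 13 predictors + response in column 14
set.seed(42)
toy_data <- matrix(rnorm(506 * 14), nrow = 506, ncol = 14)

# Same seed, same call: the group assignment is identical both times
set.seed(1234)
ind <- sample(2, nrow(toy_data), replace = TRUE, prob = c(0.7, 0.3))
set.seed(1234)
ind_again <- sample(2, nrow(toy_data), replace = TRUE, prob = c(0.7, 0.3))
print(identical(ind, ind_again))   # TRUE

# Proportions are close to, but generally not exactly, 0.7 and 0.3
print(prop.table(table(ind)))

# Partition as in the text: predictors in columns 1-13, response in column 14
training <- toy_data[ind == 1, 1:13]
test <- toy_data[ind == 2, 1:13]
trainingtarget <- toy_data[ind == 1, 14]
testtarget <- toy_data[ind == 2, 14]

# Every row lands in exactly one of the two sets
print(nrow(training) + nrow(test) == nrow(toy_data))   # TRUE
```

If an exact 70:30 split is required, drawing a fixed number of row indices with sample(nrow(toy_data), round(0.7 * nrow(toy_data))) is a common alternative, though it is not the approach used here.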
