To begin, load the essential libraries and register the number of cores to use for parallel processing:
library(doMC)
registerDoMC(cores = 4)
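To confirm that the parallel backend is active before training, you can query the number of registered workers; getDoParWorkers() comes from the foreach package that doMC builds on (a quick check, not part of the original workflow):

```r
library(foreach)
# detectCores() reports how many cores the machine has available
parallel::detectCores()
# number of worker processes registered with the backend
getDoParWorkers()   # returns 4 after registerDoMC(cores = 4)
```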
library(caret)
# set the random seed for reproducibility
set.seed(1234)
# set the working directory where the data is located
setwd("~/Desktop/chapter 15")
# read the data; stringsAsFactors ensures Attrition is read as a factor
# (R >= 4.0 defaults this to FALSE)
mydata <- read.csv("WA_Fn-UseC_-HR-Employee-Attrition.csv", stringsAsFactors = TRUE)
# remove the non-discriminatory features identified during EDA
mydata[c("EmployeeNumber", "Over18", "EmployeeCount", "StandardHours")] <- NULL
# set up 10-fold cross-validation, repeated 10 times
cvcontrol <- trainControl(method = "repeatedcv", number = 10, repeats = 10, allowParallel = TRUE)
# build the bagged CART model; nbagg sets the number of bootstrap bags to 10
train.bagg <- train(Attrition ~ ., data = mydata, method = "treebag", nbagg = 10,
                    trControl = cvcontrol, importance = TRUE)
train.bagg
This will result in the following output:
Bagged CART
1470 samples
30 predictors
2 classes: 'No', 'Yes'
No pre-processing
Resampling: Cross-Validated (10 fold, repeated 10 times)
Summary of sample sizes: 1324, 1323, 1323, 1322, 1323, 1322, ...
Resampling results:
  Accuracy  Kappa
  0.854478  0.2971994
We can see that the bagged CART model achieves an accuracy of about 85.4%, an improvement over the 84% accuracy obtained earlier with the KNN algorithm.
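Because the model was trained with importance = TRUE, a natural next step is to inspect which features drive the predictions and to generate class predictions with caret's standard helpers. The sketch below continues the same session and assumes train.bagg and mydata are still in memory:

```r
# rank predictors by their contribution to the bagged trees
imp <- varImp(train.bagg)
print(imp)
plot(imp, top = 10)   # visualize the ten most important features

# generate class predictions and tabulate them against the observed labels
pred <- predict(train.bagg, newdata = mydata)
confusionMatrix(pred, mydata$Attrition)
```

Note that predicting on the training data gives an optimistic picture of performance; the repeated cross-validation estimate reported above is the more reliable measure of generalization.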