Building an attrition prediction model with XGBoost

Now, let's implement the attrition prediction model with XGBoost:

# loading the required libraries and registering the cores for multiprocessing 
library(doMC)
library(xgboost)
library(caret)
registerDoMC(cores=4)
# setting the working directory and loading the dataset
setwd("~/Desktop/chapter 15")
mydata <- read.csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")
# removing the non-discriminatory features from the dataset, as identified in the EDA step
mydata$EmployeeNumber <- NULL
mydata$Over18 <- NULL
mydata$EmployeeCount <- NULL
mydata$StandardHours <- NULL
# setting up the cross-validation parameters
ControlParameters <- trainControl(method = "repeatedcv", number = 10, repeats = 10, savePredictions = TRUE, classProbs = TRUE)
# setting up the hyperparameter grid to tune
parametersGrid <- expand.grid(eta = 0.1, colsample_bytree = c(0.5, 0.7), max_depth = c(3, 6), nrounds = 100, gamma = 1, min_child_weight = 2, subsample = 0.5)
# printing the parameters grid to get an intuition
print(parametersGrid)
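# note: the printed grid contains 4 candidate combinations
# (2 values of colsample_bytree x 2 values of max_depth), with eta, nrounds,
# gamma, min_child_weight, and subsample each held at a single value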
# xgboost model building
modelxgboost <- train(Attrition ~ ., data = mydata, method = "xgbTree", trControl = ControlParameters, tuneGrid = parametersGrid)
# printing the model summary
print(modelxgboost)

This will result in the following output:

eXtreme Gradient Boosting  
1470 samples
30 predictors
2 classes: 'No', 'Yes'

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 10 times)
Summary of sample sizes: 1323, 1323, 1322, 1323, 1323, 1322, ...
Resampling results across tuning parameters:

  max_depth  colsample_bytree  Accuracy   Kappa
  3          0.5               0.8737458  0.3802840
  3          0.7               0.8734728  0.3845053
  6          0.5               0.8730674  0.3840938
  6          0.7               0.8732589  0.3920721

Tuning parameter 'nrounds' was held constant at a value of 100
Tuning parameter 'min_child_weight' was held constant at a value of 2
Tuning parameter 'subsample' was held constant at a value of 0.5
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were nrounds = 100, max_depth = 3, eta = 0.1, gamma = 1, colsample_bytree = 0.5, min_child_weight = 2 and subsample = 0.5.
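If you want to dig into the tuning results further, they can be pulled straight out of the fitted train object. The lines below are an optional check (they assume the modelxgboost object created above is still in your workspace):

# best hyperparameter combination selected by caret
print(modelxgboost$bestTune)
# resampled accuracy and Kappa for every combination in the grid
print(modelxgboost$results)
# visual comparison of accuracy across the tuning grid
plot(modelxgboost)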

Again, we observe that with the XGBoost model we achieved an accuracy above 87%, which is better than the 84% achieved with KNN.
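
Beyond overall accuracy, we can also look at how the model behaves on the two classes. Because savePredictions = TRUE was set in trainControl, caret can aggregate the held-out predictions from the resampling folds into an average confusion matrix; the line below is a minimal, optional check on the fitted modelxgboost object:

# average cross-validated confusion matrix (cells reported as percentages of the data)
confusionMatrix(modelxgboost)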
