Building an attrition prediction model with XGBoost

Now, let's implement the attrition prediction model with XGBoost:

# loading the required libraries and registering the cores for multiprocessing 
library(doMC)
library(xgboost)
library(caret)
registerDoMC(cores=4)
# setting the working directory and loading the dataset
setwd("~/Desktop/chapter 15")
mydata <- read.csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")
# removing the non-discriminatory features from the dataset, as identified in the EDA step
mydata$EmployeeNumber <- NULL
mydata$Over18 <- NULL
mydata$EmployeeCount <- NULL
mydata$StandardHours <- NULL
# setting up the cross-validation parameters
ControlParameters <- trainControl(method = "repeatedcv", number = 10, repeats = 10, savePredictions = TRUE, classProbs = TRUE)
# setting up the hyperparameter grid to tune
parametersGrid <- expand.grid(eta = 0.1, colsample_bytree = c(0.5, 0.7), max_depth = c(3, 6), nrounds = 100, gamma = 1, min_child_weight = 2, subsample = 0.5)
# printing the parameters grid to get an intuition
print(parametersGrid)
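# note: the printed grid contains 4 candidate combinations
# (2 values of colsample_bytree x 2 values of max_depth), with eta, nrounds,
# gamma, min_child_weight, and subsample each held at a single value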
# xgboost model building
modelxgboost <- train(Attrition ~ ., data = mydata, method = "xgbTree", trControl = ControlParameters, tuneGrid = parametersGrid)
# printing the model summary
print(modelxgboost)

This will result in the following output:

eXtreme Gradient Boosting  
1470 samples
30 predictors
2 classes: 'No', 'Yes'

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 10 times)
Summary of sample sizes: 1323, 1323, 1322, 1323, 1323, 1322, ...
Resampling results across tuning parameters:

  max_depth  colsample_bytree  Accuracy   Kappa
  3          0.5               0.8737458  0.3802840
  3          0.7               0.8734728  0.3845053
  6          0.5               0.8730674  0.3840938
  6          0.7               0.8732589  0.3920721

Tuning parameter 'nrounds' was held constant at a value of 100
Tuning parameter 'min_child_weight' was held constant at a value of 2
Tuning parameter 'subsample' was held constant at a value of 0.5
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were nrounds = 100, max_depth = 3, eta = 0.1, gamma = 1, colsample_bytree = 0.5, min_child_weight = 2 and subsample = 0.5.
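If you want to dig into the tuning results further, they can be pulled straight out of the fitted train object. The lines below are an optional check (they assume the modelxgboost object created above is still in your workspace):

# best hyperparameter combination selected by caret
print(modelxgboost$bestTune)
# resampled accuracy and Kappa for every combination in the grid
print(modelxgboost$results)
# visual comparison of accuracy across the tuning grid
plot(modelxgboost)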

Again, we observe that with the XGBoost model we achieved an accuracy above 87%, which is better than the 84% achieved with KNN.
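
Beyond overall accuracy, we can also look at how the model behaves on the two classes. Because savePredictions = TRUE was set in trainControl, caret can aggregate the held-out predictions from the resampling folds into an average confusion matrix; the line below is a minimal, optional check on the fitted modelxgboost object:

# average cross-validated confusion matrix (cells reported as percentages of the data)
confusionMatrix(modelxgboost)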
