To build our model with a decision tree algorithm, we will use the BackOrders.csv file, which can be downloaded from the following GitHub repository.
This dataset has 23 columns. The target variable is went_on_backorder, which identifies whether a product has gone on backorder. The other 22 variables are the predictor variables. A description of the data is provided in the code that comes with this book.
We will start by importing the required libraries:
# import os for operating system dependent functionalities
import os
# import other required libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix, roc_curve, auc
import itertools
from sklearn import tree
import seaborn as sns
import matplotlib.pyplot as plt
We set our working directory with the os.chdir() command:
# Set your working directory according to your requirement
os.chdir(".../Chapter 4/Decision Tree")
# Check Working Directory
os.getcwd()
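As an alternative to changing the working directory for the whole process, the path to the file can be built explicitly. The following is a minimal sketch using pathlib from the standard library. The folder name in data_dir is a hypothetical placeholder, not the layout of the book's code bundle, so adjust it to wherever you saved the file:

```python
from pathlib import Path

# Hypothetical folder holding the dataset; replace with your own location
data_dir = Path("data") / "Decision Tree"

# Join the folder and the file name in a platform-independent way
csv_path = data_dir / "BackOrders.csv"

print(csv_path.name)      # file name component of the path
print(csv_path.exists())  # whether the file is present at that location
```

A path built this way can be passed directly to pd.read_csv(), and the rest of the session keeps its original working directory.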
Let's read our data. As before, we prefix the name of the DataFrame with df_ so that it is easy to recognize as a DataFrame:
df_backorder = pd.read_csv("BackOrders.csv")
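After loading, it can help to confirm that the DataFrame matches the description given earlier (23 columns, with went_on_backorder as the target). The sketch below is one possible check, not part of the book's code. The helper check_backorder_frame is a hypothetical name, and the small df_demo frame, its feature column names, and its "Yes"/"No" target values are made-up stand-ins so that the example runs without the CSV file:

```python
import pandas as pd

def check_backorder_frame(df, target="went_on_backorder", expected_cols=23):
    """Summarize basic facts about a loaded frame (hypothetical helper)."""
    if target not in df.columns:
        raise KeyError(f"target column {target!r} not found")
    return {
        "n_rows": len(df),
        "n_cols": df.shape[1],
        "has_expected_cols": df.shape[1] == expected_cols,
        # Count of rows per target class, useful for spotting class imbalance
        "target_counts": df[target].value_counts().to_dict(),
    }

# Made-up data standing in for df_backorder; values are illustrative only
df_demo = pd.DataFrame({
    "feature_a": [10, 0, 5, 2],
    "feature_b": [1.5, 2.0, 0.5, 3.0],
    "went_on_backorder": ["No", "Yes", "No", "No"],
})

summary = check_backorder_frame(df_demo, expected_cols=3)
print(summary)
```

With the real data, the call would be check_backorder_frame(df_backorder), leaving expected_cols at its default of 23.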