Getting ready

We will take the same dataset that we used for training our AdaBoost model. In this example, we will see how we can train our model using gradient boosting machines. We will also look at a handful of hyperparameters that can be tuned to improve the model's performance.
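Before working with the real dataset, here is a minimal sketch (using synthetic data from `make_classification`; the hyperparameter values shown are illustrative defaults, not tuned settings) of the `GradientBoostingClassifier` interface and the hyperparameters we will revisit later in this recipe:

```python
# A minimal sketch of the GradientBoostingClassifier interface on toy data
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

# Commonly tuned hyperparameters:
#   n_estimators  - number of boosting stages (trees) to fit
#   learning_rate - shrinks each tree's contribution
#   max_depth     - depth of the individual regression trees
#   subsample     - fraction of samples per tree (<1.0 gives stochastic boosting)
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 max_depth=3, subsample=1.0, random_state=0)
gbm.fit(X_demo, y_demo)
print(gbm.score(X_demo, y_demo))
```

With 100 depth-3 trees, the model fits this small toy set almost perfectly; the same constructor arguments are what we will tune on the breast cancer data.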

First, we must import all the required libraries:

import os
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, roc_auc_score
from sklearn.preprocessing import MinMaxScaler

import matplotlib.pyplot as plt
import itertools

Then, we read our data and label encode our target variable to 1 and 0:

# Read the Dataset
df_breastcancer = pd.read_csv("breastcancer.csv")

from sklearn.preprocessing import LabelEncoder
lb = LabelEncoder()
df_breastcancer['diagnosis'] = lb.fit_transform(df_breastcancer['diagnosis'])
df_breastcancer.head(5)
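To see exactly what `fit_transform` does here, the toy example below (with made-up labels, not the real dataset) shows that `LabelEncoder` sorts the classes alphabetically, so `'B'` (benign) maps to 0 and `'M'` (malignant) maps to 1:

```python
# Illustrative check of how LabelEncoder maps the diagnosis classes
from sklearn.preprocessing import LabelEncoder

lb_demo = LabelEncoder()
encoded = lb_demo.fit_transform(['M', 'B', 'B', 'M'])

print(list(lb_demo.classes_))  # classes in sorted order: ['B', 'M']
print(list(encoded))           # [1, 0, 0, 1]
```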

Then, we separate our target and feature variables and split the data into train and test subsets:

# Create feature & response variables
# Drop the id and response columns, as they add nothing to the analysis
X = df_breastcancer.iloc[:, 2:32]

# Target variable (the label-encoded diagnosis column)
Y = df_breastcancer['diagnosis']

# Create train & test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.20, random_state=0, stratify=Y)
This is the same code that we used in the Getting ready section of the AdaBoost example. 
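The `stratify=Y` argument is worth a quick illustration. The snippet below (with a synthetic imbalanced label vector, not the real data) shows that a stratified split preserves the class proportions of the full dataset in both subsets:

```python
# A small illustration of why stratify=Y matters for imbalanced classes
import numpy as np
from sklearn.model_selection import train_test_split

y_toy = np.array([0] * 80 + [1] * 20)   # 80/20 class imbalance
X_toy = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.20, random_state=0, stratify=y_toy)

# Both splits keep the 20% minority-class proportion
print(y_tr.mean(), y_te.mean())  # 0.2 0.2
```

Without stratification, a small test set can by chance contain too few malignant cases, which would distort the evaluation metrics computed later.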