How it works...

In Step 1, we looked at the dimensions of our dataset. In Step 2, we took a glimpse at the datatypes of the variables and noticed that all of them were numeric. In Step 3, we dropped the ID column, since it is of no use for our exercise. We skipped looking at the correlations between the variables, but we recommend that readers add this step in order to fully understand and analyze the data.
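The correlation check takes only a couple of lines of pandas; a minimal sketch is shown below, assuming the data has been loaded into a DataFrame (the file name data.csv is a placeholder, not the recipe's actual dataset path):

    import pandas as pd

    # Hypothetical path; substitute the dataset used in the recipe
    df = pd.read_csv("data.csv")

    # Pairwise correlations between the numeric variables
    print(df.corr())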

In Step 4, we moved on to check whether we had any missing values in our dataset. In this case, our dataset had no missing values. In Step 5, we separated the predictor and response variables and split our dataset into a training dataset (70% of the data) and a testing dataset (30% of the data). In Step 6, we used StandardScaler() from sklearn.preprocessing to standardize our predictor variables in both the training and testing datasets.
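The split-and-standardize steps might look something like the following sketch; the DataFrame name df, the target column name 'class', and the random_state value are illustrative assumptions rather than the recipe's exact code:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    # Hypothetical dataset path and target column name
    df = pd.read_csv("data.csv")
    X = df.drop("class", axis=1)    # predictor variables
    y = df["class"]                 # response variable

    # 70% training data, 30% testing data
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=1)

    # Fit the scaler on the training data only, then apply it to both sets
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)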

After that, in Step 7, we used SGDClassifier() from sklearn.linear_model to build our logistic regression model using the stochastic gradient descent method. We set our hyperparameters, such as alpha, loss, max_iter, and penalty. We set loss='log' in order to use the SGDClassifier for logistic regression. We then used predict_proba() to predict probabilities for our test observations, which returned the probabilities of both classes for each test observation.
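A minimal sketch of this step, continuing from the split-and-scale sketch above, is shown next; the specific hyperparameter values are placeholders and not necessarily those used in the recipe:

    from sklearn.linear_model import SGDClassifier

    # loss="log" selects logistic regression; scikit-learn >= 1.1
    # renames this value to "log_loss"
    sgd_logreg = SGDClassifier(loss="log", penalty="l2", alpha=0.0001,
                               max_iter=1000, random_state=1)
    sgd_logreg.fit(X_train_scaled, y_train)

    # One row per test observation, one column per class
    pred_proba = sgd_logreg.predict_proba(X_test_scaled)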

With loss set to hinge, SGDClassifier() fits a linear SVM (we will cover SVMs in an upcoming section). The loss can also be set to other values, such as squared_hinge, which is the same as hinge but is quadratically penalized.
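For illustration, switching the loss is a one-line change (again continuing from the sketch above); note that predict_proba() is not available with the hinge loss, so decision_function() is used to score the test observations:

    # Same estimator, hinge loss: a linear SVM trained with SGD
    sgd_svm = SGDClassifier(loss="hinge", penalty="l2",
                            max_iter=1000, random_state=1)
    sgd_svm.fit(X_train_scaled, y_train)

    # Signed distances to the separating hyperplane for the test set
    svm_scores = sgd_svm.decision_function(X_test_scaled)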

In Steps 8 and 9, we filtered out the probabilities for class 1 and looked at our model's accuracy score. In Steps 10 and 11, we computed the AUC value and plotted the ROC curve; a sketch of these steps is shown below. We will explore hyperparameter tuning for each technique in more detail in upcoming sections.
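A sketch of these final steps, continuing from the logistic regression sketch above, might look as follows; the variable names are assumptions for illustration:

    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_auc_score, roc_curve

    # Keep only the probability of class 1 for each test observation
    pred_proba_class1 = pred_proba[:, 1]

    # Mean accuracy on the test set and area under the ROC curve
    print("Accuracy:", sgd_logreg.score(X_test_scaled, y_test))
    print("AUC:", roc_auc_score(y_test, pred_proba_class1))

    # ROC curve: true positive rate against false positive rate
    fpr, tpr, thresholds = roc_curve(y_test, pred_proba_class1)
    plt.plot(fpr, tpr, label="SGD logistic regression")
    plt.plot([0, 1], [0, 1], linestyle="--", label="chance")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()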
