Support vector machine

If you recall, the first thing we did in a previous section was perform RFE to reduce our input features. We'll repeat that step here, starting by redoing our control function:

> ctrl <- caret::rfeControl(
    functions = caret::lrFuncs,
    method = "cv",
    number = 10,
    verbose = TRUE
  )

I say we shoot for around 20 to 30 total features and set our random seed:

> subsets <- c(20:30)

> set.seed(54321)

Now, in selecting the features, you can use the linear SVM or one of the kernel functions. Let's proceed with linear, which means our specification for the following method will be svmLinear. If, for instance, you wanted to change to a polynomial kernel, you would specify svmPoly instead, or svmRadial for the radial basis function:

> svmProfile <- caret::rfe(
    train_df,
    train_y,
    sizes = subsets,
    rfeControl = ctrl,
    method = "svmLinear",
    metric = "Kappa"
  )

> svmProfile
Recursive feature selection
Outer resampling method: Cross-Validated (10 fold)
Resampling performance over subset size:
 Variables Accuracy  Kappa AccuracySD KappaSD Selected
        20   0.8357 0.5206   0.008253 0.02915
        21   0.8350 0.5178   0.008624 0.03091
        22   0.8359 0.5204   0.008277 0.02948
        23   0.8361 0.5220   0.009435 0.02979
        24   0.8383 0.5292   0.008560 0.02572        *
        25   0.8375 0.5261   0.008067 0.02323
        26   0.8379 0.5290   0.010193 0.02905
        27   0.8375 0.5276   0.009205 0.02667
        28   0.8372 0.5259   0.008770 0.02437
        29   0.8361 0.5231   0.008074 0.02319
        30   0.8368 0.5252   0.008069 0.02401
        39   0.8377 0.5290   0.009290 0.02711

The top 5 variables (out of 24):
V74, V35, V22, V78, V20
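
As an aside, you don't need to hunt for the asterisk in that table; the chosen subset size is stored on the rfe object itself, so the following should return 24 for this run:

> svmProfile$optsize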

The optimal Kappa and accuracy are with 24 features. Notice that the top five features are the same as when we ran this with KNN. Here's how to plot the Kappa score per number of features:

> svm_results <- svmProfile$results

> ggplot2::ggplot(svm_results, ggplot2::aes(Variables, Kappa)) +
    ggplot2::geom_line(color = 'steelblue', size = 2) +
    ggthemes::theme_fivethirtyeight()

The output of the preceding code is as follows:
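
A moment ago I claimed that the top five features match what we saw with KNN. If you'd rather confirm that overlap programmatically than by eye, and the RFE object from the KNN section is still in your environment (I'm calling it knn_profile here; substitute whatever name you gave it), a one-liner does the trick:

> intersect(
    head(svmProfile$optVariables, 5),
    head(knn_profile$optVariables, 5)
  )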

Let's select a dataframe with only the optimal features:

> svm_vars <- svmProfile$optVariables

> x_selected <-
train_df[, (colnames(train_df) %in% svm_vars)]
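
A quick dimension check confirms that we're down to the 24 selected columns:

> dim(x_selected)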

With our features selected, we can train a model with cross-validation and, in the process, tune the hyperparameter C. If you recall from earlier, this is the regularization (cost) parameter, which controls how heavily misclassified observations are penalized. We'll go forward with caret's train() function:

> grid <- expand.grid(.C = c(1, 2, 3))

> svm_control <- caret::trainControl(method = 'cv', number = 10)

> set.seed(1918)

> svm <- caret::train(
    x_selected,
    train_y,
    method = "svmLinear",
    trControl = svm_control,
    tuneGrid = grid,
    metric = "Kappa"
  )

> svm
Support Vector Machines with Linear Kernel

4491 samples
24 predictor
2 classes: '0', '1'

No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 4041, 4042, 4042, 4041, 4042, 4043, ...
Resampling results across tuning parameters:

C Accuracy Kappa
1 0.8372287 0.5223355
2 0.8367833 0.5210972
3 0.8374514 0.5229846

Kappa was used to select the optimal model using the
largest value.
The final value used for the model was C = 3.
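
Incidentally, you don't have to read the winning cost off the printout; a caret train object stores it in bestTune, so you can pull it directly:

> best_C <- svm$bestTune$C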

Excellent! We have an optimal C = 3, so let's build that model. By the way, be sure to specify that we want a probability model with prob.model = TRUE. The linear kernel is specified with vanilladot:

> svm_fit <-
    kernlab::ksvm(
      as.matrix(x_selected),
      train_y,
      kernel = "vanilladot",
      prob.model = TRUE,
      kpar = "automatic",
      C = 3
    )
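
Printing the fitted object is a quick sanity check; kernlab reports the kernel, the cost parameter, the number of support vectors, and the training error:

> svm_fit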

Do we want a dataframe of predicted probabilities on the train data? I'm glad you asked:

> svm_pred_train <-
kernlab::predict(svm_fit, x_selected, type = "probabilities")

> svm_pred_train <- data.frame(svm_pred_train)

The following density plot looks about as good as what we saw with KNN:

> classifierplots::density_plot(train_y, svm_pred_train$X1)

The output of the preceding code is as follows:

Two things before moving on to the test data: AUC and the optimal score cutoff:

> Metrics::auc(train_y, svm_pred_train$X1)
[1] 0.8940114

> InformationValue::optimalCutoff(train_y, svm_pred_train$X1)
[1] 0.3879227
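
If you're curious how that cutoff translates into class labels, here's a minimal sketch on the training data; the threshold is simply the value returned above:

> train_cut <- 0.3879227

> svm_train_class <- ifelse(svm_pred_train$X1 >= train_cut, "1", "0")

> table(predicted = svm_train_class, actual = train_y)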

OK, the AUC is inferior to KNN's on the training data, but the real proof is in our test data:

> test_svm <- test[, (colnames(test) %in% svm_vars)]

> svm_pred_test <-
kernlab::predict(svm_fit, test_svm, type = "probabilities")

> svm_pred_test <- as.data.frame(svm_pred_test)

I insist we take a look at the density plot:

> classifierplots::density_plot(test_y, svm_pred_test$`1`)

The output of the preceding code is as follows:

I would put forward that we have a good overall fit here:

> Metrics::auc(test_y, svm_pred_test$`1`)
[1] 0.8951011

That's more like it: excellent bias/variance tradeoff. We can start the overall comparison with KNN by moving forward with the confusion matrix and relevant stats:

> svm_pred_class <- as.factor(ifelse(svm_pred_test$`1` >= 0.275, "1", "0"))

> caret::confusionMatrix(data = svm_pred_class, reference = test_y, positive = "1")
Confusion Matrix and Statistics
          Reference
Prediction    0    1
         0 1206  104
         1  247  366

Accuracy : 0.8175
95% CI : (0.7995, 0.8345)
No Information Rate : 0.7556
P-Value [Acc > NIR] : 0.00000000004314737

Kappa : 0.5519
Mcnemar's Test P-Value : 0.00000000000003472

Sensitivity : 0.7787
Specificity : 0.8300
Pos Pred Value : 0.5971
Neg Pred Value : 0.9206
Prevalence : 0.2444
Detection Rate : 0.1903
Detection Prevalence : 0.3188
Balanced Accuracy : 0.8044

'Positive' Class : 1
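
By default, confusionMatrix() reports sensitivity and specificity; if you also want precision, recall, and F1, you can rerun it with mode = "everything", or compute F1 from the counts above, which works out to roughly 0.68 here:

> precision <- 366 / (366 + 247) # Pos Pred Value from the matrix

> recall <- 366 / (366 + 104) # Sensitivity

> 2 * precision * recall / (precision + recall)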

Comparing the results across methods, we see better values for the SVM almost across the board, especially a better Kappa and better balanced accuracy. In the past couple of chapters, we've produced ROC plots where the various models were overlaid on the same plot. We can recreate that plot here as well, as follows:

> pred.knn <- ROCR::prediction(knn_pred_test$X1, test_y)

> perf.knn <- ROCR::performance(pred.knn, "tpr", "fpr")

> ROCR::plot(perf.knn, main = "ROC", col = 1)

> pred.svm <- ROCR::prediction(svm_pred_test$`1`, test_y)

> perf.svm <- ROCR::performance(pred.svm, "tpr", "fpr")

> ROCR::plot(perf.svm, col = 2, add = TRUE)

> legend(0.6, 0.6, c("KNN", "SVM"), 1:2)

The output of the preceding code is as follows:

 

The plot shows a clear separation between the two models' curves. Given what we've done here, then, the SVM performed better than KNN. Indeed, we could try a number of different approaches to improve either algorithm, including different feature selection, different weightings for KNN, or different kernels for the SVM.
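
For example, trying a radial basis function kernel is a one-argument change to the ksvm() call. This is only a sketch of the idea, not something tuned or evaluated here; with kpar = "automatic", kernlab estimates the kernel's sigma for us, and C would be worth re-tuning for the new kernel:

> svm_rbf_fit <-
    kernlab::ksvm(
      as.matrix(x_selected),
      train_y,
      kernel = "rbfdot",   # radial basis function instead of vanilladot
      prob.model = TRUE,
      kpar = "automatic",
      C = 3
    )

From there, the rest of the evaluation pipeline (density plot, AUC, and confusion matrix) would be identical.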
