SVM modeling

We will use the e1071 package to build our SVM models. We will start with a linear support vector classifier and then move on to the nonlinear versions. The package has a handy function for SVM called tune.svm(), which assists in the selection of the tuning parameters/kernel functions by using cross-validation to optimize them. Let's create an object called linear.tune and examine it with the summary() function, as follows:

    > library(e1071)    # provides svm() and tune.svm()
    > linear.tune <- tune.svm(type ~ ., data = train,
                              kernel = "linear",
                              cost = c(0.001, 0.01, 0.1, 1, 5, 10))

    > summary(linear.tune)
    Parameter tuning of 'svm':
    - sampling method: 10-fold cross validation
    - best parameters:
     cost
        1
    - best performance: 0.2051957
    - Detailed performance results:
       cost     error dispersion
    1 1e-03 0.3197031 0.06367203
    2 1e-02 0.2080297 0.07964313
    3 1e-01 0.2077598 0.07084088
    4 1e+00 0.2051957 0.06933229
    5 5e+00 0.2078273 0.07221619
    6 1e+01 0.2078273 0.07221619
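
Before interpreting these numbers, note that e1071 also supplies a plot method for tune objects, which gives a quick visual of how the cross-validated error changes across the cost grid. This is an optional check, not part of the output above:

    # Optional: plot cross-validated error versus cost for the tuned object
    plot(linear.tune)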

The optimal cost is 1 for these data, leading to a cross-validated misclassification error of roughly 20.5 per cent. We can make predictions on the test data and examine them as well, using the predict() function with newdata = test:

    > best.linear <- linear.tune$best.model
    > tune.test <- predict(best.linear, newdata = test)
    > table(tune.test, test$type)
    tune.test No Yes
          No  82  22
          Yes 13  30
    > (82 + 30) / 147
    [1] 0.7619048
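
Rather than summing the diagonal of the confusion matrix by hand, the same accuracy can be computed directly from the predictions. This is just a convenience sketch; it assumes tune.test and test are still in the workspace:

    # Proportion of test observations classified correctly
    mean(tune.test == test$type)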

The linear support vector classifier has slightly outperformed KNN on both the train and test sets. We will now see whether nonlinear kernels improve performance, again using cross-validation to select the tuning parameters.

The first kernel function that we will try is the polynomial, and we will be tuning two parameters: the degree of the polynomial (degree) and the kernel coefficient (coef0). The polynomial degree will be 3, 4, or 5, and the coefficient will range from 0.1 to 4, as follows:

    > set.seed(123)
    > poly.tune <- tune.svm(type ~ ., data = train,
                            kernel = "polynomial",
                            degree = c(3, 4, 5),
                            coef0 = c(0.1, 0.5, 1, 2, 3, 4))

    > summary(poly.tune)
    Parameter tuning of 'svm':
    - sampling method: 10-fold cross validation
    - best parameters:
     degree coef0
          3   0.1
    - best performance: 0.2310391
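
The summary shows only the winning combination. The full grid of degree and coef0 results is kept in the tune object's performances data frame, which you can inspect to see how close the other combinations came. This is an optional sketch using the standard components of an e1071 tune object (poly.perf is just a temporary helper name):

    # All degree/coef0 combinations, sorted by cross-validated error
    poly.perf <- poly.tune$performances
    head(poly.perf[order(poly.perf$error), ])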

The model has selected a degree of 3 for the polynomial and a coefficient of 0.1. Just as with the linear SVM, we can make predictions on the test set with these parameters, as follows:

    > best.poly <- poly.tune$best.model
    > poly.test <- predict(best.poly, newdata = test)
    > table(poly.test, test$type)
    poly.test No Yes
          No  81  28
          Yes 12  26
    > (81 + 26) / 147
    [1] 0.7278912

This did not perform quite as well as the linear model. We will now try the radial basis function kernel. In this instance, the one parameter that we will solve for is gamma, which we will examine over values ranging from 0.1 to 4. If gamma is too small, the model will not capture the complexity of the decision boundary; if it is too large, the model will severely overfit:

    > set.seed(123)
    > rbf.tune <- tune.svm(type ~ ., data = train,
                           kernel = "radial",
                           gamma = c(0.1, 0.5, 1, 2, 3, 4))

    > summary(rbf.tune)
    Parameter tuning of 'svm':
    - sampling method: 10-fold cross validation
    - best parameters:
     gamma
       0.5
    - best performance: 0.2284076

The best gamma value is 0.5, and the performance at this setting does not seem to improve much over the other SVM models. We will check it on the test set as well, in the following way:

    > best.rbf <- rbf.tune$best.model
    > rbf.test <- predict(best.rbf, newdata = test)
    > table(rbf.test, test$type)
    rbf.test No Yes
         No  73  33
         Yes 20  21
    > (73 + 21) / 147
    [1] 0.6394558

The performance is downright abysmal. One last shot at improvement here is with kernel = "sigmoid". We will be solving for two parameters, gamma and the kernel coefficient (coef0):

    > set.seed(123)
    > sigmoid.tune <- tune.svm(type ~ ., data = train,
                               kernel = "sigmoid",
                               gamma = c(0.1, 0.5, 1, 2, 3, 4),
                               coef0 = c(0.1, 0.5, 1, 2, 3, 4))

    > summary(sigmoid.tune)
    Parameter tuning of 'svm':
    - sampling method: 10-fold cross validation
    - best parameters:
     gamma coef0
       0.1     2
    - best performance: 0.2080972

This error rate is in line with the linear model. It is now just a matter of whether it performs better on the test set or not:

    > best.sigmoid <- sigmoid.tune$best.model
    > sigmoid.test <- predict(best.sigmoid, newdata = test)
    > table(sigmoid.test, test$type)
    sigmoid.test No Yes
             No  82  19
             Yes 11  35
    > (82 + 35) / 147
    [1] 0.7959184

Lo and behold! We finally have a test performance that is in line with the performance on the train data. It appears that we can choose the sigmoid kernel as the best predictor.
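
Using the same trick as before, it can be handy to line up the four test-set accuracies side by side before moving on. This is a small convenience sketch that assumes the prediction objects created above are still in the workspace:

    # Gather the test-set accuracies of the four kernels computed above
    c(linear  = mean(tune.test    == test$type),
      poly    = mean(poly.test    == test$type),
      radial  = mean(rbf.test     == test$type),
      sigmoid = mean(sigmoid.test == test$type))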

So far, we have played around with different kernels. Now, let's evaluate their performance, along with that of the linear model, using metrics other than just accuracy.
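
As a quick preview of such metrics, sensitivity and specificity can be read straight off the sigmoid model's confusion matrix. This is a minimal sketch that treats "Yes" as the positive class (an assumption; adjust if your convention differs), ahead of the more formal evaluation that follows:

    # Sensitivity and specificity for the sigmoid SVM, with "Yes" as the positive class
    tab <- table(sigmoid.test, test$type)
    tab["Yes", "Yes"] / sum(tab[, "Yes"])   # sensitivity: true positives / actual positives
    tab["No", "No"]   / sum(tab[, "No"])    # specificity: true negatives / actual negatives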
