Using the functionality of mlr again, we first need to create an object with our base learners. These are, once again, classif.randomForest and, for a MARS model, classif.earth from the earth package:
> base <- c("classif.randomForest", "classif.earth")
You now create a learner from each of those base learners, and then specify that you want their output as predicted probabilities:
> learns <- lapply(base, makeLearner)
> learns <- lapply(learns, setPredictType, "prob")
The process of building the base learner object is complete. I stated earlier that the ensemble learning algorithm will be a GLM from glmnet. For just two base learners, a CART might be more appropriate, but let's demonstrate what's possible. There are a number of methods for stacking. In the following code block, I stack with cross-validation:
> sl <-
mlr::makeStackedLearner(
base.learners = learns,
super.learner = "classif.glmnet",
predict.type = "prob",
method = "stack.cv"
)
Now, it gets exciting as we train our stacked model:
> stacked_fit <- mlr::train(sl, dna_task)
And we establish the predicted probabilities for the test data:
> pred_stacked <- predict(stacked_fit, newdata = test)
Just for a sanity check, let's look at the confusion matrix:
> mlr::calculateConfusionMatrix(pred_stacked)
predicted
true ei ie n -err.-
ei 144 4 5 9
ie 5 146 2 7
n 2 1 327 3
-err.- 7 5 7 19
The stacked model produced six fewer classification errors. The proof is in the metrics:
> mlr::performance(pred_stacked, measures = list(acc, logloss))
acc logloss
0.9701258 0.1101400
Of course, accuracy is better, but more importantly, the log-loss improved substantially.
What have we learned? Using primarily one package, mlr, we built a good model with random forest, but by stacking random forest and MARS, we improved performance. Although all of that was with just a few lines of code, it's important to understand how to create and implement the pipeline.
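To tie the steps together, here is the whole stacking pipeline as one sketch. The train/test split and the construction of dna_task come from the earlier sections; the target column name used in the commented task-creation line is an assumption for illustration:

```r
library(mlr)

# Assumes the DNA data has already been split into train and test,
# and that a classification task exists on the training data, e.g.:
# dna_task <- makeClassifTask(data = train, target = "Class")  # hypothetical target name

# 1. Define the base learners and request probability output
base <- c("classif.randomForest", "classif.earth")
learns <- lapply(base, makeLearner)
learns <- lapply(learns, setPredictType, "prob")

# 2. Stack them under a glmnet super learner, using cross-validated
#    (out-of-fold) base-learner predictions as its inputs
sl <- makeStackedLearner(
  base.learners = learns,
  super.learner = "classif.glmnet",
  predict.type  = "prob",
  method        = "stack.cv"
)

# 3. Train, predict on the test set, and evaluate
stacked_fit  <- train(sl, dna_task)
pred_stacked <- predict(stacked_fit, newdata = test)

calculateConfusionMatrix(pred_stacked)
performance(pred_stacked, measures = list(acc, logloss))
```

The choice of method = "stack.cv" matters: the super learner is fit on out-of-fold predictions, which guards against the optimism you would get if the base learners predicted on the same data they were trained on.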