Business and data understanding

We are going to visit our old nemesis, the Pima Diabetes data, once again. It has proved to be quite a challenge, with most classifiers producing accuracy rates in the mid-70 percent range. We've looked at this data in Chapter 5, More Classification Techniques - K-Nearest Neighbors and Support Vector Machines, and Chapter 6, Classification and Regression Trees, so we can skip over the details. There are a number of R packages for building ensembles, and it is not that difficult to write your own code. In this iteration, we are going to attack the problem with the caret and caretEnsemble packages. Let's get the packages loaded and the data prepared (Pima.tr and Pima.te from MASS, combined into one data frame), including creating the train and test sets using the createDataPartition() function from caret:

    > library(MASS) # provides Pima.tr and Pima.te
    > library(caret) # provides createDataPartition()
    > library(caretEnsemble)
    > library(caTools)
    > pima <- rbind(Pima.tr, Pima.te)
    > set.seed(502)
    > split <- createDataPartition(y = pima$type, p = 0.75, list = FALSE)
    > train <- pima[split, ]
    > test <- pima[-split, ]
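
Because createDataPartition() samples within each level of the outcome, the train and test sets should carry roughly the same share of diabetic and non-diabetic cases as the full data. The following is a minimal sketch of how you might confirm that before modeling. It repeats the split so that it runs on its own, and the object names prop_all, prop_train, and prop_test are only illustrative:

```r
library(MASS)   # Pima.tr and Pima.te
library(caret)  # createDataPartition()

# Rebuild the same split as above so this sketch is self-contained
pima <- rbind(Pima.tr, Pima.te)
set.seed(502)
split <- createDataPartition(y = pima$type, p = 0.75, list = FALSE)
train <- pima[split, ]
test <- pima[-split, ]

# Row and column counts; the two row counts should add up to nrow(pima)
dim(train)
dim(test)

# Share of each class in the full data and in each piece
prop_all <- prop.table(table(pima$type))
prop_train <- prop.table(table(train$type))
prop_test <- prop.table(table(test$type))
round(rbind(all = prop_all, train = prop_train, test = prop_test), 3)
```

If the three rows of the final table are close to one another, the split has preserved the class balance, and accuracy on the test set can be compared fairly with what we saw in the earlier chapters.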
