Business and data understanding

We are going to revisit our old nemesis, the Pima Diabetes data. It has proved to be quite a challenge, with most classifiers producing accuracy rates in the mid-70s percent range. We've looked at this data in Chapter 5, More Classification Techniques - K-Nearest Neighbors and Support Vector Machines, and Chapter 6, Classification and Regression Trees, so we can skip over the details. There are a number of R packages for building ensembles, and it is not that difficult to write your own code. In this iteration, we are going to attack the problem with the caret and caretEnsemble packages. Let's get the packages loaded and the data prepared, including creating the train and test sets using the createDataPartition() function from caret:

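    > library(MASS)           # Pima.tr and Pima.te data frames
    > library(caret)          # createDataPartition()
    > library(caretEnsemble)
    > library(caTools)
    > pima <- rbind(Pima.tr, Pima.te)   # combine both sets into one data frame
    > set.seed(502)                     # make the split reproducible
    > split <- createDataPartition(y = pima$type, p = 0.75, list = FALSE)
    > train <- pima[split, ]
    > test <- pima[-split, ]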
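
Before moving on, it can be worth confirming that the split behaves as expected. When the outcome is a factor, createDataPartition() samples within each class, so the proportion of diabetic cases should be roughly the same in train and test. The following is a minimal sketch of such a check, not part of the original walkthrough; it rebuilds the same objects so that it runs on its own, and the choice of dim() and prop.table() as the checks is an assumption for illustration:

```r
library(MASS)    # Pima.tr and Pima.te
library(caret)   # createDataPartition()

# Rebuild the combined data and the 75/25 split used above
pima <- rbind(Pima.tr, Pima.te)
set.seed(502)
split <- createDataPartition(y = pima$type, p = 0.75, list = FALSE)
train <- pima[split, ]
test <- pima[-split, ]

# Row and column counts of each set
dim(train)
dim(test)

# Class proportions; these should be close because the
# sampling is stratified on pima$type
prop.table(table(train$type))
prop.table(table(test$type))
```

If the two proportion tables differ noticeably, the split was probably not made on the outcome variable.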