Business case

The overall business objective in this situation is to see if we can improve the predictive ability for some of the cases that we already worked on in the previous chapters. For regression, we will revisit the prostate cancer dataset from Chapter 4, Advanced Feature Selection in Linear Models. The baseline mean squared error to improve on is 0.444.

For classification purposes, we will utilize both the breast cancer biopsy data from Chapter 3, Logistic Regression and Discriminant Analysis and the Pima Indian Diabetes data from Chapter 5, More Classification Techniques — K-Nearest Neighbors and Support Vector Machines. In the breast cancer data, we achieved 97.6 percent predictive accuracy. For the diabetes data, we are seeking to improve on the 79.6 percent accuracy rate.

Both random forests and boosting will be applied to all three datasets. The simple tree method will only be used on the breast cancer and prostate cancer datasets.

Modeling and evaluation

To perform the modeling process, we will need to load seven different R packages. Then, we will go through each of the techniques and compare how well they perform on the data analyzed with the prior methods in the previous chapters.

Regression tree

We will jump right into the prostate dataset, but let's first load the necessary R packages. As always, please ensure that you have the libraries installed prior to loading the packages:

> library(rpart) #classification and regression trees
> library(partykit) #treeplots
> library(MASS) #breast and pima indian data
> library(ElemStatLearn) #prostate data
> library(randomForest) #random forests
> library(gbm) #gradient boosting
> library(caret) #tune hyper-parameters

We will first do regression with the prostate data and prepare it as we did in Chapter 4, Advanced Feature Selection in Linear Models. This involves calling the dataset, coding the gleason score as an indicator variable using the ifelse() function, and creating the test and train sets. The train set will be pros.train and the test set will be pros.test, as follows:

> data(prostate)
> prostate$gleason = ifelse(prostate$gleason == 6, 0, 1)
> pros.train = subset(prostate, train==TRUE)[,1:9]
> pros.test = subset(prostate, train==FALSE)[,1:9]

To build a regression tree on the train data, we will use the rpart() function from R's rpart package. The syntax is quite similar to what we used in the other modeling techniques:

> tree.pros = rpart(lpsa~., data=pros.train)

We can examine this object's cptable with the print() function and then look at the error per split in order to determine the optimal number of splits in the tree:

> print(tree.pros$cptable)
          CP nsplit rel error    xerror      xstd
1 0.35852251      0 1.0000000 1.0364016 0.1822698
2 0.12295687      1 0.6414775 0.8395071 0.1214181
3 0.11639953      2 0.5185206 0.7255295 0.1015424
4 0.05350873      3 0.4021211 0.7608289 0.1109777
5 0.01032838      4 0.3486124 0.6911426 0.1061507
6 0.01000000      5 0.3382840 0.7102030 0.1093327

This is a very important table to analyze. The first column, labeled CP, is the cost complexity parameter. The second column, nsplit, is the number of splits in the tree. The rel error column stands for relative error and is the RSS for k splits divided by the RSS for no splits, that is, RSS(k)/RSS(0). Both xerror and xstd are based on ten-fold cross-validation, with xerror being the average error and xstd the standard deviation of the cross-validation process. We can see that while five splits produced the lowest error on the full dataset, four splits produced a slightly lower error under cross-validation. You can examine this using plotcp():

> plotcp(tree.pros)

The output of the preceding command is as follows:

[Figure: plotcp() plot of cross-validated relative error by tree size]

The plot shows us the relative error by tree size, with the corresponding error bars. The horizontal line on the plot is the upper limit of the lowest standard error, that is, one standard error above the minimum cross-validated error. Selecting a tree size of 5, which is four splits, we can build a new tree object where xerror is minimized by pruning our tree. To do so, first create an object holding the cp value associated with row 5 of the table; the prune() function then handles the rest:

> cp = min(tree.pros$cptable[5,]) #the CP value is the smallest entry in row 5

> prune.tree.pros = prune(tree.pros, cp = cp)
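If you prefer not to hard-code the row number, the following is a minimal sketch of selecting cp programmatically; it assumes only the standard cptable column names, and the object names are illustrative:

> cp.table = tree.pros$cptable
> best.row = which.min(cp.table[, "xerror"]) #row with the lowest cross-validated error
> cp.auto = cp.table[best.row, "CP"]
> prune.tree.auto = prune(tree.pros, cp = cp.auto)

Here, best.row points at row 5 of the table, so the result matches the tree we just pruned by hand.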

With this done, you can plot and compare the full and pruned trees. The tree plots produced by the partykit package are much better than those produced by the party package. You can simply use the as.party() function as a wrapper in plot():

> plot(as.party(tree.pros))

The output of the preceding command is as follows:

[Figure: plot of the full regression tree]

Now we will use the as.party() function for the pruned tree:

> plot(as.party(prune.tree.pros))

The output of the preceding command is as follows:

[Figure: plot of the pruned regression tree]

Note that the splits are exactly the same in the two trees with the exception of the last split, which includes the variable age for the full tree. Interestingly, both the first and second splits in the tree are related to the log of cancer volume (lcavol). These plots are quite informative as they show the splits, nodes, observations per node, and boxplots of the outcome that we are trying to predict.

Let's see how well the pruned tree performs on the test data. What we will do is create an object of the predicted values using the predict() function and incorporate the test data. Then, calculate the errors (the predicted values minus the actual values) and finally, the mean of the squared errors:

> party.pros.test = predict(prune.tree.pros, newdata=pros.test)

> rpart.resid = party.pros.test - pros.test$lpsa #calculate residuals

> mean(rpart.resid^2) #calculate MSE
[1] 0.5267748

We have not improved on the predictive value from our work in Chapter 4, Advanced Feature Selection in Linear Models where the baseline MSE was 0.44. However, the technique is not without value. One can look at the tree plots that we produced and easily explain what the primary drivers behind the response are. As mentioned in the introduction, the trees are easy to interpret and explain, which may be more important than accuracy in many cases.

Classification tree

For the classification problem, we will prepare the breast cancer data in the same fashion as we did in Chapter 3, Logistic Regression and Discriminant Analysis. After loading the data, you will delete the patient ID, rename the features, eliminate the few missing values, and then create the train/test datasets in the following way:

> data(biopsy)

> biopsy = biopsy[,-1] #delete ID

> names(biopsy) = c("thick", "u.size", "u.shape", "adhsn", "s.size", "nucl", "chrom", "n.nuc", "mit", "class") #change the feature names

> biopsy.v2 = na.omit(biopsy) #delete the observations with missing values

> set.seed(123) #random number generator

> ind = sample(2, nrow(biopsy.v2), replace=TRUE, prob=c(0.7, 0.3))

> biop.train = biopsy.v2[ind==1,] #the training data set

> biop.test = biopsy.v2[ind==2,] #the test data set

With the dataset set up appropriately, we will use the same syntax style for a classification problem as we did previously for the regression problem. Before creating a classification tree, we need to ensure that the outcome is a factor, which can be verified with the str() function:

> str(biop.test[,10])
 Factor w/ 2 levels "benign","malignant": 1 1 1 1 1 2 1 2 1 1 …

First, create the tree and then examine the table for the optimal number of splits:

> set.seed(123)

> tree.biop = rpart(class~., data=biop.train)

> print(tree.biop$cptable)
          CP nsplit rel error    xerror       xstd
1 0.79651163      0 1.0000000 1.0000000 0.06086254
2 0.07558140      1 0.2034884 0.2674419 0.03746996
3 0.01162791      2 0.1279070 0.1453488 0.02829278
4 0.01000000      3 0.1162791 0.1744186 0.03082013

The cross-validation error is at a minimum with only two splits (row 3). We can now prune the tree, plot the full and pruned trees, and see how it performs on the test set:

> cp = min(tree.biop$cptable[3,])

> prune.tree.biop = prune(tree.biop, cp = cp)

> plot(as.party(tree.biop))

The output of the preceding command is as follows:

[Figure: plot of the full classification tree]

Now we will plot the pruned tree using the following command:

> plot(as.party(prune.tree.biop))

The output of the preceding command is as follows:

[Figure: plot of the pruned classification tree]

An examination of the tree plots shows that the uniformity of the cell size is the first split, then nuclei. The full tree had an additional split at the cell thickness. We can predict the test observations using type="class" in the predict() function, as follows:

> rparty.test = predict(prune.tree.biop, newdata=biop.test, type="class")

> table(rparty.test, biop.test$class)
           
rparty.test benign malignant
  benign       136         3
  malignant      6        64

> (136+64)/209
[1] 0.9569378

The basic tree with just two splits gets us almost 96 percent accuracy. This still falls short of the 97.6 percent we achieved with logistic regression, but it should encourage us to believe that we can improve on this with the upcoming methods, starting with random forests.
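As an aside, because the caret package is already loaded, you can let confusionMatrix() compute the accuracy along with sensitivity, specificity, and a number of other statistics instead of doing the arithmetic by hand. A minimal sketch, not run here:

> confusionMatrix(rparty.test, biop.test$class, positive = "malignant")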

Random forest regression

In this section, we will start by focusing again on the prostate data before moving on to the breast cancer and Pima Indian sets. We will use the randomForest package. The general syntax to create a random forest object is to use the randomForest() function and specify the formula and dataset as the two primary arguments. Recall that, for regression, the default number of variables sampled at each split is p/3, and for classification, it is the square root of p, where p is the number of predictor variables in the data frame. For larger datasets, in terms of p, you can tune the mtry parameter, which determines the number of predictors sampled at each split. As p is less than 10 in these examples, we will forgo this procedure. When you do want to optimize mtry, you can utilize the caret package or the tuneRF() function in randomForest, as sketched next.
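The following is a minimal tuneRF() sketch for the prostate training data (not run here); it starts the search at mtry = 2 and doubles it at each step, and the ntreeTry and improve settings are illustrative choices rather than recommendations. Column 9 of pros.train holds the response, lpsa:

> tuneRF(x = pros.train[, -9], y = pros.train$lpsa, mtryStart = 2, ntreeTry = 100, stepFactor = 2, improve = 0.01)

With this, let's build our forest with the default settings and examine the results, as follows: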

> set.seed(123)

> rf.pros = randomForest(lpsa~., data=pros.train)

> print(rf.pros)

Call:
 randomForest(formula = lpsa ~ ., data = pros.train)
               Type of random forest: regression
                     Number of trees: 500
No. of variables tried at each split: 2

          Mean of squared residuals: 0.6813944
                    % Var explained: 52.58

The call of the rf.pros object shows us that the random forest generated 500 different trees (the default) and sampled two variables at each split. The result is an MSE of 0.68 and nearly 53 percent of the variance explained. Let's see if we can improve on the default number of trees. An excessive number of trees adds computation without necessarily improving accuracy, and how many are enough depends on the data. Two things can help out: the first is a plot of rf.pros and the other is to ask for the minimum MSE:

> plot(rf.pros)

This plot shows the MSE by the number of trees in the model. You can see that as the trees are added, significant improvement in MSE occurs early on and then flatlines just before 100 trees are built in the forest.

[Figure: MSE by number of trees for rf.pros]

We can identify the specific number of trees that minimizes the MSE with the which.min() function, as follows:

> which.min(rf.pros$mse)
[1] 70

We can try 70 trees in the random forest by just specifying ntree=70 in the model syntax:

> set.seed(123)

> rf.pros.2 = randomForest(lpsa~., data=pros.train, ntree=70)

> print(rf.pros.2)

Call:
 randomForest(formula = lpsa ~ ., data = pros.train, ntree = 70)
               Type of random forest: regression
                     Number of trees: 70
No. of variables tried at each split: 2

          Mean of squared residuals: 0.6617529
                    % Var explained: 53.95

You can see that the MSE and variance explained have both improved slightly. Let's see one other plot before testing the model. If we are combining the results of 70 different trees that are built using bootstrapped samples and only two random predictors at each split, we will need a way to determine the drivers of the outcome. A single tree alone cannot paint this picture, but you can produce a variable importance plot and a corresponding list. The y-axis lists the variables in descending order of importance, and the x-axis is the improvement in node purity, that is, the total reduction in RSS attributable to splits on that variable. Note that for classification problems, this is an improvement in the Gini index. The function is varImpPlot():

> varImpPlot(rf.pros.2, main="Variable Importance Plot - PSA Score")

The output of the preceding command is as follows:

[Figure: Variable Importance Plot - PSA Score]

Consistent with the single tree, lcavol is the most important variable and lweight is the second-most important variable. If you want to examine the raw numbers, use the importance() function, as follows:

> importance(rf.pros.2)
        IncNodePurity
lcavol      25.446395
lweight     16.758646
age          7.191313
lbph         6.161000
svi          8.114879
lcp          7.580892
gleason      4.218471
pgg45        8.166068
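Note that, because the forest was grown with the default importance = FALSE, only the node purity measure (IncNodePurity) is reported. If you also want the permutation-based measure, %IncMSE, you can refit with importance = TRUE. A quick sketch, where the object name rf.pros.perm is just illustrative:

> set.seed(123)
> rf.pros.perm = randomForest(lpsa~., data=pros.train, ntree=70, importance=TRUE)
> importance(rf.pros.perm, type=1) #type = 1 returns the permutation-based %IncMSE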

Now, it is time to see how it did on the test data:

> rf.pros.test = predict(rf.pros.2, newdata=pros.test)

> rf.resid = rf.pros.test - pros.test$lpsa #calculate residual

> mean(rf.resid^2)
[1] 0.5420387

The MSE is still higher than the 0.44 that we achieved with LASSO in Chapter 4, Advanced Feature Selection in Linear Models, and it is no better than a single tree.

Random forest classification

Perhaps you are disappointed with the performance of the random forest regression model, but the true power of the technique is in classification problems. Let's get started with the breast cancer diagnosis data. The procedure is nearly the same as it was with the regression problem:

> set.seed(123)

> rf.biop = randomForest(class~., data=biop.train)

> print(rf.biop)

Call:
 randomForest(formula = class ~ ., data = biop.train)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 3

        OOB estimate of  error rate: 3.16%
Confusion matrix:
          benign malignant class.error
benign       294         8  0.02649007
malignant      7       165  0.04069767

The OOB error rate is 3.16 percent. Again, this is with all 500 trees factored into the analysis. Let's plot the error by the number of trees:

> plot(rf.biop)

The output of the preceding command is as follows:

[Figure: OOB error by number of trees for rf.biop]

The plot shows that the error drops quickly and that the minimum OOB error is reached with far fewer than 500 trees. Let's now pull the exact number using which.min() again. The one difference from before is that we need to specify column 1 to get the overall error rate; the additional columns hold the error rate for each class label, which we will not need in this example. Also, for classification, mse is not available; err.rate is used instead, as follows:

> which.min(rf.biop$err.rate[,1])
[1] 19
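If you are curious about the structure of err.rate, a quick peek at the first few rows (a sketch, output not shown) reveals the overall OOB rate in column 1 followed by one column per class:

> head(rf.biop$err.rate) #columns: OOB, benign, malignant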

Only 19 trees are needed to optimize the model accuracy. Let's try this and see how it performs:

> rf.biop.2 = randomForest(class~., data=biop.train, ntree=19)

> print(rf.biop.2)

Call:
 randomForest(formula = class ~ ., data = biop.train, ntree = 19)
               Type of random forest: classification
                     Number of trees: 19
No. of variables tried at each split: 3

        OOB estimate of  error rate: 2.95%
Confusion matrix:
          benign malignant class.error
benign       294         8  0.02649007
malignant      6       166  0.03488372

> rf.biop.test = predict(rf.biop.2, newdata=biop.test, type="response")

> table(rf.biop.test, biop.test$class)
            
rf.biop.test benign malignant
   benign       139         0
   malignant      3        67

> (139+67)/209
[1] 0.9856459

Well, how about that? The train set error is below 3 percent, and the model performs even better on the test set, where only three observations out of 209 were misclassified and no malignant cases were missed. Recall that the best so far was logistic regression with 97.6 percent accuracy, so this seems to be our best performer yet on the breast cancer data. Before moving on, let's have a look at the variable importance plot:

> varImpPlot(rf.biop.2)

The output of the preceding command is as follows:

[Figure: variable importance plot for rf.biop.2]

The importance in the preceding plot is each variable's contribution to the mean decrease in the Gini index. This is rather different from the splits of the single tree. Recall that the full tree had splits at the size (consistent with the random forest), then nuclei, and then thickness. This shows how potentially powerful building random forests can be, not only in predictive ability but also in feature selection.
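If you want to turn those importance scores into a ranked list for feature selection, a minimal sketch follows; MeanDecreaseGini is the column that randomForest reports for a classification forest grown with the default settings, and the gini object name is illustrative:

> gini = importance(rf.biop.2)
> gini[order(gini[, "MeanDecreaseGini"], decreasing = TRUE), , drop = FALSE]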

Moving on to the tougher challenge of the Pima Indian diabetes model, we will first need to prepare the data in the following way:

> data(Pima.tr)

> data(Pima.te)

> pima = rbind(Pima.tr, Pima.te)

> set.seed(502)

> ind = sample(2, nrow(pima), replace=TRUE, prob=c(0.7,0.3))

> pima.train = pima[ind==1,]

> pima.test = pima[ind==2,]

Now, we will move on to the building of the model, as follows:

> set.seed(321)

> rf.pima = randomForest(type~., data=pima.train)

> print(rf.pima)

Call:
 randomForest(formula = type ~ ., data = pima.train)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 20%
Confusion matrix:
     No Yes class.error
No  233  29   0.1106870
Yes  48  75   0.3902439

We get a 20 percent misclassification rate, which is no better than what we've done before on the train set. Let's see if optimizing the tree size can improve things dramatically:

> which.min(rf.pima$err.rate[,1])
[1] 80

> set.seed(321)

> rf.pima.2 = randomForest(type~., data=pima.train, ntree=80)

> print(rf.pima.2)

Call:
 randomForest(formula = type ~ ., data = pima.train, ntree = 80)
               Type of random forest: classification
                     Number of trees: 80
No. of variables tried at each split: 2

        OOB estimate of  error rate: 19.48%
Confusion matrix:
     No Yes class.error
No  230  32   0.1221374
Yes  43  80   0.3495935

At 80 trees in the forest, there is minimal improvement in the OOB error. Can random forest live up to the hype on the test data? Let's find out in the following way:

> rf.pima.test = predict(rf.pima.2, newdata=pima.test, type="response")

> table(rf.pima.test, pima.test$type)
            
rf.pima.test No Yes
         No  75  21
         Yes 18  33

> (75+33)/147
[1] 0.7346939

Well, we get only 73 percent accuracy on the test data, which is inferior to what we achieved using the SVM.

While random forest disappointed on the diabetes data, it proved to be the best classifier so far for the breast cancer diagnosis. Finally, we will move on to gradient boosting.

Gradient boosting regression

The R package that we will use in this section is gbm, which we have already loaded. As stated in the boosting overview, we will be tuning three boosting parameters: the number of trees, the interaction depth, and the shrinkage. As we did in the prior chapters, it will be helpful to call on the caret package to help with the tuning process. The package defaults are 100 trees, an interaction depth of 1, and a shrinkage of 0.001.

Using the expand.grid() function, we will build our experimental grid to run through the training process of the caret package. Let's do the trees from 100 to 500 by = 200, interaction depth 1 to 4 by = 1, and shrinkage as 0.001, 0.01, and 0.1:

> grid = expand.grid(.n.trees = seq(100, 500, by = 200),
    .interaction.depth = seq(1, 4, by = 1),
    .shrinkage = c(0.001, 0.01, 0.1),
    .n.minobsinnode = 10)

This creates a grid of 36 different models that the caret package will run so as to determine the best tuning parameters. A note of caution is in order. On a dataset of the size that we will be working with, this process takes only a few seconds. However, on large datasets, it can take hours. As such, if time is of the essence or you are constrained by computational resources, you must apply your judgment and experiment with smaller samples of the data in order to identify the tuning parameters.
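Before committing to a long run, it can be worth confirming how many candidate models the grid implies and, on a large dataset, tuning on a random subset of the rows first. A sketch follows, with pros.train standing in for a larger dataset; the 50 percent fraction and the idx and pros.small names are arbitrary illustrations:

> nrow(grid) #3 x 4 x 3 x 1 = 36 candidate parameter combinations
> set.seed(123)
> idx = createDataPartition(pros.train$lpsa, p = 0.5, list = FALSE)
> pros.small = pros.train[idx, ] #use this smaller sample for a quick tuning pass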

Before using the train() function from the caret package, I would like to specify the trainControl argument by creating an object called control. This object will store the resampling method that we want to use to train the tuning parameters. For the prostate data, we will use leave-one-out cross-validation (LOOCV), as follows:

> control = trainControl(method="LOOCV")

To utilize the train() function, just specify the formula as we did with the other models: the train dataset, method, train control, and experimental grid. Remember to set the random seed!

> set.seed(123)

> gbm.pros.train = train(lpsa~., data=pros.train, method="gbm", trControl=control, tuneGrid=grid)

Calling the object gives us the optimal parameters and the results of each of the parameter settings, as follows:

> gbm.pros.train
Stochastic Gradient Boosting

67 samples
 8 predictor

No pre-processing
Resampling:

Summary of sample sizes: 66, 66, 66, 66, 66, 66,..

Resampling results across tuning parameters:

  n.trees  interaction.depth  shrinkage  RMSE   Rsquared
  100      1                  0.001      1.184  0.132   
  100      1                  0.010      0.989  0.441   
  100      1                  0.100      0.914  0.431   
  …..................................................
  500      4                  0.010      0.857  0.490   
  500      4                  0.100      1.022  0.357   

RMSE was used to select the optimal model using  the smallest value.
The final values used for the model were n.trees = 300, interaction.depth = 3 and shrinkage = 0.01.

Having identified the tuning parameters, just place them in the gbm() function to build the model on the train set. Additionally, we will specify distribution = "gaussian" for squared-error loss, as this is a continuous outcome:

> gbm.pros = gbm(lpsa~., data=pros.train, n.trees=300, interaction.depth=3, shrinkage=0.01, distribution="gaussian")

With an object created from the train data, we can see how it performs on the test data using the predict() function as we did with the other methods. Then calculate the residuals and MSE:

> gbm.pros.test = predict(gbm.pros, newdata=pros.test, n.trees=300)

> gbm.resid = gbm.pros.test - pros.test$lpsa

> mean(gbm.resid^2)
[1] 0.6208321

An MSE of 0.62 on the test set does not improve our predictive ability at all. Let's take a look at a plot of the predicted versus actual values. As it turns out, gradient boosting can be very susceptible to high variance in the presence of outliers, and the plot shows that the model performed very well except for two observations. The key takeaway for this dataset is that it is probably too small, in terms of the number of observations, to provide any benefit from the boosting methods. Boosting may become beneficial for a dataset of this size in a high-dimensional setting:

> plot(gbm.pros.test, pros.test$lpsa, main="Predicted versus Actuals")

The output of the preceding command is as follows:

[Figure: Predicted versus Actuals plot for the test set]

Gradient boosting classification

For the breast cancer problem, we will again use the train() function from the caret package. The only change to the syntax is that we will use 10-fold cross-validation in trainControl:

> control = trainControl(method="CV", number=10)

> set.seed(123)

> gbm.biop.train = train(class~., data=biop.train, method="gbm", trControl=control, tuneGrid=grid)

> gbm.biop.train
Stochastic Gradient Boosting

474 samples
  9 predictor
  2 classes: 'benign', 'malignant'

No pre-processing
Resampling: Cross-Validated (10 fold)
…....................................
Accuracy was used to select the optimal model using  the largest value.  The final values used for the model were n.trees = 100, interaction.depth = 1 and shrinkage = 0.1.

We will again put these tuning parameters into the gbm() function. However, we will have to modify our dataset slightly, because the function will not accept a factor for a 0/1 classification problem. This is a quick fix, as we will use the ifelse() function to code benign as 0 and malignant as 1:

> biop.train$class = ifelse(biop.train$class=="benign",0,1)

This gives us the 0/1 class labels that we need in our response variable for gbm(). Remember that for regression we used the gaussian distribution. For the 0/1 label problem that we have here, let's use bernoulli, which corresponds to a logistic loss function. Under the Bernoulli distribution, the random variable takes a value of one for success with probability p and a value of zero for failure with probability q = 1 - p. Other distributions are available, but I'll refer you to the package's documentation for the alternatives.
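Just to make the loss concrete, the following one-line sketch writes out the Bernoulli log-likelihood for a 0/1 label y and a predicted probability p; this is an illustration, not code from the gbm package, which works with the deviance (-2 times this quantity):

> bern.loglik = function(y, p) sum(y * log(p) + (1 - y) * log(1 - p))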

> gbm.biop = gbm(class~., distribution="bernoulli", data=biop.train, n.trees=100, interaction.depth=1, shrinkage=0.1)

For the prediction of the test set values, you will again need to change how you analyze the results versus regression. The predict() function in the package will provide the probabilities of class membership, just as logistic regression did in Chapter 3, Logistic Regression and Discriminant Analysis. In other words, anything with less than a 50 percent predicted probability is benign; otherwise, it is malignant. We will again use the ifelse() function to handle this properly:

> gbm.biop.test = predict(gbm.biop, newdata=biop.test, type="response", n.trees=100)

> gbm.class = ifelse(gbm.biop.test <0.5,"benign", "malignant")

> table(gbm.class, biop.test$class)
           
gbm.class   benign malignant
  benign       140         2
  malignant      2        65

> (140+65)/209
[1] 0.9808612

This model performed quite well, misclassifying only four of the 209 observations, just one more than the random forest model.

We will now move on to the final challenge of the chapter: the diabetes data, which has proved such a challenge to the improvement of predictive power. The process for this problem is no different from what we just did with the breast cancer diagnosis data, as follows:

> set.seed(123)

> gbm.pima.train = train(type~., data=pima.train, method="gbm", trControl=control, tuneGrid=grid)

> gbm.pima.train
Stochastic Gradient Boosting

385 samples
  7 predictor
  2 classes: 'No', 'Yes'

No pre-processing
Resampling: Cross-Validated (10 fold)
…....................................
Accuracy was used to select the optimal model using the largest value.  The final values used for the model were n.trees = 500, interaction.depth = 3 and shrinkage = 0.01.

> pima.train$type = ifelse(pima.train$type=="No",0,1)

> gbm.pima = gbm(type~., distribution="bernoulli", data=pima.train, n.trees=500,interaction.depth=3,shrinkage=0.01)

> gbm.pima.test = predict(gbm.pima, newdata=pima.test, type="response", n.trees=500)

> gbm.type = ifelse(gbm.pima.test <0.5,"No", "Yes")

> table(gbm.type, pima.test$type)

gbm.type No Yes
     No  77  22
     Yes 16  32

> (77+32)/147
[1] 0.7414966

The accuracy of 74 percent does not beat the benchmark that we attained with the SVM, meaning that none of the tree-based methods improved the predictive ability on the Pima Indian data. Before closing this section, I would like to point out that you can also produce the variable importance with boosted trees, similar to what we did with random forests. The summary() function in the gbm package produces a table of the relative influence values and a barplot as well:

> summary(gbm.pima)
        var   rel.inf
glu     glu 44.152295
age     age 16.019096
bmi     bmi 13.849614
ped     ped 10.545554
npreg npreg  6.144122
skin   skin  4.908916
bp       bp  4.380403

The output of the preceding command is as follows:

[Figure: relative influence barplot from summary(gbm.pima)]

As the influence is relative, we can conclude that the glu variable (glucose level) accounts for 44 percent of the total influence on the predicted outcome. These influence/importance plots provide an effective tool for feature selection and for presenting the results to business partners.
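If you only need the table of relative influence, for example to drop into a report, summary() can skip the barplot. A quick sketch:

> summary(gbm.pima, plotit = FALSE)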

Model selection

Recall that our primary objective in this chapter was to use the tree-based methods to improve the predictive ability of the work done in the prior chapters. What did we learn? First, on the prostate data with a quantitative response, we were not able to improve on the linear models that we produced in Chapter 4, Advanced Feature Selection in Linear Models. Second, the random forest and boosted trees both outperformed logistic regression on the Wisconsin Breast Cancer data of Chapter 3, Logistic Regression and Discriminant Analysis. Finally, and I must say disappointingly, we were not able to improve on the SVM model on the Pima Indian diabetes data.

As a result, we can feel comfortable that we have good models for the prostate and breast cancer problems. We will try one more time to improve the model for diabetes in Chapter 7, Neural Networks, where we introduce deep learning.
