Overview of random forest construction

We will describe how each tree is constructed in a random fashion. Given N training samples, for the construction of each decision tree we provide its data by selecting N samples randomly, with replacement, from the initial data for the random forest. This process of randomly selecting the data with replacement for each tree is called bootstrap aggregating, or tree bagging. The purpose of bootstrap aggregating is to reduce the variance of the classification results (it reduces variance rather than bias).
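As a minimal sketch of tree bagging (assuming the training data is a plain Python list of samples; the function name bootstrap_sample is ours, not from this chapter):

import random

def bootstrap_sample(data):
    # Draw len(data) items uniformly at random with replacement, so some
    # samples appear several times and others not at all.
    return [random.choice(data) for _ in range(len(data))]

# Each tree in the forest is grown on its own bootstrap sample:
# tree_data = bootstrap_sample(training_data)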

Say each sample has M variables that are used to classify it with the decision tree. When we must make a branching decision at a node, the ID3 algorithm chooses the variable with the highest information gain. In a random decision tree, at each node we instead consider only at most m variables (where m is at most M), sampled in a random fashion without replacement from those of the M variables that have not already been chosen higher up in the tree. Out of these m variables, we then choose the one that results in the highest information gain.
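A sketch of this node-level selection under the same assumptions (samples are (values, label) pairs with categorical values indexed by variable; the helper names entropy, information_gain, and choose_variable are illustrative, not from this chapter):

import math
import random
from collections import Counter

def entropy(samples):
    # Shannon entropy of the class labels of the samples at this node.
    counts = Counter(label for _, label in samples)
    return -sum((c / len(samples)) * math.log2(c / len(samples))
                for c in counts.values())

def information_gain(samples, variable):
    # Entropy reduction from partitioning the samples by the variable's value.
    partitions = {}
    for values, label in samples:
        partitions.setdefault(values[variable], []).append((values, label))
    remainder = sum(len(part) / len(samples) * entropy(part)
                    for part in partitions.values())
    return entropy(samples) - remainder

def choose_variable(samples, available_variables, m):
    # Sample at most m of the not-yet-used variables without replacement,
    # then keep the candidate with the highest information gain.
    candidates = random.sample(available_variables,
                               min(m, len(available_variables)))
    return max(candidates, key=lambda v: information_gain(samples, v))

In practice, m is often chosen close to the square root of M for classification, although m remains a free parameter here.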

The rest of the construction of a random decision tree is carried out just as it was for a decision tree in the previous chapter.
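Putting the two random steps together, a hedged sketch of growing one random decision tree, reusing bootstrap_sample and choose_variable from the sketches above (the stopping rules mirror the ordinary decision tree: a pure node, or no variables left to split on):

from collections import Counter

def grow_random_tree(samples, available_variables, m):
    labels = [label for _, label in samples]
    # Leaf: the node is pure, or there is nothing left to split on.
    if len(set(labels)) == 1 or not available_variables:
        return Counter(labels).most_common(1)[0][0]
    best = choose_variable(samples, available_variables, m)
    remaining = [v for v in available_variables if v != best]
    partitions = {}
    for values, label in samples:
        partitions.setdefault(values[best], []).append((values, label))
    # Inner node: the chosen variable plus one subtree per observed value.
    return (best, {value: grow_random_tree(part, remaining, m)
                   for value, part in partitions.items()})

# forest = [grow_random_tree(bootstrap_sample(training_data),
#                            list(range(M)), m)
#           for _ in range(number_of_trees)]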
