We will use the example from the previous chapter about swim preference. We have the same data table:
Swimming suit | Water temperature | Swim preference
None | Cold | No
None | Warm | No
Small | Cold | No
Small | Warm | No
Good | Cold | No
Good | Warm | Yes
We would like to construct a random forest from this data and use it to classify an item (Good,Cold,?).
Analysis:
We are given M=3 variables according to which a data item can be classified. In a random forest algorithm, we usually do not use all three variables to form the tree branches at each node; we use only m variables out of M, where m is chosen so that m is less than or equal to M. The greater m is, the stronger the classifier is in each constructed tree. However, as mentioned earlier, more data leads to more bias. Because we use multiple trees (each with a smaller m), even if every constructed tree is a weak classifier, their combined classification accuracy is strong. Since we want to reduce bias in a random forest, we may want to choose the parameter m to be slightly less than M.
Thus we choose the maximum number of variables considered at a node to be m=min(M,math.ceil(2*math.sqrt(M)))=min(3,math.ceil(2*math.sqrt(3)))=min(3,4)=3.
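This calculation can be verified directly in Python, using the same `math` functions the formula refers to:

```python
import math

M = 3  # total number of variables in the data table

# Maximum number of variables considered at each node of a random tree.
m = min(M, math.ceil(2 * math.sqrt(M)))

print(m)  # -> 3, since ceil(2*sqrt(3)) = ceil(3.46...) = 4 and min(3, 4) = 3
```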
We are given the following features:
[['None', 'Cold', 'No'], ['None', 'Warm', 'No'], ['Small', 'Cold', 'No'], ['Small', 'Warm', 'No'], ['Good', 'Cold', 'No'], ['Good', 'Warm', 'Yes']]
When constructing a random decision tree as part of the random forest, we will choose only a random subset of these features, sampling with replacement.