We take the problem from the previous chapter. We have the following data about the shopping preferences of our friend, Jane:
Temperature | Rain   | Shopping
------------|--------|---------
Cold        | None   | Yes
Warm        | None   | No
Cold        | Strong | Yes
Cold        | None   | No
Warm        | Strong | No
Warm        | None   | Yes
Cold        | None   | ?
In the previous chapter, a decision tree was not able to classify the data item (Cold, None). So, this time, we would like to find out, using the random forest algorithm, whether Jane would go shopping if the outside temperature was cold and there was no rain.
Analysis:
To perform the analysis with the random forest algorithm, we use the random_forest.py program implemented earlier.
Input:
We put the data from the table into a CSV file:
# source_code/4/shopping.csv
Temperature,Rain,Shopping
Cold,None,Yes
Warm,None,No
Cold,Strong,Yes
Cold,None,No
Warm,Strong,No
Warm,None,Yes
Cold,None,?
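Each tree in the forest is trained on a bootstrap sample, that is, rows drawn with replacement from the six labeled rows above (the unlabeled query row is excluded). A minimal sketch of that sampling step, using only the Python standard library (the fixed seed is an arbitrary choice for reproducibility, not part of the algorithm):

```python
import random

# The six labeled rows from shopping.csv.
data = [
    ("Cold", "None", "Yes"),
    ("Warm", "None", "No"),
    ("Cold", "Strong", "Yes"),
    ("Cold", "None", "No"),
    ("Warm", "Strong", "No"),
    ("Warm", "None", "Yes"),
]

random.seed(0)  # arbitrary seed, for reproducibility only

# One bootstrap sample per tree: len(data) rows drawn with replacement,
# so some rows appear more than once and others are left out ("out of bag").
bootstrap = random.choices(data, k=len(data))
print(bootstrap)
```

Because the sample is drawn with replacement, each of the 20 trees sees a slightly different view of the data, which is what makes their votes diverge.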
Output:
To get more accurate results, we want to use a slightly larger number of trees than we used in the previous examples and explanations. We construct a random forest with 20 trees, with low-verbosity output (level 0). Thus, we execute in a terminal:
$ python random_forest.py shopping.csv 20 0
***Classification***
Feature: ['Cold', 'None', '?']
Tree 0 votes for the class: Yes
Tree 1 votes for the class: No
Tree 2 votes for the class: No
Tree 3 votes for the class: No
Tree 4 votes for the class: No
Tree 5 votes for the class: Yes
Tree 6 votes for the class: Yes
Tree 7 votes for the class: Yes
Tree 8 votes for the class: No
Tree 9 votes for the class: Yes
Tree 10 votes for the class: Yes
Tree 11 votes for the class: Yes
Tree 12 votes for the class: Yes
Tree 13 votes for the class: Yes
Tree 14 votes for the class: Yes
Tree 15 votes for the class: Yes
Tree 16 votes for the class: Yes
Tree 17 votes for the class: No
Tree 18 votes for the class: No
Tree 19 votes for the class: No
The class with the maximum number of votes is 'Yes'. Thus the constructed random forest classifies the feature ['Cold', 'None', '?'] into the class 'Yes'.
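The final classification is simply a majority vote over the 20 per-tree votes printed above. That tally can be sketched with collections.Counter:

```python
from collections import Counter

# Per-tree votes copied from the program output above (trees 0 through 19).
votes = ["Yes", "No", "No", "No", "No", "Yes", "Yes", "Yes", "No", "Yes",
         "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "No", "No"]

counts = Counter(votes)                # tally the votes per class
winner, n = counts.most_common(1)[0]   # class with the most votes
print(counts)   # Counter({'Yes': 12, 'No': 8})
print(winner)   # Yes
```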
However, we should note that only 12 of the 20 trees voted for the answer Yes. So, just as an ordinary decision tree could not decide this case, the random forest, although it produces a definite answer, is not entirely certain about it. But unlike a decision tree, which produced no answer because of the inconsistency in the data, here we do get an answer.
Furthermore, by measuring the voting power behind each class, we can quantify our confidence that the answer is correct. In this case, the feature ['Cold', 'None', '?'] belongs to the class Yes with a confidence of 12/20, or 60%. To estimate the certainty of the classification more precisely, an even larger ensemble of random decision trees would be required.
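To get a rough sense of why a larger ensemble gives a more precise confidence estimate, we can treat each tree's vote as an independent yes/no trial (a simplifying assumption — in a real forest the trees' votes are correlated, since they are trained on overlapping data) and look at the standard error of the observed 60% vote proportion:

```python
import math

def vote_standard_error(p, n):
    """Standard error of a vote proportion p estimated from n trees,
    under the simplifying assumption of independent votes."""
    return math.sqrt(p * (1 - p) / n)

p = 12 / 20  # observed fraction of 'Yes' votes
print(round(vote_standard_error(p, 20), 3))    # ~0.11 with 20 trees
print(round(vote_standard_error(p, 2000), 3))  # ~0.011 with 2,000 trees
```

The uncertainty shrinks with the square root of the number of trees, so going from 20 trees to 2,000 narrows the estimate of the vote proportion by roughly a factor of 10.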