We take the problem from the previous chapter. We have the following data about the shopping preferences of our friend, Jane:
Temperature | Rain   | Shopping
------------|--------|---------
Cold        | None   | Yes
Warm        | None   | No
Cold        | Strong | Yes
Cold        | None   | No
Warm        | Strong | No
Warm        | None   | Yes
Cold        | None   | ?
In the previous chapter, a decision tree was not able to classify the data item (Cold, None). So, this time, we would like to find out, using the random forest algorithm, whether Jane would go shopping if the outside temperature was cold and there was no rain.
Analysis:
To perform the analysis with the random forest algorithm, we use the random_forest.py program implemented earlier.
Input:
We put the data from the table into a CSV file:
# source_code/4/shopping.csv
Temperature,Rain,Shopping
Cold,None,Yes
Warm,None,No
Cold,Strong,Yes
Cold,None,No
Warm,Strong,No
Warm,None,Yes
Cold,None,?
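Each tree in the forest is trained on a bootstrap sample, that is, rows drawn with replacement from the six labeled rows above (the unlabeled query row is excluded). A minimal sketch of that sampling step, using only the Python standard library (the fixed seed is an arbitrary choice for reproducibility, not part of the algorithm):

```python
import random

# The six labeled rows from shopping.csv.
data = [
    ("Cold", "None", "Yes"),
    ("Warm", "None", "No"),
    ("Cold", "Strong", "Yes"),
    ("Cold", "None", "No"),
    ("Warm", "Strong", "No"),
    ("Warm", "None", "Yes"),
]

random.seed(0)  # arbitrary seed, for reproducibility only

# One bootstrap sample per tree: len(data) rows drawn with replacement,
# so some rows appear more than once and others are left out ("out of bag").
bootstrap = random.choices(data, k=len(data))
print(bootstrap)
```

Because the sample is drawn with replacement, each of the 20 trees sees a slightly different view of the data, which is what makes their votes diverge.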
Output:
To get more accurate results, we want to use a slightly larger number of trees than we used in the previous examples and explanations. We construct a random forest with 20 trees, with low-verbosity output (level 0). Thus, we execute in a terminal:
$ python random_forest.py shopping.csv 20 0
***Classification***
Feature: ['Cold', 'None', '?']
Tree 0 votes for the class: Yes
Tree 1 votes for the class: No
Tree 2 votes for the class: No
Tree 3 votes for the class: No
Tree 4 votes for the class: No
Tree 5 votes for the class: Yes
Tree 6 votes for the class: Yes
Tree 7 votes for the class: Yes
Tree 8 votes for the class: No
Tree 9 votes for the class: Yes
Tree 10 votes for the class: Yes
Tree 11 votes for the class: Yes
Tree 12 votes for the class: Yes
Tree 13 votes for the class: Yes
Tree 14 votes for the class: Yes
Tree 15 votes for the class: Yes
Tree 16 votes for the class: Yes
Tree 17 votes for the class: No
Tree 18 votes for the class: No
Tree 19 votes for the class: No
The class with the maximum number of votes is 'Yes'. Thus the constructed random forest classifies the feature ['Cold', 'None', '?'] into the class 'Yes'.
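The final classification is simply a majority vote over the 20 per-tree votes printed above. That tally can be sketched with collections.Counter:

```python
from collections import Counter

# Per-tree votes copied from the program output above (trees 0 through 19).
votes = ["Yes", "No", "No", "No", "No", "Yes", "Yes", "Yes", "No", "Yes",
         "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "No", "No"]

counts = Counter(votes)                # tally the votes per class
winner, n = counts.most_common(1)[0]   # class with the most votes
print(counts)   # Counter({'Yes': 12, 'No': 8})
print(winner)   # Yes
```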
However, we should note that only 12 of the 20 trees voted for the answer Yes. So, just as an ordinary decision tree could not decide this case, the random forest, although it produces a definite answer, is not entirely certain about it. But unlike a decision tree, which produced no answer because of the inconsistency in the data, here we do get an answer.
Furthermore, by measuring the voting power behind each class, we can quantify our confidence that the answer is correct. In this case, the feature ['Cold', 'None', '?'] belongs to the class Yes with a confidence of 12/20, or 60%. To estimate the certainty of the classification more precisely, an even larger ensemble of random decision trees would be required.
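To get a rough sense of why a larger ensemble gives a more precise confidence estimate, we can treat each tree's vote as an independent yes/no trial (a simplifying assumption — in a real forest the trees' votes are correlated, since they are trained on overlapping data) and look at the standard error of the observed 60% vote proportion:

```python
import math

def vote_standard_error(p, n):
    """Standard error of a vote proportion p estimated from n trees,
    under the simplifying assumption of independent votes."""
    return math.sqrt(p * (1 - p) / n)

p = 12 / 20  # observed fraction of 'Yes' votes
print(round(vote_standard_error(p, 20), 3))    # ~0.11 with 20 trees
print(round(vote_standard_error(p, 2000), 3))  # ~0.011 with 2,000 trees
```

The uncertainty shrinks with the square root of the number of trees, so going from 20 trees to 2,000 narrows the estimate of the vote proportion by roughly a factor of 10.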