Suppose someone takes tennis lessons in the morning. The night before, the instructor checks the weather report and decides whether the next morning will be good for tennis. This recipe uses that scenario as an example to build a decision tree.
Let's decide on the weather features that affect whether to play tennis in the morning:
- Rain
- Wind speed
- Temperature
Let's build a table using different combinations of these features:
| Rain | Windy | Temperature | Play tennis? |
|------|-------|-------------|--------------|
| Yes  | Yes   | Hot         | No           |
| Yes  | Yes   | Normal      | No           |
| Yes  | Yes   | Cool        | No           |
| No   | Yes   | Hot         | No           |
| No   | Yes   | Cool        | No           |
| No   | No    | Hot         | Yes          |
| No   | No    | Normal      | Yes          |
| No   | No    | Cool        | No           |
Now, how do we build a decision tree? We can start with any of the three features: rain, wind speed, or temperature. The rule is to start with the feature whose split yields the maximum information gain.
As you can see in the table, on a rainy day the other features do not matter: there is no play. The same holds for windy days.
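To make the information-gain rule concrete, here is a minimal sketch (plain Python, not part of the recipe itself) that computes the entropy of the "Play tennis?" labels and the gain from splitting on each feature in the table above:

```python
from collections import Counter
from math import log2

# Rows from the table: (rain, windy, temperature, play)
rows = [
    ("Yes", "Yes", "Hot",    "No"),
    ("Yes", "Yes", "Normal", "No"),
    ("Yes", "Yes", "Cool",   "No"),
    ("No",  "Yes", "Hot",    "No"),
    ("No",  "Yes", "Cool",   "No"),
    ("No",  "No",  "Hot",    "Yes"),
    ("No",  "No",  "Normal", "Yes"),
    ("No",  "No",  "Cool",   "No"),
]

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in counts.values())

def info_gain(rows, feature_idx):
    """Entropy of the labels minus the weighted entropy
    of the subsets produced by splitting on one feature."""
    labels = [r[-1] for r in rows]
    base = entropy(labels)
    remainder = 0.0
    for value in set(r[feature_idx] for r in rows):
        subset = [r[-1] for r in rows if r[feature_idx] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

for name, idx in [("Rain", 0), ("Windy", 1), ("Temperature", 2)]:
    print(f"{name}: gain = {info_gain(rows, idx):.3f}")
```

On this small table the Windy split gives the largest gain (about 0.467, versus about 0.204 for Rain and 0.217 for Temperature), because all five windy rows are pure "No", so a real decision tree learner would split on it first.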
Decision trees, like most other machine-learning algorithms, accept feature values only as doubles, so let's map the categorical values: for the Rain and Windy features, Yes maps to 1.0 and No to 0.0; for Temperature, Hot maps to 2.0, Normal to 1.0, and Cool to 0.0. For the label, the positive class (play) is 1.0 and the negative class is 0.0. Let's save the data in CSV format, with the first value as the label:
$ vi tennis.csv
0.0,1.0,1.0,2.0
0.0,1.0,1.0,1.0
0.0,1.0,1.0,0.0
0.0,0.0,1.0,2.0
0.0,0.0,1.0,0.0
1.0,0.0,0.0,2.0
1.0,0.0,0.0,1.0
0.0,0.0,0.0,0.0
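As a quick sanity check on the encoding, here is a minimal sketch (plain Python; a library loader such as MLlib's would do the equivalent) that parses each line into a label and a feature vector:

```python
csv_lines = """\
0.0,1.0,1.0,2.0
0.0,1.0,1.0,1.0
0.0,1.0,1.0,0.0
0.0,0.0,1.0,2.0
0.0,0.0,1.0,0.0
1.0,0.0,0.0,2.0
1.0,0.0,0.0,1.0
0.0,0.0,0.0,0.0""".splitlines()

def parse_point(line):
    """First value is the label (play tennis: 1.0 yes, 0.0 no);
    the rest are features (rain, windy, temperature)."""
    values = [float(v) for v in line.split(",")]
    return values[0], values[1:]

points = [parse_point(line) for line in csv_lines]

# The encoding matches the table: every windy day (feature index 1
# equal to 1.0) is a "no play" day, and exactly two days are playable.
assert all(label == 0.0 for label, f in points if f[1] == 1.0)
assert sum(label for label, _ in points) == 2.0

for label, features in points:
    print(label, features)
```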