- What is the information entropy of the following multisets?
a) {1,2}, b) {1,2,3}, c) {1,2,3,4}, d) {1,1,2,2}, e) {1,1,2,3}
- What is the information entropy of the probability space induced by a biased coin that shows heads with probability 10% and tails with probability 90%?
- Let us revisit the chess-playing example from Chapter 2, Naive Bayes:
- a) What is the information gain for each of the non-classifying attributes in the table?
- b) What is the decision tree constructed from the given table?
- c) How would you classify a data sample (warm,strong,spring,?) according to the constructed decision tree?
| Temperature | Wind | Season | Play |
|-------------|--------|--------|------|
| Cold | Strong | Winter | No |
| Warm | Strong | Autumn | No |
| Warm | None | Summer | Yes |
| Hot | None | Spring | No |
| Hot | Breeze | Autumn | Yes |
| Warm | Breeze | Spring | Yes |
| Cold | Breeze | Winter | No |
| Cold | None | Spring | Yes |
| Hot | Strong | Summer | Yes |
| Warm | None | Autumn | Yes |
| Warm | Strong | Spring | ? |
- Mary and her temperature preferences. Let us take the example from Chapter 1, Classification Using K Nearest Neighbors, about Mary's temperature preferences.
| Temperature in degrees Celsius | Wind speed in km/h | Mary's perception |
|--------------------------------|--------------------|-------------------|
| 10 | 0 | Cold |
| 25 | 0 | Warm |
| 15 | 5 | Cold |
| 20 | 3 | Warm |
| 18 | 7 | Cold |
| 20 | 10 | Cold |
| 22 | 5 | Warm |
| 24 | 6 | Warm |
We would like to use decision trees to decide whether our friend Mary would feel warm or cold in a room with a temperature of 16 degrees Celsius and a fan producing a wind speed of 3 km/h.
Can you explain how a decision tree algorithm could be used here and how well it would perform on this example?
Analysis:
- Here are the entropies of the multisets:
a) E({1,2})=-(1/2)*log2(1/2)-(1/2)*log2(1/2)=1
b) E({1,2,3})=-(1/3)*log2(1/3)-(1/3)*log2(1/3)-(1/3)*log2(1/3)=1.5849625
c) E({1,2,3,4})=-(1/4)*log2(1/4)-(1/4)*log2(1/4)-(1/4)*log2(1/4)-(1/4)*log2(1/4)=2
d) E({1,1,2,2})=-(2/4)*log2(2/4)-(2/4)*log2(2/4)=1
e) E({1,1,2,3})=-(2/4)*log2(2/4)-(1/4)*log2(1/4)-(1/4)*log2(1/4)=1.5
Note that the information entropy of these multisets with more than two classes is greater than 1, so we need more than one bit of information to represent an outcome. But is this true for every multiset that has more than two classes of elements?
- E({10% of heads, 90% of tails})=-0.1*log2(0.1)-(0.9)*log2(0.9)=0.46899559358
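As a quick check of these values, here is a minimal Python sketch (the helper names entropy and entropy_probs are ours) that computes information entropy from element frequencies; its last line also probes the question above with a heavily skewed three-class multiset:

```python
from collections import Counter
from math import log2

def entropy(multiset):
    """Information entropy of a multiset, in bits."""
    counts = Counter(multiset)
    total = len(multiset)
    return -sum((c / total) * log2(c / total) for c in counts.values())

print(entropy([1, 2]))        # 1.0
print(entropy([1, 2, 3]))     # 1.5849625...
print(entropy([1, 2, 3, 4]))  # 2.0
print(entropy([1, 1, 2, 2]))  # 1.0
print(entropy([1, 1, 2, 3]))  # 1.5

def entropy_probs(probs):
    """Entropy of a probability distribution given directly."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy_probs([0.1, 0.9]))   # 0.4689955935... (the biased coin)
print(entropy([1] * 14 + [2, 3]))  # 0.6686...: three classes, below 1 bit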
- a) The information gains for the three attributes are as follows:
IG(S,temperature)=0.0954618442383
IG(S,wind)=0.0954618442383
IG(S,season)=0.419973094022
b) Therefore, we would choose the attribute season to branch on from the root node, as it has the highest information gain. Alternatively, we can put all the input data into the program to construct the decision tree:
├── [Season=Autumn]
│   ├── [Wind=Breeze]
│   │   └── [Play=Yes]
│   ├── [Wind=Strong]
│   │   └── [Play=No]
│   └── [Wind=None]
│       └── [Play=Yes]
├── [Season=Summer]
│   ├── [Temperature=Hot]
│   │   └── [Play=Yes]
│   └── [Temperature=Warm]
│       └── [Play=Yes]
├── [Season=Winter]
│   └── [Play=No]
└── [Season=Spring]
    ├── [Temperature=Hot]
    │   └── [Play=No]
    ├── [Temperature=Warm]
    │   └── [Play=Yes]
    └── [Temperature=Cold]
        └── [Play=Yes]
c) According to the constructed decision tree, we would classify the data sample (warm,strong,spring,?) into the class Play=Yes by taking the Season=Spring branch from the root node and then the Temperature=Warm branch to the leaf node.
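As a sanity check on the information gains in part a), here is a short Python sketch (the function names and the row encoding are ours) that recomputes them from the ten classified rows of the table:

```python
from collections import Counter
from math import log2

# (temperature, wind, season, play) - the ten classified rows of the table
rows = [
    ("Cold", "Strong", "Winter", "No"),
    ("Warm", "Strong", "Autumn", "No"),
    ("Warm", "None",   "Summer", "Yes"),
    ("Hot",  "None",   "Spring", "No"),
    ("Hot",  "Breeze", "Autumn", "Yes"),
    ("Warm", "Breeze", "Spring", "Yes"),
    ("Cold", "Breeze", "Winter", "No"),
    ("Cold", "None",   "Spring", "Yes"),
    ("Hot",  "Strong", "Summer", "Yes"),
    ("Warm", "None",   "Autumn", "Yes"),
]

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, attr_index):
    labels = [r[-1] for r in rows]
    gain = entropy(labels)
    for value in set(r[attr_index] for r in rows):
        subset = [r[-1] for r in rows if r[attr_index] == value]
        gain -= (len(subset) / len(rows)) * entropy(subset)
    return gain

for i, name in enumerate(["temperature", "wind", "season"]):
    print(f"IG(S,{name}) = {information_gain(rows, i):.10f}")
# IG(S,temperature) = 0.0954618442
# IG(S,wind)        = 0.0954618442
# IG(S,season)      = 0.4199730940
```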
- Here the decision tree algorithm may not perform well without any preprocessing of the data. If we treated every distinct temperature value as its own class, then 16 degrees Celsius would not occur in the decision tree, as it is not in the input data, so we would not be able to classify how Mary would feel at 16 degrees Celsius with a 3 km/h wind speed.
We could instead divide the temperature and the wind speed into intervals in order to reduce the number of classes, so that the resulting decision tree could classify the input instance. But this division, deciding into which intervals 16 degrees Celsius and 3 km/h should fall, is the fundamental part of the analysis for this type of problem. Thus decision trees without serious modification could not analyze the problem well.
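To illustrate the interval approach, here is a minimal Python sketch; the cut-off values (17 degrees for temperature, 8 km/h for wind speed) and the helper names are illustrative assumptions of ours, not values given in the chapter:

```python
from collections import Counter, defaultdict

# Discretize the numeric attributes into intervals so that unseen
# values such as 16 degrees Celsius or 3 km/h fall into known classes.
# The cut-offs (17 degrees, 8 km/h) are illustrative assumptions only.
def temperature_class(t):
    return "low" if t < 17 else "high"

def wind_class(w):
    return "weak" if w < 8 else "strong"

# (temperature in Celsius, wind speed in km/h, Mary's perception)
data = [
    (10, 0, "Cold"), (25, 0, "Warm"), (15, 5, "Cold"), (20, 3, "Warm"),
    (18, 7, "Cold"), (20, 10, "Cold"), (22, 5, "Warm"), (24, 6, "Warm"),
]

# Group Mary's perceptions by the discretized (temperature, wind) pair.
groups = defaultdict(Counter)
for t, w, label in data:
    groups[(temperature_class(t), wind_class(w))][label] += 1

# Classify 16 degrees Celsius with a 3 km/h wind by majority vote within
# its interval class - a stand-in for the leaf that a decision tree
# built on the discretized attributes would reach.
query = (temperature_class(16), wind_class(3))
print(query, groups[query].most_common(1)[0][0])  # ('low', 'weak') Cold
```

With these particular cut-offs the sample is classified as Cold, but moving the temperature boundary can flip the answer, which is exactly the point above: the choice of intervals carries most of the analytical weight.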