Problems

  1. What is the information entropy of the following multisets?
    a) {1,2}, b) {1,2,3}, c) {1,2,3,4}, d) {1,1,2,2}, e) {1,1,2,3}
  2. What is the information entropy of the probability space induced by a biased coin that shows heads with probability 10% and tails with probability 90%?
  3. Let us take the example of playing chess from Chapter 2, Naive Bayes:
  • a) What is the information gain for each of the non-classifying attributes in the table?
  • b) What is the decision tree constructed from the given table?
  • c) How would you classify a data sample (warm,strong,spring,?) according to the constructed decision tree?

| Temperature | Wind | Season | Play |
| --- | --- | --- | --- |
| Cold | Strong | Winter | No |
| Warm | Strong | Autumn | No |
| Warm | None | Summer | Yes |
| Hot | None | Spring | No |
| Hot | Breeze | Autumn | Yes |
| Warm | Breeze | Spring | Yes |
| Cold | Breeze | Winter | No |
| Cold | None | Spring | Yes |
| Hot | Strong | Summer | Yes |
| Warm | None | Autumn | Yes |
| Warm | Strong | Spring | ? |

  4. Mary and temperature preferences. Let us take the example from Chapter 1, Classification Using K Nearest Neighbors, about the temperature preferences of Mary.

| Temperature in degrees Celsius | Wind speed in km/h | Mary's perception |
| --- | --- | --- |
| 10 | 0 | Cold |
| 25 | 0 | Warm |
| 15 | 5 | Cold |
| 20 | 3 | Warm |
| 18 | 7 | Cold |
| 20 | 10 | Cold |
| 22 | 5 | Warm |
| 24 | 6 | Warm |

We would like to use decision trees to decide whether our friend Mary would feel warm or cold in a room with a temperature of 16 degrees Celsius and a fan producing a wind speed of 3 km/h.

Can you explain how a decision tree algorithm could be used here and how well it would work for this example?

Analysis:

  1. Here are the entropies of the multisets:

a) E({1,2})=-(1/2)*log2(1/2)-(1/2)*log2(1/2)=1

b) E({1,2,3})=-(1/3)*log2(1/3)-(1/3)*log2(1/3)-(1/3)*log2(1/3)=1.5849625

c) E({1,2,3,4})=-(1/4)*log2(1/4)-(1/4)*log2(1/4)-(1/4)*log2(1/4)-(1/4)*log2(1/4)=2

d) E({1,1,2,2})=-(2/4)*log2(2/4)-(2/4)*log2(2/4)=1

e) E({1,1,2,3})=-(2/4)*log2(2/4)-(1/4)*log2(1/4)-(1/4)*log2(1/4)=1.5

Note that the information entropy of the multisets with more than two classes of elements is greater than 1 here, so we need more than one bit of information to represent an outcome. But is this true for every multiset with more than two classes of elements?
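These values can be checked with a few lines of Python. This is a minimal sketch, with a helper of our own naming (it is not code from this book); it treats each distinct element of the multiset as a class and uses the elements' relative frequencies as probabilities:

    from collections import Counter
    from math import log2

    def entropy(multiset):
        # Probability of each class = its relative frequency in the multiset.
        counts = Counter(multiset)
        total = len(multiset)
        return -sum((c / total) * log2(c / total) for c in counts.values())

    for ms in ([1, 2], [1, 2, 3], [1, 2, 3, 4], [1, 1, 2, 2], [1, 1, 2, 3]):
        print(ms, entropy(ms))
    # [1, 2]       -> 1.0
    # [1, 2, 3]    -> 1.5849625...
    # [1, 2, 3, 4] -> 2.0
    # [1, 1, 2, 2] -> 1.0
    # [1, 1, 2, 3] -> 1.5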

  2. E({10% of heads, 90% of tails}) = -0.1*log2(0.1) - 0.9*log2(0.9) = 0.46899559358
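The same calculation works for any explicit probability distribution; a small variant of the sketch above (again, our own naming) reproduces this number:

    from math import log2

    def entropy_dist(probabilities):
        # Skip zero probabilities: p*log2(p) tends to 0 as p -> 0.
        return -sum(p * log2(p) for p in probabilities if p > 0)

    print(entropy_dist([0.1, 0.9]))  # -> 0.4689955935892812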
  3. a) The information gains for the three attributes are as follows:

IG(S,temperature)=0.0954618442383

IG(S,wind)=0.0954618442383

IG(S,season)=0.419973094022
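These three numbers can be reproduced with a short sketch. The DATA list below is just the ten classified rows of the table, and information_gain implements the usual formula IG(S, A) = E(S) - sum over values v of A of (|S_v|/|S|) * E(S_v); all names are our own choices, not code from the chapter:

    from collections import Counter
    from math import log2

    # The ten classified rows: (temperature, wind, season, play).
    DATA = [
        ('Cold', 'Strong', 'Winter', 'No'),  ('Warm', 'Strong', 'Autumn', 'No'),
        ('Warm', 'None', 'Summer', 'Yes'),   ('Hot', 'None', 'Spring', 'No'),
        ('Hot', 'Breeze', 'Autumn', 'Yes'),  ('Warm', 'Breeze', 'Spring', 'Yes'),
        ('Cold', 'Breeze', 'Winter', 'No'),  ('Cold', 'None', 'Spring', 'Yes'),
        ('Hot', 'Strong', 'Summer', 'Yes'),  ('Warm', 'None', 'Autumn', 'Yes'),
    ]

    def entropy(labels):
        counts = Counter(labels)
        return -sum((c / len(labels)) * log2(c / len(labels))
                    for c in counts.values())

    def information_gain(rows, attr):
        # IG(S, A) = E(S) - sum_v |S_v|/|S| * E(S_v)
        base = entropy([r[-1] for r in rows])
        for value in {r[attr] for r in rows}:
            subset = [r[-1] for r in rows if r[attr] == value]
            base -= len(subset) / len(rows) * entropy(subset)
        return base

    for name, i in [('temperature', 0), ('wind', 1), ('season', 2)]:
        print(name, information_gain(DATA, i))
    # temperature -> 0.0954618442383...
    # wind        -> 0.0954618442383...
    # season      -> 0.4199730940219...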

b) Therefore, we would choose the attribute season to branch on from the root node, as it has the highest information gain. Alternatively, we can feed all the input data into a program that constructs the decision tree:

    Root
    ├── [Season=Autumn]
    │    ├── [Wind=Breeze]
    │    │    └── [Play=Yes]
    │    ├── [Wind=Strong]
    │    │    └── [Play=No]
    │    └── [Wind=None]
    │         └── [Play=Yes]
    ├── [Season=Summer]
    │    ├── [Temperature=Hot]
    │    │    └── [Play=Yes]
    │    └── [Temperature=Warm]
    │         └── [Play=Yes]
    ├── [Season=Winter]
    │    └── [Play=No]
    └── [Season=Spring]
         ├── [Temperature=Hot]
         │    └── [Play=No]
         ├── [Temperature=Warm]
         │    └── [Play=Yes]
         └── [Temperature=Cold]
              └── [Play=Yes]

c) According to the constructed decision tree, we would classify the data sample (warm,strong,spring,?) as Play=Yes by following the [Season=Spring] branch from the root node and then the [Temperature=Warm] branch to the leaf node [Play=Yes].
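For completeness, here is a sketch of how such a tree can be built and queried. It continues the previous sketch (reusing DATA, entropy, and information_gain from there; the function names are again our own). Note that it stops as soon as a node is pure, so it can produce a slightly more compact tree than the one printed above (for example, Season=Summer becomes a Yes leaf directly), but it classifies (warm, strong, spring) the same way:

    from collections import Counter

    ATTR_NAMES = ['Temperature', 'Wind', 'Season']

    def id3(rows, attrs):
        # Branch on the remaining attribute with the highest information gain.
        labels = [r[-1] for r in rows]
        if len(set(labels)) == 1:      # pure node -> leaf
            return labels[0]
        if not attrs:                  # no attributes left -> majority leaf
            return Counter(labels).most_common(1)[0][0]
        best = max(attrs, key=lambda a: information_gain(rows, a))
        rest = [a for a in attrs if a != best]
        return {(ATTR_NAMES[best], v): id3([r for r in rows if r[best] == v], rest)
                for v in {r[best] for r in rows}}

    def classify(tree, sample):
        # Walk down the tree; sample is (temperature, wind, season).
        while isinstance(tree, dict):
            (name, _), _ = next(iter(tree.items()))
            tree = tree[(name, sample[ATTR_NAMES.index(name)])]
        return tree

    tree = id3(DATA, [0, 1, 2])
    print(classify(tree, ('Warm', 'Strong', 'Spring')))  # -> Yes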

  4. Here the decision tree algorithm may not perform well without any preprocessing of the data. If we treated every distinct temperature value as its own class, then 16 degrees Celsius would not occur in the decision tree, as it is not among the input values, so we would not be able to classify how Mary would feel at 16 degrees Celsius with a 3 km/h wind speed.

We could alternatively divide the temperature and wind speed into intervals in order to reduce the number of classes, so that the resulting decision tree could classify the input instance. But this division, deciding into which intervals 16 degrees Celsius and 3 km/h should fall, is the fundamental part of the analysis procedure for this type of problem. Thus, decision trees without serious modification cannot analyze this problem well.
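As a purely illustrative sketch, the numeric attributes could be replaced by interval labels before building the tree; the bin edges below are our own arbitrary assumption, not something the data dictates:

    # Hypothetical bins; choosing these edges well is exactly the hard part.
    def bin_temperature(t):
        return 'cold' if t < 17 else 'warm' if t < 23 else 'hot'

    def bin_wind(w):
        return 'calm' if w < 4 else 'breezy' if w < 8 else 'windy'

    data = [(10, 0, 'Cold'), (25, 0, 'Warm'), (15, 5, 'Cold'), (20, 3, 'Warm'),
            (18, 7, 'Cold'), (20, 10, 'Cold'), (22, 5, 'Warm'), (24, 6, 'Warm')]

    binned = [(bin_temperature(t), bin_wind(w), label) for t, w, label in data]
    query = (bin_temperature(16), bin_wind(3))  # -> ('cold', 'calm')

With these particular bins the query now falls into a class combination that occurs in the data, so a tree could classify it. On the other hand, the rows (18, 7, Cold) and (22, 5, Warm) both map to (warm, breezy), so this choice of intervals makes the data contradictory, which illustrates the point above: the quality of the result stands or falls with the choice of the intervals.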
