Swim preference - information gain calculation

Let us calculate the information gain for the six rows in the swim preference example by taking swimming suit as an attribute. Because we are interested in whether a given row of data is classified as no or yes for the question of whether one should swim, we use the swim preference as the class when calculating the entropy and information gain. We partition the set S by the attribute swimming suit:

Snone={(none,cold,no),(none,warm,no)}

Ssmall={(small,cold,no),(small,warm,no)}

Sgood={(good,cold,no),(good,warm,yes)}

The information entropy of S is E(S)=-(1/6)*log2(1/6)-(5/6)*log2(5/6)~0.65002242164.
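The value of E(S) can be checked with a short Python sketch. The function name entropy and the list labels_S are illustrative names that do not come from the text; the sketch assumes the only input needed is the list of class labels (five no, one yes). It sums p*log2(1/p), which is the same quantity as -p*log2(p) in the formula above.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels.

    Computed as the sum of p * log2(1/p) over the classes present,
    which equals -sum(p * log2(p)).
    """
    total = len(labels)
    return sum((n / total) * math.log2(total / n) for n in Counter(labels).values())


# Class labels of the six rows in S: five "no" and one "yes".
labels_S = ["no", "no", "no", "no", "no", "yes"]

print(entropy(labels_S))  # approximately 0.65
```

A list in which every label is the same has entropy 0, which is the case used for the partitions below.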

The information entropy of the partitions is:

E(Snone)=-(2/2)*log2(2/2)=-log2(1)=0 since all instances have the class no.

E(Ssmall)=0 for a similar reason.

E(Sgood)=-(1/2)*log2(1/2)-(1/2)*log2(1/2)=1 since one instance has the class no and the other has the class yes.

Therefore, the information gain is:

IG(S,swimming suit)=E(S)-[(2/6)*E(Snone)+(2/6)*E(Ssmall)+(2/6)*E(Sgood)]

=0.65002242164-(1/3)=0.3166890883

If we chose the attribute water temperature to partition the set S, what would be the information gain IG(S,water temperature)? The water temperature partitions the set S into the following sets:

Scold={(none,cold,no),(small,cold,no),(good,cold,no)}

Swarm={(none,warm,no),(small,warm,no),(good,warm,yes)}

Their entropies are:

E(Scold)=0 as all instances are classified as no.

E(Swarm)=-(2/3)*log2(2/3)-(1/3)*log2(1/3)~0.91829583405

Therefore, the information gain from partitioning the set S by the attribute water temperature is:

IG(S,water temperature)=E(S)-[(1/2)*E(Scold)+(1/2)*E(Swarm)]

= 0.65002242164-0.5*0.91829583405=0.19087450461

This is less than IG(S,swimming suit). Therefore, we gain more information about the set S (the classification of its instances) by partitioning it by the attribute swimming suit than by the attribute water temperature. This finding is the basis of the ID3 algorithm, which constructs a decision tree in the next section.
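The whole comparison can be reproduced with the following Python sketch. It is a minimal illustration, not the book's implementation: the names S, entropy, information_gain, and gains are assumptions made for this example, and each row is stored as a tuple (swimming suit, water temperature, swim preference) with the class in the last position.

```python
import math
from collections import Counter, defaultdict

# The six rows of S as (swimming suit, water temperature, swim preference).
S = [
    ("none", "cold", "no"), ("none", "warm", "no"),
    ("small", "cold", "no"), ("small", "warm", "no"),
    ("good", "cold", "no"), ("good", "warm", "yes"),
]


def entropy(rows):
    """Entropy (base 2) of the class label, taken from the last column."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    # p * log2(1/p) is the same quantity as -p * log2(p).
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def information_gain(rows, attribute_index):
    """E(rows) minus the size-weighted entropy of the partitions by one attribute."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[attribute_index]].append(row)
    weighted = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy(rows) - weighted


gains = {
    "swimming suit": information_gain(S, 0),
    "water temperature": information_gain(S, 1),
}
for name, gain in gains.items():
    print(f"IG(S,{name}) = {gain:.10f}")

# The attribute with the largest gain is the preferred one to split on.
print("Best attribute:", max(gains, key=gains.get))
```

Run as written, the sketch should agree with the hand calculation up to rounding and select swimming suit as the attribute with the larger gain.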
