ID3 algorithm - decision tree construction

The ID3 algorithm constructs a decision tree from the data based on information gain. In the beginning, we start with the whole set S. The data items in S have various attributes according to which we can partition the set. If an attribute A has the values {v1, ..., vn}, then we partition S into the sets S1, ..., Sn, where Si is the subset of S whose elements have the value vi for the attribute A.
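To make the partitioning step concrete, the following is a minimal Python sketch. It assumes each data item is represented as a dictionary mapping attribute names to values; the attribute names and sample rows are hypothetical, loosely modeled on the swim preference example.

    from collections import defaultdict

    def partition(S, attribute):
        """Split the data set S into subsets S_i, one per
        observed value v_i of the given attribute."""
        partitions = defaultdict(list)
        for item in S:
            partitions[item[attribute]].append(item)
        return partitions

    # Hypothetical data items for illustration:
    S = [
        {'water_temperature': 'cold', 'swim': 'no'},
        {'water_temperature': 'warm', 'swim': 'yes'},
        {'water_temperature': 'warm', 'swim': 'yes'},
    ]
    print(partition(S, 'water_temperature'))
    # {'cold': [{...'swim': 'no'}], 'warm': [{...'swim': 'yes'}, {...'swim': 'yes'}]}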

If each element in the set S has attributes A1, ..., Am, then we can partition S according to any of these attributes. The ID3 algorithm partitions S according to the attribute that yields the highest information gain. Suppose that this attribute is A1. Then for the set S we have the partitions S1, ..., Sn, where A1 has the possible values {v1, ..., vn}.
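Information gain is the reduction in entropy achieved by a partition: IG(S, A) = E(S) - sum over i of (|Si| / |S|) * E(Si), where E is the Shannon entropy of the class labels. A sketch of both quantities, reusing the partition helper above and the same dictionary representation of data items:

    import math

    def entropy(S, class_attribute):
        """Shannon entropy E(S) of the class labels in S."""
        n = len(S)
        counts = {}
        for item in S:
            label = item[class_attribute]
            counts[label] = counts.get(label, 0) + 1
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    def information_gain(S, attribute, class_attribute):
        """IG(S, A) = E(S) - sum_i (|S_i| / |S|) * E(S_i)."""
        remainder = sum(
            (len(subset) / len(S)) * entropy(subset, class_attribute)
            for subset in partition(S, attribute).values()
        )
        return entropy(S, class_attribute) - remainder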

Since we have not constructed any tree yet, we first place a root node into the tree. For every partition of S, we add a new branch from the root. Each branch represents one value of the selected attribute and carries the data samples that share that value. At the end of each new branch, we define a new node that receives the data samples from its ancestor branch.

Once we have defined a new node, we choose from the remaining attributes the one with the highest information gain for the data at that node, partition that data further, and define new branches and nodes. This process repeats until we run out of attributes at a node, or earlier, once all the data at a node belongs to the same class. In the case of the swim preference example, there are only two possible classes for the swimming preference: class no and class yes. Such a terminal node is called a leaf node, and it decides the class of a data item.
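Putting the pieces together, here is a sketch of the full recursive construction, reusing the partition and information_gain helpers above. It represents internal nodes as dictionaries mapping the chosen attribute's values to subtrees, and leaves as class labels; the majority-vote fallback when attributes run out is one common convention, not necessarily the exact choice made here.

    def id3(S, attributes, class_attribute):
        """Recursively build a decision tree as nested dicts:
        internal nodes map attribute values to subtrees,
        leaves are class labels."""
        classes = {item[class_attribute] for item in S}
        # Stop early: all data at this node has the same class.
        if len(classes) == 1:
            return classes.pop()
        # Stop: no attributes left; fall back to the majority class.
        if not attributes:
            labels = [item[class_attribute] for item in S]
            return max(set(labels), key=labels.count)
        # Choose the attribute with the highest information gain.
        best = max(attributes,
                   key=lambda a: information_gain(S, a, class_attribute))
        remaining = [a for a in attributes if a != best]
        # One branch (and one child node) per value of the best attribute.
        return {best: {value: id3(subset, remaining, class_attribute)
                       for value, subset in partition(S, best).items()}}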
