ID3 algorithm - decision tree construction

The ID3 algorithm constructs a decision tree from the data based on information gain. In the beginning, we start with the whole set S. The data items in S have various attributes according to which we can partition the set. If an attribute A has the values {v1, ..., vn}, then we partition S into the sets S1, ..., Sn, where Si is the subset of S whose elements have the value vi for the attribute A.
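To make the partitioning step concrete, the following is a minimal Python sketch. It assumes each data item is represented as a dictionary mapping attribute names to values; the attribute names and sample rows are hypothetical, loosely modeled on the swim preference example.

    from collections import defaultdict

    def partition(S, attribute):
        """Split the data set S into subsets S_i, one per
        observed value v_i of the given attribute."""
        partitions = defaultdict(list)
        for item in S:
            partitions[item[attribute]].append(item)
        return partitions

    # Hypothetical data items for illustration:
    S = [
        {'water_temperature': 'cold', 'swim': 'no'},
        {'water_temperature': 'warm', 'swim': 'yes'},
        {'water_temperature': 'warm', 'swim': 'yes'},
    ]
    print(partition(S, 'water_temperature'))
    # {'cold': [{...'swim': 'no'}], 'warm': [{...'swim': 'yes'}, {...'swim': 'yes'}]}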

If each element in the set S has attributes A1, ..., Am, then we can partition S according to any of these attributes. The ID3 algorithm partitions S according to the attribute that yields the highest information gain. Suppose that this attribute is A1. Then for the set S we have the partitions S1, ..., Sn, where A1 has the possible values {v1, ..., vn}.
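Information gain is the reduction in entropy achieved by a partition: IG(S, A) = E(S) - sum over i of (|Si| / |S|) * E(Si), where E is the Shannon entropy of the class labels. A sketch of both quantities, reusing the partition helper above and the same dictionary representation of data items:

    import math

    def entropy(S, class_attribute):
        """Shannon entropy E(S) of the class labels in S."""
        n = len(S)
        counts = {}
        for item in S:
            label = item[class_attribute]
            counts[label] = counts.get(label, 0) + 1
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    def information_gain(S, attribute, class_attribute):
        """IG(S, A) = E(S) - sum_i (|S_i| / |S|) * E(S_i)."""
        remainder = sum(
            (len(subset) / len(S)) * entropy(subset, class_attribute)
            for subset in partition(S, attribute).values()
        )
        return entropy(S, class_attribute) - remainder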

Since we have not constructed any tree yet, we first place a root node into the tree. For every partition of S, we add a new branch from the root. Each branch represents one value of the selected attribute and carries the data samples that share that value. At the end of each new branch, we define a new node that receives the data samples from its ancestor branch.

Once we have defined a new node, we choose from the remaining attributes the one with the highest information gain for the data at that node, partition that data further, and define new branches and nodes. This process repeats until we run out of attributes at a node, or earlier, once all the data at a node belongs to the same class. In the case of the swim preference example, there are only two possible classes for the swimming preference: class no and class yes. Such a terminal node is called a leaf node, and it decides the class of a data item.
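Putting the pieces together, here is a sketch of the full recursive construction, reusing the partition and information_gain helpers above. It represents internal nodes as dictionaries mapping the chosen attribute's values to subtrees, and leaves as class labels; the majority-vote fallback when attributes run out is one common convention, not necessarily the exact choice made here.

    def id3(S, attributes, class_attribute):
        """Recursively build a decision tree as nested dicts:
        internal nodes map attribute values to subtrees,
        leaves are class labels."""
        classes = {item[class_attribute] for item in S}
        # Stop early: all data at this node has the same class.
        if len(classes) == 1:
            return classes.pop()
        # Stop: no attributes left; fall back to the majority class.
        if not attributes:
            labels = [item[class_attribute] for item in S]
            return max(set(labels), key=labels.count)
        # Choose the attribute with the highest information gain.
        best = max(attributes,
                   key=lambda a: information_gain(S, a, class_attribute))
        remaining = [a for a in attributes if a != best]
        # One branch (and one child node) per value of the best attribute.
        return {best: {value: id3(subset, remaining, class_attribute)
                       for value, subset in partition(S, best).items()}}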
