Implementation of random forest algorithm

We implement a random forest algorithm using the modified decision tree algorithm from the previous chapter. We also add an option to set a verbose mode within the program that describes the whole process of how the algorithm works on a specific input: how the random forest is constructed from its random decision trees, and how the constructed forest is then used to classify other features.

The implementation of the random forest reuses the construction of a decision tree from the previous chapter. The reader is encouraged to consult the function decision_tree.construct_general_tree from that chapter:

# source_code/4/random_forest.py
import math
import random
import sys
sys.path.append('../common')
import common # noqa
import decision_tree # noqa
from common import printfv # noqa

# Random forest construction
def sample_with_replacement(population, size):
    sample = []
    for i in range(0, size):
        sample.append(population[random.randint(0, len(population) - 1)])
    return sample
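
To see what sampling with replacement does in isolation, here is a standalone sketch, independent of the listing above; the seed value is arbitrary and only makes the run repeatable:

```python
import random

def bootstrap_sample(population, size):
    # Draw `size` items uniformly at random, with replacement,
    # so the same item may appear more than once.
    return [population[random.randint(0, len(population) - 1)]
            for _ in range(size)]

random.seed(0)
data = ['a', 'b', 'c', 'd', 'e']
sample = bootstrap_sample(data, len(data))
# The sample has the same size as the population, but typically
# contains duplicates and omits some items.
print(sample)
```

Drawing a bootstrap sample of the same size as the population is what makes each tree in the forest see a slightly different view of the training data.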

def construct_random_forest(verbose, heading, complete_data,
                            enquired_column, m, tree_count):
    printfv(2, verbose, "*** Random Forest construction ***\n")
    printfv(2, verbose, "We construct a random forest that will " +
            "consist of %d random decision trees.\n", tree_count)
    random_forest = []
    for i in range(0, tree_count):
        printfv(2, verbose, "\nConstruction of a random " +
                "decision tree number %d:\n", i)
        random_forest.append(construct_random_decision_tree(
            verbose, heading, complete_data, enquired_column, m))
    printfv(2, verbose, "\nTherefore we have completed the " +
            "construction of the random forest consisting of %d " +
            "random decision trees.\n", tree_count)
    return random_forest

def construct_random_decision_tree(verbose, heading, complete_data,
                                   enquired_column, m):
    sample = sample_with_replacement(complete_data, len(complete_data))
    printfv(2, verbose, "We are given %d features as the input data. " +
            "Out of these, we randomly choose %d features with " +
            "replacement that we will use for the construction of " +
            "this particular random decision tree:\n" +
            str(sample) + "\n", len(complete_data),
            len(complete_data))
    # The function construct_general_tree from the module decision_tree
    # is written in the implementation section of the previous chapter
    # on decision trees.
    return decision_tree.construct_general_tree(verbose, heading,
                                                sample,
                                                enquired_column, m)

# M is the given number of decision variables, i.e. the number of
# properties of one feature.
def choose_m(verbose, M):
    m = int(min(M, math.ceil(2 * math.sqrt(M))))
    printfv(2, verbose, "We are given M=" + str(M) +
            " variables according to which a feature can be " +
            "classified.\n")
    printfv(3, verbose, "In the random forest algorithm we usually do " +
            "not use all " + str(M) + " variables to form tree " +
            "branches at each node.\n")
    printfv(3, verbose, "We use only m variables out of M.\n")
    printfv(3, verbose, "So we choose m such that m is less than or " +
            "equal to M.\n")
    printfv(3, verbose, "The greater m is, the stronger the " +
            "classifier an individual tree is. However, it is also " +
            "more susceptible to bias, as more of the data is " +
            "considered. Since in the end we use multiple trees, " +
            "even if each may be a weak classifier, their combined " +
            "classification accuracy is strong. Therefore, to reduce " +
            "the bias in a random forest, we may want to choose the " +
            "parameter m to be slightly less than M.\n")
    printfv(2, verbose, "Thus we choose the maximum number of the " +
            "variables considered at the node to be " +
            "m=min(M,math.ceil(2*math.sqrt(M)))" +
            "=min(M,math.ceil(2*math.sqrt(%d)))=%d.\n", M, m)
    return m
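
The formula min(M, ceil(2*sqrt(M))) can be checked in isolation. This standalone sketch tabulates m for a few values of M:

```python
import math

def choose_m_value(M):
    # Use at most M variables per node; roughly 2*sqrt(M) once
    # M grows large enough for the square root to matter.
    return int(min(M, math.ceil(2 * math.sqrt(M))))

for M in [1, 2, 3, 4, 9, 16]:
    print("M =", M, "-> m =", choose_m_value(M))
```

For small M the minimum is M itself, so all variables are used; for the Swim preference data below, the heading has M = 3 columns, giving m = min(3, ceil(2\*sqrt(3))) = min(3, 4) = 3.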

# Classification
def display_classification(verbose, random_forest, heading,
                           enquired_column, incomplete_data):
    if len(incomplete_data) == 0:
        printfv(0, verbose, "No data to classify.\n")
    else:
        for incomplete_feature in incomplete_data:
            printfv(0, verbose, "\nFeature: " +
                    str(incomplete_feature) + "\n")
            display_classification_for_feature(
                verbose, random_forest, heading,
                enquired_column, incomplete_feature)

def display_classification_for_feature(verbose, random_forest, heading,
                                       enquired_column, feature):
    classification = {}
    for i in range(0, len(random_forest)):
        group = decision_tree.classify_by_tree(
            random_forest[i], heading, enquired_column, feature)
        common.dic_inc(classification, group)
        printfv(0, verbose, "Tree " + str(i) +
                " votes for the class: " + str(group) + "\n")
    printfv(0, verbose, "The class with the maximum number of votes " +
            "is '" + str(common.dic_key_max_count(classification)) +
            "'. Thus the constructed random forest classifies the " +
            "feature " + str(feature) + " into the class '" +
            str(common.dic_key_max_count(classification)) + "'.\n")
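
The vote counting relies on the helpers common.dic_inc and common.dic_key_max_count from the common module. Assuming dic_inc increments a key's count and dic_key_max_count returns the key with the highest count, the same majority vote can be sketched with the standard library's Counter:

```python
from collections import Counter

# Hypothetical votes cast by a forest of five trees for one feature.
votes = ['Yes', 'No', 'Yes', 'Yes', 'No']

classification = Counter(votes)
# most_common(1) returns the (class, count) pair with the most votes.
winner, count = classification.most_common(1)[0]
print(winner, count)
```

Here the class 'Yes' wins with 3 of the 5 votes, so the forest would classify the feature as 'Yes'.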

# Program start
csv_file_name = sys.argv[1]
tree_count = int(sys.argv[2])
verbose = int(sys.argv[3])

(heading, complete_data, incomplete_data,
 enquired_column) = common.csv_file_to_ordered_data(csv_file_name)
m = choose_m(verbose, len(heading))
random_forest = construct_random_forest(
    verbose, heading, complete_data, enquired_column, m, tree_count)
# display_forest, which prints the trees of the constructed forest, is
# defined in the full source file source_code/4/random_forest.py but is
# not reproduced here.
display_forest(verbose, random_forest)
display_classification(verbose, random_forest, heading,
                       enquired_column, incomplete_data)

Input:

As the input file to the implemented algorithm, we provide the data from the Swim preference example.

# source_code/4/swim.csv
swimming_suit,water_temperature,swim
None,Cold,No
None,Warm,No
Small,Cold,No
Small,Warm,No
Good,Cold,No
Good,Warm,Yes
Good,Cold,?
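
The splitting of this file into complete and incomplete data is done by common.csv_file_to_ordered_data, whose definition is not shown here. A plausible minimal sketch, assuming a row counts as incomplete exactly when its enquired (last) column holds '?', is:

```python
import csv
import io

# Inline copy of swim.csv so the sketch is self-contained.
swim_csv = """swimming_suit,water_temperature,swim
None,Cold,No
None,Warm,No
Small,Cold,No
Small,Warm,No
Good,Cold,No
Good,Warm,Yes
Good,Cold,?
"""

rows = list(csv.reader(io.StringIO(swim_csv)))
heading, data = rows[0], rows[1:]
# Rows whose last column is '?' lack the class label; they are the
# features to classify. The remaining rows form the training data.
complete_data = [r for r in data if r[-1] != '?']
incomplete_data = [r for r in data if r[-1] == '?']
print(len(complete_data), len(incomplete_data))
```

For this file the split yields 6 complete rows and 1 incomplete row (the last one); the actual helper in the common module may differ in detail.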

Output:

We type the following command in the command line to get the output:

$ python random_forest.py swim.csv 2 3 > swim.out

Here, 2 means that we would like to construct 2 decision trees, and 3 is the verbosity level of the program, which includes detailed explanations of the construction of the random forest, the classification of the feature, and the graph of the random forest. The last part, > swim.out, means that the output is written to the file swim.out. This file can be found in the chapter directory source_code/4. This output of the program was used earlier to write the analysis of the Swim preference problem.
