Growing Rotational Forest

Random Forests and Bagging give impressive results with very large ensembles; increasing the number of estimators generally improves the accuracy of these methods. In contrast, a Rotational Forest is designed to work well with a much smaller number of trees.

Let us write down the steps involved in building a Rotational Forest. The number of trees required in the forest is typically specified by the user. Let T be the number of trees required to be built.

We start with iterating from 1 through T, that is, we build T trees.

For each tree t, perform the following steps:

  • Split the attributes in the training set into K non-overlapping subsets of equal size.
  • We now have K subsets of attributes. For each of the K subsets, draw a sample of 75 percent of the records and use that sample for the following steps:
    • Run a Principal Component Analysis on the ith subset of attributes and retain all the principal components. For every attribute j in the ith subset, we have a principal component coefficient; let us denote it as aij, the coefficient for the jth attribute in the ith subset.
    • Store the principal components for the subset.
  • Create a rotation matrix of size n x n, where n is the total number of attributes. Arrange the principal components in the matrix so that each block of components lines up with the positions of its attributes in the original training dataset.
  • Project the training dataset onto the rotation matrix using matrix multiplication.
  • Build a decision tree with the projected dataset.
  • Store the tree and the rotation matrix.

With this knowledge, let us jump to our recipe.

Getting ready…

We are going to generate a classification dataset to demonstrate a Rotational Forest. To our knowledge, there is no Python implementation available for Rotational Forests, so we will write our own code. We will leverage scikit-learn's implementation of a decision tree classifier and use the train_test_split method to draw the random data samples.
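
As a quick standalone illustration (not part of the recipe) of how train_test_split can be used to draw a random sample, the following keeps roughly 75 percent of the rows of a toy array and ignores the rest; the array names are hypothetical:

from sklearn.model_selection import train_test_split
import numpy as np

x_demo = np.arange(100).reshape(50,2)
y_demo = np.zeros(50)
# Keep roughly 75 percent of the rows as our working sample; discard the rest
x_sample,_,_,_ = train_test_split(x_demo,y_demo,test_size=0.25,random_state=7)
print(x_sample.shape)    # roughly 75 percent of the 50 rows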

How to do it...

We will start with loading all the necessary libraries. Let us leverage the make_classification method from the sklearn.datasets module to generate the training data. We follow it with a method called get_random_subset to select random subsets of attributes:

from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier
import numpy as np

def get_data():
    """
    Make a sample classification dataset
    Returns : Independent variables x, dependent variable y
    """
    no_features = 30
    redundant_features = int(0.1*no_features)
    informative_features = int(0.6*no_features)
    repeated_features = int(0.1*no_features)
    x,y = make_classification(n_samples=500,n_features=no_features,flip_y=0.03,
            n_informative = informative_features, n_redundant = redundant_features 
            ,n_repeated = repeated_features,random_state=7)
    return x,y

def get_random_subset(iterable,k):
    subsets = []
    iteration = 0
    # Shuffle the feature indices in place so that each call yields a different split
    np.random.shuffle(iterable)
    subset = 0
    limit = len(iterable)//k
    while iteration < limit:
        if k <= len(iterable):
            subset = k
        else:
            subset = len(iterable)
        # Take the last 'subset' indices and remove them so the subsets never overlap
        subsets.append(iterable[-subset:])
        del iterable[-subset:]
        iteration+=1
    return subsets
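
As a quick check of get_random_subset (an illustrative call, not part of the recipe), passing ten feature indices and k=5 returns two non-overlapping subsets of five indices each; note that the function shuffles and empties the list passed to it:

indices = list(range(10))
print(get_random_subset(indices,5))
# e.g. [[3, 0, 8, 5, 1], [9, 2, 7, 4, 6]] -- two disjoint subsets of size 5
print(indices)    # [] -- the input list has been consumed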

We now write a function build_rotationtree_model, where we will build fully grown trees, and proceed to evaluate the forest's performance using the function model_worth:

def build_rotationtree_model(x_train,y_train,d,k):
    models = []
    r_matrices = []
    feature_subsets = []
    for i in range(d):
        # Sample 75 percent of the training data for this tree
        x,_,_,_ = train_test_split(x_train,y_train,test_size=0.25,random_state=7)
        # Feature ids
        feature_index = list(range(x.shape[1]))
        # Get subsets of features
        random_k_subset = get_random_subset(feature_index,k)
        feature_subsets.append(random_k_subset)
        # Rotation matrix
        R_matrix = np.zeros((x.shape[1],x.shape[1]),dtype=float)
        for each_subset in random_k_subset:
            pca = PCA()
            x_subset = x[:,each_subset]
            pca.fit(x_subset)
            for ii in range(0,len(pca.components_)):
                for jj in range(0,len(pca.components_)):
                    R_matrix[each_subset[ii],each_subset[jj]] = pca.components_[ii,jj]
                
        x_transformed = x_train.dot(R_matrix)
        
        model = DecisionTreeClassifier()
        model.fit(x_transformed,y_train)
        models.append(model)
        r_matrices.append(R_matrix)
    return models,r_matrices,feature_subsets
    
def model_worth(models,r_matrices,x,y):
    
    predicted_ys = []
    for i,model in enumerate(models):
        x_mod =  x.dot(r_matrices[i])
        predicted_y = model.predict(x_mod)
        predicted_ys.append(predicted_y)
    
    predicted_matrix = np.asmatrix(predicted_ys)
    final_prediction = []
    for i in range(len(y)):
        pred_from_all_models = np.ravel(predicted_matrix[:,i])
        non_zero_pred = np.nonzero(pred_from_all_models)[0]  
        is_one = len(non_zero_pred) > len(models)/2
        final_prediction.append(is_one)
    
    print(classification_report(y, final_prediction))

Finally, we write a main function for invoking the functions that we have defined earlier:

if __name__ == "__main__":
    x,y = get_data()    
#    plot_data(x,y)

    # Divide the data into Train, dev and test    
    x_train,x_test_all,y_train,y_test_all = train_test_split(x,y,test_size = 0.3,random_state=9)
    x_dev,x_test,y_dev,y_test = train_test_split(x_test_all,y_test_all,test_size=0.3,random_state=9)
        
    # Build a bag of models
    models,r_matrices,features = build_rotationtree_model(x_train,y_train,25,5)
    model_worth(models,r_matrices,x_train,y_train)
    model_worth(models,r_matrices,x_dev,y_dev)
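
If you also want to score the untouched test split, an additional call (not part of the original listing) follows the same pattern:

    model_worth(models,r_matrices,x_test,y_test)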

How it works…

Let us start with our main function. We invoke get_data to get our predictor attributes and the response attribute. Inside get_data, we leverage the make_classification function to generate the training data for our recipe as follows:

def get_data():
    """
    Make a sample classification dataset
    Returns : Independent variables x, dependent variable y
    """
    no_features = 30
    redundant_features = int(0.1*no_features)
    informative_features = int(0.6*no_features)
    repeated_features = int(0.1*no_features)
    x,y = make_classification(n_samples=500,n_features=no_features,flip_y=0.03,
            n_informative = informative_features, n_redundant = redundant_features 
            ,n_repeated = repeated_features,random_state=7)
    return x,y

Let us look at the parameters passed to the make_classification method. The first parameter is the number of instances required; in this case, we ask for 500 instances. The second parameter is the number of attributes per instance; we ask for 30. The third parameter, flip_y, randomly flips the class labels of 3 percent of the instances. This is done to introduce some noise into our data. The next parameter specifies how many of those 30 features should be informative enough to be used in our classification. We have specified that 60 percent of our features, that is, 18 out of 30, should be informative. The next parameter is about redundant features. These are generated as linear combinations of the informative features in order to introduce correlation among the features. Finally, repeated features are duplicate features, drawn randomly from both the informative and the redundant features.
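
To make the derived counts explicit, the call above is equivalent to writing the numbers out directly (a rewriting for illustration only):

x,y = make_classification(n_samples=500,n_features=30,flip_y=0.03,
        n_informative=18,n_redundant=3,n_repeated=3,random_state=7)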

Let us split the data into the training and testing set using train_test_split. We reserve 30 percent of our data for testing:

    # Divide the data into Train, dev and test    
    x_train,x_test_all,y_train,y_test_all = train_test_split(x,y,test_size = 0.3,random_state=9)

Once again, we leverage train_test_split to split our test data into dev and test as follows:

    x_dev,x_test,y_dev,y_test = train_test_split(x_test_all,y_test_all,test_size=0.3,random_state=9)

With the data divided for building, evaluating, and testing the model, we proceed to build our models:

    models,r_matrices,features = build_rotationtree_model(x_train,y_train,25,5)

We invoke the build_rotationtree_model function to build our Rotational Forest. We pass our training data, the predictors x_train and response variable y_train, the total number of trees to be built (25 in this case), and finally, the size of each feature subset (5 in this case).

Let us jump to that function:

    models = []
    r_matrices = []
    feature_subsets = []

We begin with declaring three lists to store each of the decision tree, the rotation matrix for that tree, and finally, the subset of features used in that iteration. We proceed to build each tree in our ensemble.

As a first order of business, we sample 75 percent of the data for this tree:

        x,_,_,_ = train_test_split(x_train,y_train,test_size=0.25,random_state=7)

We leverage the train_test_split function from scikit-learn to draw this sample (a random sample without replacement rather than a true bootstrap). We then decide the feature subsets as follows:

        # Feature ids
        feature_index = list(range(x.shape[1]))
        # Get subsets of features
        random_k_subset = get_random_subset(feature_index,k)
        feature_subsets.append(random_k_subset)

The function get_random_subset takes the feature indices and the required subset size k as parameters, and returns the list of non-overlapping feature subsets.

Inside that function, we shuffle the feature indices. The feature index is a list of numbers starting at 0 and running up to the number of features in our training set:

    np.random.shuffle(iterable)

Let us say we have 10 features and our k value is 5, indicating that we need subsets of 5 non-overlapping feature indices; we then need to do two iterations. We store the number of iterations needed in the limit variable:

    limit = len(iterable)//k
    while iteration < limit:
        if k <= len(iterable):
            subset = k
        else:
            subset = len(iterable)
        iteration+=1

If the required subset size is no larger than the number of remaining attributes, we take the last k entries from our iterable; otherwise, we take whatever is left. Since we have shuffled the iterable, we get a different subset each time:

        subsets.append(iterable[-subset:])

On selecting a subset, we remove it from the iterable as we need non-overlapping sets:

        del iterable[-subset:]

With all the subsets ready, we declare our rotation matrix as follows:

        # Rotation matrix
        R_matrix = np.zeros((x.shape[1],x.shape[1]),dtype=float)

As you can see, our rotation matrix is of size n x n, where n is the number of attributes in our dataset. We use the shape attribute to declare this matrix, filled with zeros:

        for each_subset in random_k_subset:
            pca = PCA()
            x_subset = x[:,each_subset]
            pca.fit(x_subset)

For each subset of features, we select the corresponding columns of the sampled data and perform a principal component analysis on them.

We fill our rotational matrix with the component values as follows:

            for ii in range(0,len(pca.components_)):
                for jj in range(0,len(pca.components_)):
                    R_matrix[each_subset[ii],each_subset[jj]] = pca.components_[ii,jj]

For example, let us say that we have three attributes in each subset, out of a total of six attributes. For illustration, let us say our subsets are:

2,4,6 and 1,3,5

Our rotation matrix R is of size 6 x 6. Assume that we want to fill the rotation matrix for the first subset of features. We will have three principal components, each a vector of size 1 x 3 with one coefficient for each of the attributes 2, 4, and 6.

The output of PCA from scikit-learn is a matrix of size components x features. We go through each component value in the for loop. At the first run, our feature of interest is 2, and cell (0,0) of the component matrix output by PCA gives the contribution of feature 2 to component 1. We have to find the right place in the rotation matrix for this value. We use the indices ii and jj from the component matrix, together with the subset list, to find the right place in the rotation matrix:

                    R_matrix[each_subset[ii],each_subset[jj]] = pca.components_[ii,jj]

each_subset[0] and each_subset[0] put us in cell (2,2) of the rotation matrix. As we go through the loop, the next component value, at cell (0,1) of the component matrix, is placed in cell (2,4) of the rotation matrix, and the last one in cell (2,6). This is done for all the attributes in the first subset. Let us move to the second subset; here the first attribute is 1, so cell (0,0) of the component matrix corresponds to cell (1,1) of the rotation matrix.

Proceeding this way, you will notice that the attribute component values are arranged in the same order as the attributes themselves.
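
The following small, self-contained sketch (toy data and 0-based subsets chosen purely for illustration; not part of the recipe) shows how the component coefficients of two three-feature subsets end up in the matching rows and columns of a 6 x 6 rotation matrix, with every other cell left at zero:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
x_toy = rng.rand(20,6)              # 20 instances, 6 attributes
subsets = [[1,3,5],[0,2,4]]         # two non-overlapping feature subsets (0-based)

R = np.zeros((6,6))
for each_subset in subsets:
    pca = PCA()
    pca.fit(x_toy[:,each_subset])
    for ii in range(len(pca.components_)):
        for jj in range(len(pca.components_)):
            R[each_subset[ii],each_subset[jj]] = pca.components_[ii,jj]

# Rows/columns 1, 3, 5 hold the components of the first subset,
# rows/columns 0, 2, 4 those of the second; all other cells stay zero.
print(np.round(R,2))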

With our rotation matrix ready, let us project our input onto the rotation matrix:

        x_transformed = x_train.dot(R_matrix)

It's time now to fit our decision tree:

model = DecisionTreeClassifier()
model.fit(x_transformed,y_train)

Finally, we store our models and the corresponding rotation matrices:

models.append(model)
r_matrices.append(R_matrix)

With our model built, let us proceed to see how good our model is with both the train and the dev data, using the model_worth function:

model_worth(models,r_matrices,x_train,y_train)
model_worth(models,r_matrices,x_dev,y_dev)

Let us take a look at our model_worth function:

    for i, model in enumerate(models):
        x_mod =  x.dot(r_matrices[i])
        predicted_y = model.predict(x_mod)
        predicted_ys.append(predicted_y)

Inside the function, we perform a prediction using each of the trees we have built. However, before predicting, we project the input using the corresponding rotation matrix. We store all the prediction outputs in a list called predicted_ys. Let us say we have 100 instances to predict and 10 trees in our ensemble; for each instance, we then have 10 predictions. We store those as a matrix for convenience:

predicted_matrix = np.asmatrix(predicted_ys)
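
For instance, stacking the predictions of 10 hypothetical trees over 100 instances gives a 10 x 100 matrix, so column i holds every tree's vote for instance i (a standalone check, not part of the recipe):

import numpy as np

predicted_ys = [np.zeros(100) for _ in range(10)]   # placeholder predictions
predicted_matrix = np.asmatrix(predicted_ys)
print(predicted_matrix.shape)    # (10, 100)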

Now we proceed to give a final classification for each of our input records:

    final_prediction = []
    for i in range(len(y)):
        pred_from_all_models = np.ravel(predicted_matrix[:,i])
        non_zero_pred = np.nonzero(pred_from_all_models)[0]  
        is_one = len(non_zero_pred) > len(models)/2
        final_prediction.append(is_one)

We store our final prediction in a list called final_prediction. We then go through the predictions for each instance. Say we are at the first instance (i=0 in our for loop); pred_from_all_models holds the output from all the trees in our ensemble for that instance. It is an array of 0s and 1s indicating the class that each tree assigned to the instance.

We make another array out of it, non_zero_pred, which holds the indices of the non-zero entries, that is, the trees that predicted class 1.

Finally, if the number of such entries is greater than half the number of models we have, our final prediction is 1 for the instance of interest. What we have accomplished here is the classic majority-voting scheme.
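
A tiny worked example of the vote for a single, hypothetical instance (illustrative numbers, not output from the recipe):

import numpy as np

pred_from_all_models = np.array([1,0,1,1,0,1,1,0,1,1])   # 7 of 10 trees vote for class 1
non_zero_pred = np.nonzero(pred_from_all_models)[0]      # indices of the trees voting 1
print(len(non_zero_pred) > len(pred_from_all_models)/2)  # True -- the majority voted 1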

Let us look at how good our models are now by calling classification_report:

    print(classification_report(y, final_prediction))

The first call to model_worth prints the classification report for our model on the training set.

The second call prints the classification report on the dev dataset.

There's more…

More information about Rotational forests can be gathered from the following paper:

Rotation Forest: A New Classifier Ensemble Method, Juan J. Rodríguez, Ludmila I. Kuncheva, and Carlos J. Alonso, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 10, 2006

The paper also claims that when Rotation Forest was compared with Bagging, AdaBoost, and Random Forest on 33 datasets, Rotation Forest outperformed the other three algorithms.

Similar to Gradient Boosting, the authors of the paper claim that the rotation approach is a general framework and the underlying ensemble member does not necessarily have to be a decision tree. Work is in progress on testing other algorithms, such as Naïve Bayes, neural networks, and others.
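
For instance, inside build_rotationtree_model the decision tree could, in principle, be swapped for any other scikit-learn classifier. A minimal sketch using Gaussian Naïve Bayes (our own illustration, not something the recipe or the paper implements) would replace the two model-building lines with:

from sklearn.naive_bayes import GaussianNB

model = GaussianNB()
model.fit(x_transformed,y_train)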

See also

  • Extracting Principal Components recipe in Chapter 4, Analyzing Data - Deep Dive
  • Reducing data dimension by Random Projection recipe in Chapter 4, Analyzing Data - Deep Dive
  • Building Decision Trees to solve Multi Class Problems recipe in Chapter 6, Machine Learning I
  • Understanding Ensemble, Gradient Boosting recipe in Chapter 8, Model Selection and Evaluation
  • Growing from trees to Forest, Random Forest recipe in Chapter 9, Machine Learning III
  • Growing Extremely Randomized Trees recipe in Chapter 9, Machine Learning III