In the previous section, we implemented a basic baseline to get oriented, so let's now focus on the heavy machinery. We will follow the approach taken by the winning KDD Cup 2009 solution, developed by the IBM Research team (Niculescu-Mizil and others, 2009).
Their strategy for addressing the challenge was to use the Ensemble Selection algorithm (Caruana and Niculescu-Mizil, 2004). This is an ensemble method, which means it constructs a series of models and combines their output in a specific way to provide the final classification. It has several desirable properties that make it a good fit for this challenge:
In this section, we will loosely follow the steps described in their report. Note that this is not an exact implementation of their approach, but rather a solution overview that includes the steps necessary to dive deeper.
The general overview of steps is as follows:
For this task, we will need an additional Weka package, ensembleLibrary. Weka 3.7.2 and higher versions support external packages, developed mainly by the academic community. A list of Weka packages is available at http://weka.sourceforge.net/packageMetaData, as shown in the following screenshot:
Find and download the latest available version of the ensembleLibrary package at http://prdownloads.sourceforge.net/weka/ensembleLibrary1.0.5.zip?download.
After you unzip the package, locate ensembleLibrary.jar and import it in your code, as follows:
import weka.classifiers.meta.EnsembleSelection;
First, we will utilize Weka's built-in weka.filters.unsupervised.attribute.RemoveUseless filter, which works exactly as its name suggests. It removes attributes that do not vary much (for instance, all constant attributes) as well as attributes that vary too much, almost at random. The maximum variance, which applies only to nominal attributes, is specified with the -M parameter. The default is 99%, which means that if more than 99% of all instances have unique attribute values, the attribute is removed, as follows:
RemoveUseless removeUseless = new RemoveUseless();
removeUseless.setOptions(new String[] { "-M", "99" }); // threshold
removeUseless.setInputFormat(data);
data = Filter.useFilter(data, removeUseless);
Next, we will replace all the missing values in the dataset with the modes (for nominal attributes) and means (for numeric attributes) from the training data, using the weka.filters.unsupervised.attribute.ReplaceMissingValues filter. In general, missing-value replacement should be performed with caution, taking into consideration the meaning and context of the attributes:
ReplaceMissingValues fixMissing = new ReplaceMissingValues();
fixMissing.setInputFormat(data);
data = Filter.useFilter(data, fixMissing);
Finally, we will discretize the numeric attributes, that is, transform them into intervals, using the weka.filters.unsupervised.attribute.Discretize filter. With the -B option, we split the numeric attributes into four bins, and the -R option specifies the range of attributes (only numeric attributes will be discretized):
Discretize discretizeNumeric = new Discretize();
discretizeNumeric.setOptions(new String[] {
    "-B", "4", // number of bins
    "-R", "first-last" }); // range of attributes
discretizeNumeric.setInputFormat(data);
data = Filter.useFilter(data, discretizeNumeric);
In the next step, we will select only the informative attributes, that is, the attributes that are more likely to help with prediction. A standard approach to this problem is to check the information gain carried by each attribute. We will use the weka.attributeSelection.AttributeSelection filter, which requires two additional components: an evaluator, which calculates how useful an attribute is, and a search algorithm, which selects a subset of attributes.
In our case, we first initialize weka.attributeSelection.InfoGainAttributeEval, which implements the calculation of information gain:
InfoGainAttributeEval eval = new InfoGainAttributeEval();
Ranker search = new Ranker();
To select only the top attributes, we initialize weka.attributeSelection.Ranker to rank the attributes with information gain above a specific threshold. We specify the threshold with the -T parameter, keeping its value low in order to retain any attribute with at least some information:
search.setOptions(new String[] { "-T", "0.001" });
Next, we can initialize the AttributeSelection class, set the evaluator and ranker, and apply the attribute selection to our dataset:
AttributeSelection attSelect = new AttributeSelection();
attSelect.setEvaluator(eval);
attSelect.setSearch(search);
// apply attribute selection
attSelect.SelectAttributes(data);
Finally, we remove the attributes that were not selected in the last run by calling the reduceDimensionality(Instances) method:
// remove the attributes not selected in the last run
data = attSelect.reduceDimensionality(data);
At the end, we are left with 214 out of 230 attributes.
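If you want to verify which attributes survive the selection, the AttributeSelection object can report them. The following is a minimal, self-contained sketch on a tiny synthetic dataset (the dataset and attribute names are made up for illustration; the chapter's real data is loaded elsewhere):

```java
import java.util.ArrayList;
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

public class InspectSelection {
    public static void main(String[] args) throws Exception {
        // build a tiny synthetic dataset: one informative attribute,
        // one noisy attribute, and a binary class
        ArrayList<String> classVals = new ArrayList<>();
        classVals.add("yes");
        classVals.add("no");
        ArrayList<Attribute> attrs = new ArrayList<>();
        attrs.add(new Attribute("informative"));
        attrs.add(new Attribute("noise"));
        attrs.add(new Attribute("class", classVals));
        Instances data = new Instances("demo", attrs, 10);
        data.setClassIndex(2);
        double[] noise = {0.3, 0.7, 0.1, 0.9, 0.5, 0.2, 0.8, 0.4, 0.6, 0.0};
        for (int i = 0; i < 10; i++) {
            // the informative attribute perfectly separates the two classes
            double[] vals = { i < 5 ? 0.0 : 1.0, noise[i], i < 5 ? 0 : 1 };
            data.add(new DenseInstance(1.0, vals));
        }
        // same evaluator/ranker setup as in the text
        AttributeSelection attSelect = new AttributeSelection();
        attSelect.setEvaluator(new InfoGainAttributeEval());
        Ranker search = new Ranker();
        search.setOptions(new String[] { "-T", "0.001" });
        attSelect.setSearch(search);
        attSelect.SelectAttributes(data);
        // selectedAttributes() returns the kept indices (class index included)
        System.out.println("selected: " + attSelect.numberAttributesSelected());
        for (int idx : attSelect.selectedAttributes()) {
            System.out.println(idx + ": " + data.attribute(idx).name());
        }
    }
}
```

On this toy data, the informative attribute should be ranked above the threshold while the noisy one is dropped.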
Over the years, practitioners in the field of machine learning have developed a wide variety of learning algorithms and improvements to existing ones. There are so many unique supervised learning methods that it is challenging to keep track of all of them. As the characteristics of datasets vary, no single method is the best in all cases; rather, different algorithms are able to take advantage of different characteristics and relationships within a given dataset. This is the property that the Ensemble Selection algorithm tries to leverage (Jung, 2005):
Intuitively, the goal of the Ensemble Selection algorithm is to automatically detect and combine the strengths of these unique algorithms to create a sum that is greater than its parts. This is accomplished by creating a model library that is intended to be as diverse as possible in order to capitalize on a large number of unique learning approaches. This paradigm of overproducing a huge number of models is very different from more traditional ensemble approaches. Thus far, the results have been very encouraging.
First, we need to create the model library by initializing the weka.classifiers.EnsembleLibrary class, which will help us define the models:
EnsembleLibrary ensembleLib = new EnsembleLibrary();
Next, we add the models and their parameters to the library as command-line-style strings. For example, we can add two decision tree learners with different parameters, as follows:
ensembleLib.addModel("weka.classifiers.trees.J48 -S -C 0.25 -B -M 2");
ensembleLib.addModel("weka.classifiers.trees.J48 -S -C 0.25 -B -M 2 -A");
If you are familiar with the Weka graphical interface, you can also explore the algorithms and their configurations there and copy the configuration string: right-click on the algorithm name and navigate to Edit configuration | Copy configuration string, as shown in the following screenshot:
To complete the example, we added the following algorithms and their parameters:
ensembleLib.addModel("weka.classifiers.bayes.NaiveBayes");
ensembleLib.addModel("weka.classifiers.lazy.IBk");
ensembleLib.addModel("weka.classifiers.functions.SimpleLogistic");
ensembleLib.addModel("weka.classifiers.functions.SMO");
ensembleLib.addModel("weka.classifiers.meta.AdaBoostM1");
ensembleLib.addModel("weka.classifiers.meta.LogitBoost");
ensembleLib.addModel("weka.classifiers.trees.DecisionStump");
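Since the library accepts raw configuration strings, a typo in a class name or option only surfaces later. One way to sanity-check a string before adding it is to parse and instantiate it yourself; a small sketch using Weka's Utils.splitOptions and AbstractClassifier.forName (the helper name is made up):

```java
import weka.classifiers.AbstractClassifier;
import weka.classifiers.Classifier;
import weka.core.Utils;

public class CheckConfig {
    // parse a "classname options" string and instantiate the classifier;
    // this throws an exception if the class or any option is invalid
    public static Classifier fromConfig(String config) throws Exception {
        String[] tokens = Utils.splitOptions(config);
        String className = tokens[0];
        String[] options = new String[tokens.length - 1];
        System.arraycopy(tokens, 1, options, 0, options.length);
        return AbstractClassifier.forName(className, options);
    }

    public static void main(String[] args) throws Exception {
        Classifier c = fromConfig("weka.classifiers.trees.J48 -C 0.25 -M 2");
        System.out.println(c.getClass().getName()); // weka.classifiers.trees.J48
    }
}
```

Running this on each string before calling addModel catches misspelled packages (such as a missing weka. prefix) early.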
As the EnsembleLibrary implementation is primarily focused on GUI and console users, we have to save the models into a file by calling the saveLibrary(File, EnsembleLibrary, JComponent) method, as follows:
EnsembleLibrary.saveLibrary(new File(path + "ensembleLib.model.xml"), ensembleLib, null);
System.out.println(ensembleLib.getModels());
Next, we can initialize the Ensemble Selection algorithm by instantiating the weka.classifiers.meta.EnsembleSelection class. Let's first review the following options:
-L </path/to/modelLibrary>: This specifies the modelLibrary file, containing the list of all models.
-W </path/to/working/directory>: This specifies the working directory, where all models will be stored.
-B <numModelBags>: This sets the number of bags, that is, the number of iterations to run the Ensemble Selection algorithm.
-E <modelRatio>: This sets the ratio of library models that will be randomly chosen to populate each bag of models.
-V <validationRatio>: This sets the ratio of the training dataset that will be reserved for validation.
-H <hillClimbIterations>: This sets the number of hill-climbing iterations to be performed on each model bag.
-I <sortInitialization>: This sets the ratio of the ensemble library that the sort initialization algorithm can choose from when initializing the ensemble for each model bag.
-X <numFolds>: This sets the number of cross-validation folds.
-P <hillClimbMetric>: This specifies the metric that will be used for model selection during the hill-climbing algorithm. Valid metrics are accuracy, rmse, roc, precision, recall, fscore, and all.
-A <algorithm>: This specifies the algorithm to be used for ensemble selection. Valid algorithms are forward (default) for forward selection, backward for backward elimination, both for both forward and backward elimination, best to simply print the top performer from the ensemble library, and library to only train the models in the ensemble library.
-R: This flags whether models can be selected more than once for an ensemble.
-G: This states whether sort initialization greedily stops adding models when performance degrades.
-O: This is a flag for verbose output; it prints the performance of all the selected models.
-S <num>: This is the random number seed (default 1).
-D: If set, the classifier runs in debug mode and may output additional information to the console.
We initialize the algorithm with the following parameters, where we specify optimizing the ROC metric:
EnsembleSelection ensambleSel = new EnsembleSelection();
ensambleSel.setOptions(new String[] {
    "-L", path + "ensembleLib.model.xml", // </path/to/modelLibrary>
    "-W", path + "esTmp", // </path/to/working/directory>
    "-B", "10", // <numModelBags>
    "-E", "1.0", // <modelRatio>
    "-V", "0.25", // <validationRatio>
    "-H", "100", // <hillClimbIterations>
    "-I", "1.0", // <sortInitialization>
    "-X", "2", // <numFolds>
    "-P", "roc", // <hillClimbMetric>
    "-A", "forward", // <algorithm>
    "-R", // models can be selected more than once
    "-G", // stop adding models when performance degrades
    "-O", // verbose output
    "-S", "1", // <num> - random number seed
    "-D" // run in debug mode
});
The evaluation is heavy, both computationally and memory-wise, so make sure that you initialize the JVM with extra heap space (for instance, java -Xmx16g). The computation can take a couple of hours or days, depending on the number of algorithms you include in the model library. This example took 4 hours and 22 minutes on a 12-core Intel Xeon E5-2420 CPU with 32 GB of memory, utilizing 10% CPU and 6 GB of memory on average.
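For example, assuming weka.jar and the unzipped ensembleLibrary.jar sit next to our compiled classes (jar locations and the main class name here are hypothetical), the launch command might look like this:

```shell
# give the JVM 16 GB of heap; adjust -Xmx to your machine
java -Xmx16g -cp weka.jar:ensembleLibrary.jar:. ChurnEnsemble
```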
We call our evaluation method and output the results, as follows:
double resES[] = evaluate(ensambleSel);
System.out.println("Ensemble Selection "
    + "churn: " + resES[0] + " "
    + "appetency: " + resES[1] + " "
    + "up-sell: " + resES[2] + " "
    + "overall: " + resES[3]);
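For reference, one possible shape of such an evaluate helper is sketched below. This is not the chapter's exact code: it assumes the three target datasets (churn, appetency, up-sell) are loaded elsewhere as Instances fields with the class attribute set, that the positive class is labeled "1", and that AUC from cross-validation is the score of interest; all of those are assumptions:

```java
import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.core.Instances;

public class EvalHelper {
    // assumed to be loaded elsewhere, class attribute already set
    static Instances churnData, appetencyData, upsellData;

    // cross-validate the classifier on each of the three targets and
    // return {churn AUC, appetency AUC, up-sell AUC, mean of the three}
    public static double[] evaluate(Classifier model) throws Exception {
        double[] results = new double[4];
        Instances[] targets = { churnData, appetencyData, upsellData };
        for (int i = 0; i < targets.length; i++) {
            Evaluation eval = new Evaluation(targets[i]);
            eval.crossValidateModel(model, targets[i], 5, new Random(1));
            // AUC for the positive class, assumed to be labeled "1"
            int posIndex = targets[i].classAttribute().indexOfValue("1");
            results[i] = eval.areaUnderROC(posIndex);
            results[3] += results[i] / targets.length;
        }
        return results;
    }
}
```

The fold count, seed, and field names are illustrative; what matters is that the helper returns the three per-task scores plus an overall average, matching how the results are printed above.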
The specific set of classifiers in the model library achieved the following result:
Ensemble Selection churn: 0.7109874158176481 appetency: 0.786325687118347 up-sell: 0.8521363243575182 overall: 0.7831498090978378
Overall, the approach has brought a significant improvement of more than 15 percentage points over the initial baseline that we designed at the beginning of the chapter. While it is hard to give a definitive answer, the improvement is mainly due to three factors: data pre-processing and attribute selection, the exploration of a large variety of learning methods, and the use of an ensemble-building technique that is able to take advantage of the variety of base classifiers without overfitting. However, the improvement requires a significant increase in processing time, as well as working memory.