The most exciting task in data mining is classifier training. This recipe will show you how to train a classifier and set the options.
To train a classifier, use the following snippet:
import weka.classifiers.trees.J48;

String[] options = new String[1];
options[0] = "-U";
J48 tree = new J48();
tree.setOptions(options);
tree.buildClassifier(data);
The classifier is now trained and ready for classification.
The classifier implements the OptionHandler interface, which allows you to set the options via a String array. First, import a classifier from the weka.classifiers package, for example, a J48 decision tree:
import weka.classifiers.trees.J48;
Then, prepare the options in a String array, for example, set the tree to be unpruned:
String[] options = new String[1];
options[0] = "-U"; // unpruned tree
Now, initialize the classifier:
J48 tree = new J48(); // new instance of tree
Set the options via the OptionHandler interface:
tree.setOptions(options); // set the options
And build the classifier:
tree.buildClassifier(data); // build classifier
Now, you are ready to validate it and use it (see recipes Test and Evaluate).
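As a quick preview of what using the trained tree looks like, the sketch below loads a dataset, trains the unpruned J48 tree from this recipe, and queries it with classifyInstance, which returns the index of the predicted class value. The file path is a placeholder; any ARFF dataset whose last attribute is the class will do:

```java
import weka.classifiers.trees.J48;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TrainAndClassify {
    public static void main(String[] args) throws Exception {
        // Load a dataset; the path is a placeholder
        Instances data = DataSource.read("/some/where/data.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Train an unpruned J48 tree, as in the recipe above
        J48 tree = new J48();
        tree.setOptions(new String[]{"-U"});
        tree.buildClassifier(data);

        // classifyInstance returns the index of the predicted class value
        Instance first = data.instance(0);
        double predicted = tree.classifyInstance(first);
        System.out.println("Predicted class: "
            + data.classAttribute().value((int) predicted));
    }
}
```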
There is a wide variety of implemented classifiers in Weka. This section first demonstrates how to build a support vector machine classifier, then lists some other popular classifiers, and finally shows how to create a classifier that can accept data incrementally.
Another popular classifier is the support vector machine. To train one, follow the previous recipe, but instead of J48, import the SMO class from weka.classifiers.functions:
import weka.classifiers.functions.SMO;
Then, initialize a new object and build the classifier:
SMO svm = new SMO();
svm.buildClassifier(data);
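Like J48, SMO implements the OptionHandler interface, so you can tune it the same way. For example, the -C option sets the complexity constant; the value 2.0 below is only an illustration, and the file path is a placeholder:

```java
import weka.classifiers.functions.SMO;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TrainSmo {
    public static void main(String[] args) throws Exception {
        // Load a dataset; the path is a placeholder
        Instances data = DataSource.read("/some/where/data.arff");
        data.setClassIndex(data.numAttributes() - 1);

        SMO svm = new SMO();
        // -C sets the complexity constant; 2.0 is only an illustration
        svm.setOptions(new String[]{"-C", "2.0"});
        svm.buildClassifier(data);
    }
}
```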
In addition to decision trees (weka.classifiers.trees.J48) and support vector machines (weka.classifiers.functions.SMO), the following are some of the many other classification algorithms available in Weka:
- weka.classifiers.rules.ZeroR: Predicts the majority class (mode) and serves as a baseline; if your classifier performs worse than this average-value predictor, it is not worth considering.
- weka.classifiers.trees.RandomTree: Constructs a tree that considers K randomly chosen attributes at each node.
- weka.classifiers.trees.RandomForest: Constructs a set (that is, a forest) of random trees and uses majority voting to classify a new instance.
- weka.classifiers.lazy.IBk: A k-nearest neighbors classifier that can select an appropriate number of neighbors based on cross-validation.
- weka.classifiers.functions.MultilayerPerceptron: A classifier based on neural networks that uses backpropagation to classify instances. The network can be built by hand, created by an algorithm, or both.
- weka.classifiers.bayes.NaiveBayes: A naive Bayes classifier that uses estimator classes, where numeric estimator precision values are chosen based on analysis of the training data.
- weka.classifiers.meta.AdaBoostM1: Boosts a nominal-class classifier using the AdaBoost M1 method. Only nominal class problems can be tackled. Often dramatically improves performance, but sometimes overfits.
- weka.classifiers.meta.Bagging: Bags a classifier to reduce variance. Can do classification and regression, depending on the base learner.

When a dataset is really big, or you have to deal with real-time stream data, the preceding methods won't fit into memory all at once. Some classifiers implement the weka.classifiers.UpdateableClassifier interface, which means they can be trained incrementally. These are AODE, IB1, IBk, KStar, LWL, NaiveBayesUpdateable, NNge, RacedIncrementalLogitBoost, and Winnow.
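Because all of these classifiers share the weka.classifiers.Classifier API, it can be convenient to pick one by its class name at runtime and swap algorithms without changing the surrounding code. The sketch below uses the forName factory method (found on AbstractClassifier in recent Weka versions); the class name and file path are only illustrations:

```java
import weka.classifiers.AbstractClassifier;
import weka.classifiers.Classifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ByName {
    public static void main(String[] args) throws Exception {
        // Load a dataset; the path is a placeholder
        Instances data = DataSource.read("/some/where/data.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Build a classifier from its class name and an option array;
        // swap in any name from the list above
        Classifier cls = AbstractClassifier.forName(
            "weka.classifiers.trees.RandomForest", new String[0]);
        cls.buildClassifier(data);
    }
}
```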
The process of training an incremental classifier through the UpdateableClassifier interface is fairly simple.
Open a dataset with the ArffLoader class:
import java.io.File;
import weka.core.converters.ArffLoader;

ArffLoader loader = new ArffLoader();
loader.setFile(new File("/some/where/data.arff"));
Load the structure of the dataset (does not contain any actual data rows):
Instances structure = loader.getStructure();
structure.setClassIndex(structure.numAttributes() - 1);
Initialize a classifier and call buildClassifier with the structure of the dataset:
NaiveBayesUpdateable nb = new NaiveBayesUpdateable();
nb.buildClassifier(structure);
Subsequently, call the updateClassifier method to feed the classifier new weka.core.Instance objects, one by one:
Instance current;
while ((current = loader.getNextInstance(structure)) != null) {
    nb.updateClassifier(current);
}
After each update, the classifier takes into account the newly added instance to update its model.
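The steps above can be combined into one self-contained sketch; the file path is a placeholder for any ARFF dataset whose last attribute is the class:

```java
import java.io.File;

import weka.classifiers.bayes.NaiveBayesUpdateable;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffLoader;

public class IncrementalTraining {
    public static void main(String[] args) throws Exception {
        // Read only the header first; data rows are streamed later
        ArffLoader loader = new ArffLoader();
        loader.setFile(new File("/some/where/data.arff")); // placeholder path
        Instances structure = loader.getStructure();
        structure.setClassIndex(structure.numAttributes() - 1);

        // For an updateable learner, buildClassifier needs only the structure
        NaiveBayesUpdateable nb = new NaiveBayesUpdateable();
        nb.buildClassifier(structure);

        // Stream instances one by one, updating the model after each
        Instance current;
        while ((current = loader.getNextInstance(structure)) != null) {
            nb.updateClassifier(current);
        }
    }
}
```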