The code

Here is the code for our application. As always, we first load and parse our data into observations and target sample sets. Since this is a regression sample, we’ll use a 70/30 split of our data sample: 70% for training, 30% for testing. From there, we create our random forest learner and create our model. After this, we calculate our training and test errors and print out the feature importance in order of importance, as found by our model:

var parser = new CsvParser(() =>new StreamReader(Application.StartupPath + "\winequality-white.csv"));
var targetName = "quality";
// read in our feature matrix
var observations = parser.EnumerateRows(c => c != targetName).ToF64Matrix();
// read in our regression targets
var targets = parser.EnumerateRows(targetName).ToF64Vector();
// Since this is a regression problem, we use the random training/test set splitter. 30 % of the data is used for the test set. 
var splitter = new RandomTrainingTestIndexSplitter<double>(trainingPercentage: 0.7, seed: 24);
var trainingTestSplit = splitter.SplitSet(observations, targets);
var trainSet = trainingTestSplit.TrainingSet;
var testSet = trainingTestSplit.TestSet;
var learner = new RegressionRandomForestLearner(trees: 100);
var model = learner.Learn(trainSet.Observations, trainSet.Targets);
var trainPredictions = model.Predict(trainSet.Observations);
var testPredictions = model.Predict(testSet.Observations);
// since this is a regression problem we are using squared error as the metric
// for evaluating how well the model performs.
var metric = new MeanSquaredErrorRegressionMetric();
var trainError = metric.Error(trainSet.Targets, trainPredictions);
var testError = metric.Error(testSet.Targets, testPredictions);
Trace.WriteLine($"Train error: {trainError:0.0000} - Test error: {testError:0.0000}");
System.Console.WriteLine($"Train error: {trainError:0.0000} - Test error: {testError:0.0000}");

// the variable importance requires the featureNameToIndex from the dataset. 
// This mapping describes the relation from the column name to the associated 
// index in the feature matrix.
var featureNameToIndex = parser.EnumerateRows(c => c != targetName).First().ColumnNameToIndex;

var importances = model.GetVariableImportance(featureNameToIndex);

var importanceCsv = new StringBuilder();
importanceCsv.Append("FeatureName;Importance");
System.Console.WriteLine("FeatureName	Importance");
foreach (var feature in importances)
{
importanceCsv.AppendLine();
importanceCsv.Append($"{feature.Key};{feature.Value:0.00}");
System.Console.WriteLine($"{feature.Key}	{feature.Value:0.00}");
}
Trace.WriteLine(importanceCsv);

Table of Contents for The code

Create new playlist

Sign In

Sign Up

Table of Contents for
The code