In the preceding steps, we used a few helper methods and classes that we should describe here, too. The first method, toCategorical(), converts a Frame column from String/Int to enum; it is used to convert the dayTime bags (that is, gr1, gr2, gr3, gr4) to a factor-like type. The same function is also used to convert the Class column to a factor type so that we can perform classification:
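def toCategorical(f: Frame, i: Int): Unit = {
  // Replace column i with its categorical (enum) version
  f.replace(i, f.vec(i).toCategoricalVec)
  // Write the modified frame back so that the change is visible to H2O
  f.update()
}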
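The second method, confusionMat(), builds a confusion matrix for anomaly detection: an instance is considered anomalous if its MSE exceeds the given threshold. The first index of the result is the predicted label and the second is the actual label taken from the Class column: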
def confusionMat(mSEs: water.fvec.Frame, actualFrame: water.fvec.Frame, thresh: Double): Array[Array[Int]] = {
  val actualColumn = actualFrame.vec("Class")
  val l2_test = mSEs.anyVec()
  val result = Array.ofDim[Int](2, 2)
  for (i <- 0 until l2_test.length().toInt) {
    // Predicted label: 1 (anomalous) if the MSE exceeds the threshold, else 0
    val ii = if (l2_test.at(i) > thresh) 1 else 0
    // Actual label from the Class column
    val jj = actualColumn.at(i).toInt
    result(ii)(jj) = result(ii)(jj) + 1
  }
  result
}
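To see the counting logic in isolation, here is a minimal sketch that replaces the H2O vectors with plain Scala arrays. The object ConfusionSketch, the helper confusionFromArrays(), and the sample values are hypothetical and only serve as an illustration; they are not part of the H2O API or of the project code:

```scala
object ConfusionSketch {
  // Hypothetical stand-in for confusionMat(): arrays replace the H2O Vec objects.
  // The result is indexed as result(predicted)(actual), as in confusionMat().
  def confusionFromArrays(mses: Array[Double], actual: Array[Int], thresh: Double): Array[Array[Int]] = {
    require(mses.length == actual.length, "one actual label per MSE value")
    val result = Array.ofDim[Int](2, 2)
    for (i <- mses.indices) {
      val predicted = if (mses(i) > thresh) 1 else 0 // 1 = flagged as anomalous
      result(predicted)(actual(i)) += 1
    }
    result
  }

  def main(args: Array[String]): Unit = {
    // Made-up values, chosen only to exercise all four cells
    val mses = Array(0.01, 0.02, 0.20, 0.03, 0.50)
    val labels = Array(0, 0, 1, 1, 0)
    val cm = confusionFromArrays(mses, labels, thresh = 0.1)
    cm.foreach(row => println(row.mkString(" ")))
    // prints:
    // 2 1
    // 1 1
  }
}
```

With this layout, cm(1)(1) holds the true positives, cm(1)(0) the false positives, cm(0)(1) the false negatives, and cm(0)(0) the true negatives.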
Apart from these two auxiliary methods, I have defined three Scala case classes that hold the evaluation results: precision and recall; sensitivity and specificity; and the true positive, false positive, true negative, and false negative counts along with the threshold (th). Their signatures are as follows:
case class r(precision: Double, recall: Double)
case class r2(sensitivity: Double, specificity: Double)
case class r3(tp: Double, fp: Double, tn: Double, fn: Double, th: Double)
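As a rough sketch of how these classes might be filled, the following code derives the metrics from a 2 x 2 matrix indexed as result(predicted)(actual), which is the layout confusionMat() produces. The object MetricsSketch and the helper metrics() are hypothetical names introduced for this illustration, and the formulas are the standard definitions rather than code taken from the project:

```scala
object MetricsSketch {
  case class r(precision: Double, recall: Double)
  case class r2(sensitivity: Double, specificity: Double)
  case class r3(tp: Double, fp: Double, tn: Double, fn: Double, th: Double)

  // Hypothetical helper: assumes cm(predicted)(actual) with 1 = anomalous.
  // If a denominator is zero, the Double division yields NaN rather than throwing.
  def metrics(cm: Array[Array[Int]], th: Double): (r, r2, r3) = {
    val tn = cm(0)(0).toDouble
    val fn = cm(0)(1).toDouble
    val fp = cm(1)(0).toDouble
    val tp = cm(1)(1).toDouble
    val precision = tp / (tp + fp)
    val recall = tp / (tp + fn)      // recall and sensitivity are the same quantity
    val specificity = tn / (tn + fp)
    (r(precision, recall), r2(recall, specificity), r3(tp, fp, tn, fn, th))
  }

  def main(args: Array[String]): Unit = {
    // Made-up counts: 90 TN, 2 FN, 10 FP, 8 TP
    val cm = Array(Array(90, 2), Array(10, 8))
    val (pr, ss, counts) = metrics(cm, th = 0.1)
    println(pr)
    println(ss)
    println(counts)
  }
}
```

Calling metrics() once per candidate threshold gives one r3 record per threshold, which is a convenient form for comparing thresholds side by side.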