We've covered that neural networks can work as data classifiers by establishing decision boundaries on data in the hyperspace. This boundary can be linear, as in the case of perceptrons and Adaline, or nonlinear, as in the case of multilayer architectures such as MLPs. The linear case is based on linear regression, in which the classification boundary is literally a line (or a hyperplane, in higher dimensions), as shown in the previous figure. If the scatter chart of the data looks like that of the following figure, then a nonlinear classification boundary is needed:
Neural networks are in fact great nonlinear classifiers, and this is achieved by the use of nonlinear activation functions. One nonlinear function that works well for classification is the sigmoid function, and the procedure for classification using this function is called logistic regression:
This function returns values bounded between zero and one. In this function, a parameter (alpha) controls how sharply the transition from 0 to 1 occurs. The following chart shows the difference:
Note that the higher the alpha parameter is, the more the logistic function takes the shape of a hard-limiting threshold function, also known as a step function.
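To make this concrete, here is a standalone sketch (not part of the chapter's code; the class and method names are illustrative) of a sigmoid with a slope parameter alpha. With a large alpha, the output at points away from zero is pushed very close to 0 or 1, approaching a step function:

```java
public class SigmoidDemo {

    // Logistic (sigmoid) function with slope parameter alpha:
    // f(x) = 1 / (1 + e^(-alpha * x))
    static double sigmoid(double x, double alpha) {
        return 1.0 / (1.0 + Math.exp(-alpha * x));
    }

    public static void main(String[] args) {
        // The larger alpha gets, the closer f(-1) and f(1) get to 0 and 1.
        for (double alpha : new double[] {1.0, 5.0, 50.0}) {
            System.out.printf("alpha=%4.1f: f(-1)=%.4f f(0)=%.4f f(1)=%.4f%n",
                    alpha, sigmoid(-1, alpha), sigmoid(0, alpha), sigmoid(1, alpha));
        }
    }
}
```

Note that f(0) is always 0.5 regardless of alpha; alpha only changes how steep the transition around zero is.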
Classification problems usually deal with multiple classes, where each class is assigned a label. However, a binary classification scheme is convenient for neural networks. This is because a neural network with a logistic function at the output layer can produce only values between 0 and 1, which can be read as belonging (1) or not belonging (0) to some class.
Nevertheless, there is an approach for multiple classes using binary outputs. Consider that every class is represented by one output neuron; whenever that output neuron fires, its corresponding class is assigned to the input data record. So let's suppose a network to classify diseases: each output neuron will represent a disease to be assigned to some set of symptoms:
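This one-neuron-per-class scheme can be sketched as follows (a minimal standalone example; the class names and disease labels are invented for illustration, not taken from the chapter's code). In practice the neuron outputs are rarely exactly 0 or 1, so a common decoding choice is to pick the neuron with the highest activation:

```java
public class WinnerTakeAll {

    // Hypothetical labels, one per output neuron.
    static final String[] DISEASES = {"flu", "cold", "allergy"};

    // Assign the class whose output neuron has the highest activation.
    static String classify(double[] outputs) {
        int best = 0;
        for (int i = 1; i < outputs.length; i++) {
            if (outputs[i] > outputs[best]) {
                best = i;
            }
        }
        return DISEASES[best];
    }

    public static void main(String[] args) {
        // Neuron 1 fires strongest, so the record is assigned to "cold".
        System.out.println(classify(new double[] {0.1, 0.8, 0.3})); // cold
    }
}
```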
There is no perfect classifier algorithm; all of them are subject to errors and biases. However, a classification algorithm is typically expected to correctly classify 70-90% of the records.
A confusion matrix shows how many of a given class's records were correctly classified, and thereby how many were wrongly classified. The following table depicts what a confusion matrix may look like:
Note that the main diagonal is expected to hold the highest values, as the classification algorithm will always try to extract meaningful information from the input dataset. Each row must sum to 100%, because every element of a given class is classified into one of the available classes. Note that some classes may receive more classifications than expected.
The more a confusion matrix looks like an identity matrix, the better the classification algorithm will be.
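To make the row-sum property concrete, here is a small standalone sketch (the class and method names are illustrative, not from the chapter's code) that converts a confusion matrix of raw counts into row percentages, so that each row sums to 100%:

```java
public class ConfusionMatrixUtil {

    // Convert a confusion matrix of raw counts into row percentages.
    // After conversion, each row sums to 100% (all records of a class
    // are classified into one of the available classes).
    static double[][] toRowPercent(double[][] counts) {
        double[][] pct = new double[counts.length][];
        for (int i = 0; i < counts.length; i++) {
            double rowSum = 0.0;
            for (double c : counts[i]) {
                rowSum += c;
            }
            pct[i] = new double[counts[i].length];
            for (int j = 0; j < counts[i].length; j++) {
                pct[i][j] = rowSum == 0 ? 0.0 : 100.0 * counts[i][j] / rowSum;
            }
        }
        return pct;
    }
}
```

For example, a row of counts {8, 2} becomes {80%, 20%}: 80% of that class's records were correctly classified. The closer the percentage matrix is to 100% on the diagonal and 0% elsewhere (an identity-like matrix), the better the classifier.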
When the classification is binary, the confusion matrix reduces to a simple 2x2 matrix, and therefore its positions have special names:
| Actual Class | Inferred Class: Positive (1) | Inferred Class: Negative (0) |
|---|---|---|
| Positive (1) | True Positive | False Negative |
| Negative (0) | False Positive | True Negative |
In disease diagnosis, which is the subject of this chapter, the concept of a binary confusion matrix applies in the sense that a false diagnosis may be either a false positive or a false negative. The rate of false results can be measured by the sensitivity and specificity indexes.
Sensitivity means the true positive rate; it measures how many of the actually positive records are correctly classified as positive: Sensitivity = TP / (TP + FN)
Specificity, in turn, means the true negative rate; it indicates the proportion of actually negative records correctly identified as negative: Specificity = TN / (TN + FP)
High values of both sensitivity and specificity are desired; however, depending on the application field, the sensitivity may carry more meaning.
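As a minimal standalone sketch of these two indexes (the counts below are invented for illustration; the class name is not from the chapter's code):

```java
public class DiagnosisRates {

    // Sensitivity (true positive rate) = TP / (TP + FN)
    static double sensitivity(double tp, double fn) {
        return tp / (tp + fn);
    }

    // Specificity (true negative rate) = TN / (TN + FP)
    static double specificity(double tn, double fp) {
        return tn / (tn + fp);
    }

    public static void main(String[] args) {
        // Hypothetical screening test: 90 sick patients correctly flagged,
        // 10 missed; 85 healthy patients correctly cleared, 15 falsely flagged.
        System.out.println(sensitivity(90, 10)); // 0.9
        System.out.println(specificity(85, 15)); // 0.85
    }
}
```

In this hypothetical test, sensitivity is 0.9 (10% of sick patients are missed) and specificity is 0.85 (15% of healthy patients are wrongly flagged); in disease diagnosis, the missed cases are often the costlier error, which is why sensitivity may carry more meaning.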
In our code, let's implement the confusion matrix in the NeuralOutputData class. The calculateConfusionMatrix method below assumes two neurons in the output layer: an output of [1, 0] means yes (the positive class), and an output of [0, 1] means no (the negative class):
```java
public double[][] calculateConfusionMatrix(double[][] dataOutputTestAdapted,
        double[][] dataOutputTargetTestAdapted) {
    int TP = 0; // target positive, predicted positive
    int TN = 0; // target negative, predicted negative
    int FP = 0; // target negative, predicted positive
    int FN = 0; // target positive, predicted negative
    for (int m = 0; m < getTargetData().length; m++) {
        boolean targetPositive = dataOutputTargetTestAdapted[m][0] == 1.0
                && dataOutputTargetTestAdapted[m][1] == 0.0;
        boolean targetNegative = dataOutputTargetTestAdapted[m][0] == 0.0
                && dataOutputTargetTestAdapted[m][1] == 1.0;
        boolean predictedPositive = dataOutputTestAdapted[m][0] == 1.0
                && dataOutputTestAdapted[m][1] == 0.0;
        boolean predictedNegative = dataOutputTestAdapted[m][0] == 0.0
                && dataOutputTestAdapted[m][1] == 1.0;
        if (targetPositive && predictedPositive) {
            TP++;
        } else if (targetNegative && predictedNegative) {
            TN++;
        } else if (targetPositive && predictedNegative) {
            FN++; // a positive case missed by the network
        } else if (targetNegative && predictedPositive) {
            FP++; // a negative case wrongly flagged as positive
        }
    }
    // rows = actual class, columns = inferred class
    return new double[][] {{TP, FN}, {FP, TN}};
}
```
Another method implemented in the NeuralOutputData class is called calculatePerformanceMeasures. It receives the confusion matrix as a parameter, and it calculates and prints the following performance measures of classification: the per-class error rates, the total error rate, the total accuracy, the precision, the sensitivity, and the specificity. This method is shown below:
```java
public void calculatePerformanceMeasures(double[][] confMat) {
    // confMat layout: {{TP, FN}, {FP, TN}}
    double errorRatePositive = confMat[0][1] / (confMat[0][0] + confMat[0][1]);
    double errorRateNegative = confMat[1][0] / (confMat[1][0] + confMat[1][1]);
    double totalErrorRate = (confMat[0][1] + confMat[1][0])
            / (confMat[0][0] + confMat[0][1] + confMat[1][0] + confMat[1][1]);
    double totalAccuracy = (confMat[0][0] + confMat[1][1])
            / (confMat[0][0] + confMat[0][1] + confMat[1][0] + confMat[1][1]);
    double precision = confMat[0][0] / (confMat[0][0] + confMat[1][0]);
    double sensitivity = confMat[0][0] / (confMat[0][0] + confMat[0][1]);
    double specificity = confMat[1][1] / (confMat[1][0] + confMat[1][1]);
    System.out.println("### PERFORMANCE MEASURES ###");
    System.out.println("positive class error rate: " + (errorRatePositive * 100.0) + "%");
    System.out.println("negative class error rate: " + (errorRateNegative * 100.0) + "%");
    System.out.println("total error rate: " + (totalErrorRate * 100.0) + "%");
    System.out.println("total accuracy: " + (totalAccuracy * 100.0) + "%");
    System.out.println("precision: " + (precision * 100.0) + "%");
    System.out.println("sensitivity: " + (sensitivity * 100.0) + "%");
    System.out.println("specificity: " + (specificity * 100.0) + "%");
}
```