Logistic regression

We've seen that neural networks can work as data classifiers by establishing decision boundaries in the data hyperspace. This boundary can be linear, in the case of perceptrons, or nonlinear, in the case of other neural architectures such as MLPs, Kohonen networks, or Adaline. The linear case is based on linear regression, in which the classification boundary is literally a line, as shown in the previous figure. If the scatter chart of the data looks like that of the following figure, then a nonlinear classification boundary is needed:

[Figure: scatter chart of data requiring a nonlinear classification boundary]

Neural networks are in fact great nonlinear classifiers, and this is achieved by the use of nonlinear activation functions. One nonlinear function that works well for nonlinear classification is the sigmoid function; the procedure for classification using this function is called logistic regression:

[Figure: the sigmoid (logistic) function, f(x) = 1 / (1 + e^(-αx))]

This function returns values bounded between zero and one. In this function, a parameter, α, determines how sharply the transition from zero to one occurs. The following chart shows the difference:

[Figure: sigmoid curves for different values of α]

Note that the higher the α parameter is, the more the logistic function takes the shape of a hard-limiting threshold function, also known as a step function.
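This effect is easy to verify numerically. The following standalone sketch (a hypothetical example, not part of the book's codebase) evaluates the logistic function for a small and a large α, showing how the larger value produces a near step-like transition around zero:

```java
// Hypothetical sketch: the logistic function f(x) = 1 / (1 + e^(-alpha * x))
// evaluated for two values of alpha. A larger alpha makes the transition from
// 0 to 1 sharper, approaching a hard-limiting step function.
public class LogisticDemo {
    static double logistic(double x, double alpha) {
        return 1.0 / (1.0 + Math.exp(-alpha * x));
    }

    public static void main(String[] args) {
        for (double x = -2.0; x <= 2.0; x += 1.0) {
            System.out.printf("x=%5.1f  alpha=1: %.4f  alpha=10: %.4f%n",
                    x, logistic(x, 1.0), logistic(x, 10.0));
        }
    }
}
```

With α = 10, the outputs are already very close to 0 below zero and very close to 1 above zero, while α = 1 gives a smooth, gradual transition.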

Multiple classes versus binary classes

Classification problems usually deal with the multiclass case, where each class is assigned a label. However, a binary classification scheme is useful in neural networks. This is because a neural network with a logistic function at the output layer can produce only values between 0 and 1, meaning that a record either belongs (1) or does not belong (0) to some class.

Nevertheless, there is an approach for handling multiple classes with binary outputs. Consider that every class is represented by an output neuron; whenever an output neuron fires, that neuron's corresponding class is assigned to the input data record. So let's suppose a network to classify diseases: each output neuron will represent a disease to be assigned to some set of symptoms:

[Figure: network with one binary output neuron per disease class]

Tip

Note that in this configuration, it would be possible to have multiple diseases assigned to the same symptoms, which can happen in practice. However, if only one class should be chosen, a scheme such as the competitive learning algorithm would be more suitable in that case.
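The single-class case can be sketched as a simple winner-take-all selection over the output neurons. The following is a hypothetical, self-contained example (the class and data are illustrative, not from the book's codebase):

```java
// Hypothetical sketch: picking a single class from several binary output
// neurons by selecting the one with the strongest activation (winner-take-all).
public class WinnerTakeAll {
    static int winningNeuron(double[] outputs) {
        int winner = 0;
        for (int i = 1; i < outputs.length; i++) {
            if (outputs[i] > outputs[winner]) {
                winner = i;
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        // Each position is one disease's output neuron; index 2 fires strongest.
        double[] outputs = {0.12, 0.85, 0.91, 0.05};
        System.out.println("diagnosed class: " + winningNeuron(outputs));
    }
}
```

This way, even if several output neurons produce high values for the same symptoms, only the strongest one determines the diagnosis.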

Confusion matrix

There is no perfect classification algorithm; all of them are subject to errors and biases. However, it is expected that a classification algorithm can correctly classify 70-90% of the records.

Tip

Very high correct classification rates are not always desirable, because of possible biases present in the input data that might affect the classification task, and also because of the risk of overtraining (overfitting), where only the training data is correctly classified.

A confusion matrix shows how many of a given class's records were correctly classified, and thereby how many were wrongly classified. The following table depicts what a confusion matrix may look like:

[Figure: an example confusion matrix]

Note that the main diagonal is expected to hold the highest values, as the classification algorithm will always try to extract meaningful information from the input dataset. The sum of each row must be equal to 100%, because all elements of a given class must be classified into one of the available classes. Note that some classes may receive more classifications than expected.

The more a confusion matrix looks like an identity matrix, the better the classification algorithm will be.
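This closeness to the identity matrix can be summarized as the overall accuracy: the fraction of all records that fall on the main diagonal. The sketch below is a hypothetical example with made-up counts, not code from the book:

```java
// Hypothetical sketch: overall accuracy of a confusion matrix, measured as the
// fraction of records lying on the main diagonal. A perfect classifier (a
// matrix proportional to the identity) scores 1.0.
public class ConfusionAccuracy {
    static double accuracy(double[][] confMat) {
        double diagonal = 0.0, total = 0.0;
        for (int i = 0; i < confMat.length; i++) {
            for (int j = 0; j < confMat[i].length; j++) {
                if (i == j) diagonal += confMat[i][j];
                total += confMat[i][j];
            }
        }
        return diagonal / total;
    }

    public static void main(String[] args) {
        double[][] confMat = {{45, 5}, {10, 40}}; // made-up counts
        System.out.println("accuracy: " + accuracy(confMat)); // 85 of 100 records
    }
}
```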

Sensitivity and specificity

When the classification is binary, the confusion matrix reduces to a simple 2x2 matrix, and therefore its positions have special names:

                       Inferred Class
Actual Class           Positive (1)       Negative (0)
Positive (1)           True Positive      False Negative
Negative (0)           False Positive     True Negative

In disease diagnosis, which is the subject of this chapter, the concept of a binary confusion matrix applies in the sense that a false diagnosis may be either a false positive or a false negative. The rate of false results can be measured by the sensitivity and specificity indexes.

Sensitivity is the true positive rate; it measures how many of the positive records are correctly classified as positive:

Sensitivity = TP / (TP + FN)

Specificity, in turn, is the true negative rate; it indicates the proportion of negative records correctly identified as negative:

Specificity = TN / (TN + FP)

High values of both sensitivity and specificity are desired; however, depending on the application field, the sensitivity may carry more meaning.
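A small worked example helps to fix the two definitions. The counts below are made up for illustration; they are not from the book's dataset:

```java
// Hypothetical worked example: out of 100 sick patients, 90 are detected
// (TP = 90, FN = 10); out of 100 healthy patients, 80 are cleared
// (TN = 80, FP = 20).
public class SensSpecExample {
    static double sensitivity(double tp, double fn) { return tp / (tp + fn); }
    static double specificity(double tn, double fp) { return tn / (tn + fp); }

    public static void main(String[] args) {
        System.out.println("sensitivity: " + sensitivity(90, 10)); // 0.9
        System.out.println("specificity: " + specificity(80, 20)); // 0.8
    }
}
```

In a disease-diagnosis setting, the 90% sensitivity matters most: missing a sick patient (a false negative) is usually costlier than a false alarm.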

Implementing a confusion matrix

In our code, let's implement the confusion matrix in the class NeuralOutputData. The method calculateConfusionMatrix below is programmed to consider two neurons in the output layer. If the output is [1, 0], then the record is classified as positive (yes); if the output is [0, 1], then it is classified as negative (no):

public double[][] calculateConfusionMatrix(double[][] dataOutputTestAdapted, double[][] dataOutputTargetTestAdapted) {
    int TP = 0;
    int TN = 0;
    int FP = 0;
    int FN = 0;
    // The arrays are assumed to hold thresholded values (exactly 0.0 or 1.0),
    // with [1, 0] meaning positive and [0, 1] meaning negative.
    for (int m = 0; m < dataOutputTargetTestAdapted.length; m++) {
      if ( ( dataOutputTargetTestAdapted[m][0] == 1.0 && dataOutputTargetTestAdapted[m][1] == 0.0 )
          && ( dataOutputTestAdapted[m][0] == 1.0 && dataOutputTestAdapted[m][1] == 0.0 ) ) {
        TP++; // actual positive, inferred positive
      } else if ( ( dataOutputTargetTestAdapted[m][0] == 0.0 && dataOutputTargetTestAdapted[m][1] == 1.0 )
          && ( dataOutputTestAdapted[m][0] == 0.0 && dataOutputTestAdapted[m][1] == 1.0 ) ) {
        TN++; // actual negative, inferred negative
      } else if ( ( dataOutputTargetTestAdapted[m][0] == 1.0 && dataOutputTargetTestAdapted[m][1] == 0.0 )
          && ( dataOutputTestAdapted[m][0] == 0.0 && dataOutputTestAdapted[m][1] == 1.0 ) ) {
        FN++; // actual positive, inferred negative: a false negative
      } else if ( ( dataOutputTargetTestAdapted[m][0] == 0.0 && dataOutputTargetTestAdapted[m][1] == 1.0 )
          && ( dataOutputTestAdapted[m][0] == 1.0 && dataOutputTestAdapted[m][1] == 0.0 ) ) {
        FP++; // actual negative, inferred positive: a false positive
      }
    }

    // Matrix layout matches the table above: {{TP, FN}, {FP, TN}}
    return new double[][] {{TP, FN}, {FP, TN}};
  }

Another method implemented in the NeuralOutputData class is called calculatePerformanceMeasures. It receives the confusion matrix as a parameter, and it calculates and prints the following classification performance measures:

  • Positive class error rate
  • Negative class error rate
  • Total error rate
  • Total accuracy
  • Precision
  • Sensitivity
  • Specificity

This method is shown below:

public void calculatePerformanceMeasures(double[][] confMat) {
    // confMat layout: {{TP, FN}, {FP, TN}}
    double errorRatePositive = confMat[0][1] / (confMat[0][0] + confMat[0][1]); // FN / (TP + FN)
    double errorRateNegative = confMat[1][0] / (confMat[1][0] + confMat[1][1]); // FP / (FP + TN)
    double totalErrorRate = (confMat[0][1] + confMat[1][0]) / (confMat[0][0] + confMat[0][1] + confMat[1][0] + confMat[1][1]);
    double totalAccuracy  = (confMat[0][0] + confMat[1][1]) / (confMat[0][0] + confMat[0][1] + confMat[1][0] + confMat[1][1]);
    double precision   = confMat[0][0] / (confMat[0][0] + confMat[1][0]); // TP / (TP + FP)
    double sensitivity = confMat[0][0] / (confMat[0][0] + confMat[0][1]); // TP / (TP + FN)
    double specificity = confMat[1][1] / (confMat[1][0] + confMat[1][1]); // TN / (FP + TN)

    System.out.println("### PERFORMANCE MEASURES ###");
    System.out.println("positive class error rate: " + (errorRatePositive * 100.0) + "%");
    System.out.println("negative class error rate: " + (errorRateNegative * 100.0) + "%");
    System.out.println("total error rate: " + (totalErrorRate * 100.0) + "%");
    System.out.println("total accuracy: " + (totalAccuracy * 100.0) + "%");
    System.out.println("precision: " + (precision * 100.0) + "%");
    System.out.println("sensitivity: " + (sensitivity * 100.0) + "%");
    System.out.println("specificity: " + (specificity * 100.0) + "%");
  }
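To see the confusion-matrix logic in action without the full NeuralOutputData class, the following standalone sketch replicates its counting scheme on a tiny hypothetical test set (the class name and data here are illustrative, not from the book's codebase). Targets and network outputs use the [1, 0] = positive, [0, 1] = negative coding described earlier:

```java
// Standalone sketch of the confusion-matrix counting performed by
// NeuralOutputData.calculateConfusionMatrix, applied to made-up data.
public class ConfusionMatrixDemo {
    static double[][] confusionMatrix(double[][] predicted, double[][] target) {
        int tp = 0, tn = 0, fp = 0, fn = 0;
        for (int m = 0; m < target.length; m++) {
            boolean actualPos = target[m][0] == 1.0;    // [1, 0] = positive
            boolean inferredPos = predicted[m][0] == 1.0;
            if (actualPos && inferredPos) tp++;
            else if (!actualPos && !inferredPos) tn++;
            else if (!actualPos && inferredPos) fp++;
            else fn++;
        }
        return new double[][] {{tp, fn}, {fp, tn}}; // same layout as the book's method
    }

    public static void main(String[] args) {
        double[][] target    = {{1, 0}, {1, 0}, {0, 1}, {0, 1}};
        double[][] predicted = {{1, 0}, {0, 1}, {0, 1}, {1, 0}};
        double[][] cm = confusionMatrix(predicted, target);
        System.out.printf("TP=%.0f FN=%.0f FP=%.0f TN=%.0f%n",
                cm[0][0], cm[0][1], cm[1][0], cm[1][1]);
    }
}
```

Here record 1 is a hit (TP), record 2 a missed positive (FN), record 3 a correct rejection (TN), and record 4 a false alarm (FP), so each of the four counters ends up at one.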