Let's now merge the theoretical content presented so far into simple examples of learning algorithms. In this chapter, we are going to explore a couple of learning algorithms for single-layer neural networks; multiple layers will be covered in the next chapter.
In the Java code, we will create a new superclass, LearningAlgorithm, in a new package, edu.packt.neural.learn. Another useful package, edu.packt.neural.data, will be created to handle the datasets processed by the neural network, namely the classes NeuralInputData and NeuralOutputData, both referenced by the NeuralDataSet class. To save space here, we recommend the reader take a glance at the code documentation to understand how these classes are organized.
The LearningAlgorithm class has the following attributes and methods:
```java
public abstract class LearningAlgorithm {
    protected NeuralNet neuralNet;
    public enum LearningMode {ONLINE, BATCH};
    protected enum LearningParadigm {SUPERVISED, UNSUPERVISED};
    //…
    protected int MaxEpochs = 100;
    protected int epoch = 0;
    protected double MinOverallError = 0.001;
    protected double LearningRate = 0.1;
    protected NeuralDataSet trainingDataSet;
    protected NeuralDataSet testingDataSet;
    protected NeuralDataSet validatingDataSet;
    public boolean printTraining = false;
    public abstract void train() throws NeuralException;
    public abstract void forward() throws NeuralException;
    public abstract void forward(int i) throws NeuralException;
    public abstract Double calcNewWeight(int layer, int input, int neuron)
            throws NeuralException;
    public abstract Double calcNewWeight(int layer, int input, int neuron,
            double error) throws NeuralException;
    //…
}
```
The neuralNet object is a reference to the neural network that will be trained by this learning algorithm. The enums define the learning mode (online or batch) and the learning paradigm (supervised or unsupervised). The learning execution parameters (MaxEpochs, MinOverallError, LearningRate) are defined, along with the datasets that will be taken into account during the learning process.
The train() method should be overridden by each learning algorithm implementation; the entire training process takes place in this method. The methods forward() and forward(int k) process the neural network with all input data and with the kth input data record, respectively. Finally, the calcNewWeight() method performs the update for the weight connecting an input to a neuron in a specific layer. A variation of calcNewWeight() allows a specific error value to be taken into account in the update operation.
This algorithm updates the weights according to the cost function. Following the gradient approach, we want to know which weight changes can drive the cost function to a lower value; we find that direction by computing the partial derivative of the cost function with respect to each of the weights. To aid understanding, let's consider a simple setting with only one neuron, one weight, one bias, and therefore one input. The output will be as follows:

Y = g(w·X + b)

Here, g is the activation function, X is the vector containing the x values, and Y is the output vector generated by the neural network. The general error for the kth sample is quite simple:

e(k) = t(k) − y(k)

where t(k) is the target output for the kth record.
It is also possible to define this error as the square error, an N-degree error, or the MSE; for simplicity, let's consider the simple error difference for the general error. The overall error, which will serve as the cost function, is then computed over all N records, for example with the default MSE measurement:

E = (1/N) Σk e(k)²
The weight and bias are updated according to the delta rule, which takes the partial derivatives with respect to the weight and the bias, respectively. For the batch training mode, X and E are vectors, so the updates are computed as a dot product over all N records:

Δw = α Σk e(k) g′(h(k)) x(k)
Δb = α Σk e(k) g′(h(k))

Here, h(k) = w·x(k) + b is the weighted sum for the kth record.
If the training mode is online, we don't need to perform the dot product; only the current record is used:

Δw = α e g′(h) x
Δb = α e g′(h)
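To make the online update concrete, here is a minimal, self-contained sketch of one delta-rule step; the sigmoid activation and all numeric values are illustrative assumptions, not code from the chapter's library:

```java
public class DeltaStep {
    // sigmoid activation g and its derivative (used here only for illustration)
    static double g(double h) { return 1.0 / (1.0 + Math.exp(-h)); }
    static double gPrime(double h) {
        double y = g(h);
        return y * (1.0 - y);
    }

    // one online delta-rule update: deltaW = alpha * e * g'(h) * x
    static double newWeight(double w, double alpha, double e, double h, double x) {
        return w + alpha * e * gPrime(h) * x;
    }

    public static void main(String[] args) {
        double w = 0.5, b = 0.1, x = 1.0, target = 1.0, alpha = 0.1;
        double h = w * x + b;   // weighted sum
        double y = g(h);        // neuron output
        double e = target - y;  // simple error difference
        System.out.println("updated weight = " + newWeight(w, alpha, e, h, x));
    }
}
```

Because the error is positive here, the weight moves slightly upward, reducing the error on the next forward pass.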
Note in the preceding equations the presence of the term α, which indicates the learning rate. It plays an important role in the weight update, because it can drive convergence toward the minimum cost faster or slower. Let's see a cost error surface in relation to two weights:
We will implement the delta rule in a class called DeltaRule, which will extend the LearningAlgorithm class:
```java
public class DeltaRule extends LearningAlgorithm {
    public ArrayList<ArrayList<Double>> error;
    public ArrayList<Double> generalError;
    public ArrayList<Double> overallError;
    public double overallGeneralError;
    public double degreeGeneralError = 2.0;
    public double degreeOverallError = 0.0;
    public enum ErrorMeasurement {SimpleError, SquareError, NDegreeError, MSE}
    public ErrorMeasurement generalErrorMeasurement = ErrorMeasurement.SquareError;
    public ErrorMeasurement overallErrorMeasurement = ErrorMeasurement.MSE;
    private int currentRecord = 0;
    private ArrayList<ArrayList<ArrayList<Double>>> newWeights;
    //…
}
```
The errors discussed in the error measurement section (general and overall errors) are implemented in the DeltaRule class, because the delta rule learning algorithm considers these errors during training. They are arrays because there is a general error for each dataset record and an overall error for each output. The overallGeneralError attribute holds the cost function result, namely the overall error over all outputs and records, and the error matrix stores the errors for each combination of output and record.
This class also allows multiple ways of calculating the overall and general errors. The attributes generalErrorMeasurement and overallErrorMeasurement can take any of the values for simple error, square error, Nth-degree error (cubic, quadruple, and so on), or MSE. As the code above shows, the defaults are square error for the general error and MSE for the overall error.
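As a small illustration of these measurements (the class and method names below are hypothetical, not the chapter's API), the main options can be computed as follows:

```java
import java.util.List;

public class ErrorMeasures {
    // simple error: plain difference between target and output
    static double simpleError(double target, double output) {
        return target - output;
    }

    // square error: the default general error measurement
    static double squareError(double target, double output) {
        double e = target - output;
        return e * e;
    }

    // MSE over N records: the default overall error measurement
    // (single output, for illustration)
    static double mse(List<Double> targets, List<Double> outputs) {
        double sum = 0.0;
        for (int k = 0; k < targets.size(); k++) {
            double e = targets.get(k) - outputs.get(k);
            sum += e * e;
        }
        return sum / targets.size();
    }
}
```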
Two important attributes are worth noting in this code: currentRecord refers to the index of the record currently being fed into the neural network during training, and the newWeights cubic matrix collects all the new weight values that will be applied to the neural network. The currentRecord attribute is used in online training, while the newWeights matrix lets the neural network keep all of its original weights until the calculation of every new weight has finished, preventing weights from being updated during the forward processing stage, which could significantly compromise the training quality.
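The compute-all-then-apply idea behind newWeights can be sketched in isolation; BufferedUpdate and its methods are hypothetical helpers, not part of the chapter's code:

```java
import java.util.ArrayList;

public class BufferedUpdate {
    // compute every new weight from the ORIGINAL weights first...
    static ArrayList<Double> calcAll(ArrayList<Double> weights, double delta) {
        ArrayList<Double> next = new ArrayList<>();
        for (double w : weights) next.add(w + delta);
        return next;
    }

    // ...then apply them in one step, so no forward pass ever sees
    // a mixture of old and new weights
    static void apply(ArrayList<Double> weights, ArrayList<Double> next) {
        for (int i = 0; i < weights.size(); i++) weights.set(i, next.get(i));
    }
}
```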
To save space, we will not detail the implementation of the forward methods here. As described in the previous section, forward means that the neural dataset records are fed into the neural network and the error values are then calculated:
```java
@Override
public void train() throws NeuralException {
    //…
    switch (learningMode) {
        case BATCH: // this is the batch training mode
            epoch = 0;
            forward(); // all data are presented to the neural network
            while (epoch < MaxEpochs
                    && overallGeneralError > MinOverallError) { // continue condition
                epoch++; // new epoch
                for (int j = 0; j < neuralNet.getNumberOfOutputs(); j++) {
                    for (int i = 0; i <= neuralNet.getNumberOfInputs(); i++) {
                        // here the new weights are calculated
                        newWeights.get(0).get(j).set(i, calcNewWeight(0, i, j));
                    }
                }
                // only after all weights are calculated, they are applied
                applyNewWeights();
                // the errors are updated with the new weights
                forward();
            }
            break;
        case ONLINE: // this is the online training
            epoch = 0;
            int k = 0;
            currentRecord = 0; // this attribute is used in weight update
            forward(k); // only the k-th record is presented
            while (epoch < MaxEpochs && overallGeneralError > MinOverallError) {
                for (int j = 0; j < neuralNet.getNumberOfOutputs(); j++) {
                    for (int i = 0; i <= neuralNet.getNumberOfInputs(); i++) {
                        newWeights.get(0).get(j).set(i, calcNewWeight(0, i, j));
                    }
                }
                // the new weights will be considered for the next record
                applyNewWeights();
                currentRecord = ++k;
                if (k >= trainingDataSet.numberOfRecords) {
                    k = 0; // if it was the last record, again the first
                    currentRecord = 0;
                    epoch++; // epoch completes after presenting all records
                }
                forward(k); // presenting the next record
            }
            break;
    }
}
```
Note that in the train() method, there is a loop whose condition determines whether training continues; training stops when this condition no longer holds true. The condition checks the epoch number and the overall error: when the epoch number reaches the maximum or the error reaches the minimum, training is finished. In some cases, however, the overall error never meets the minimum requirement, and it is the maximum number of epochs that forces the neural network to stop training.
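The continue condition just described can be isolated into a tiny predicate (a hypothetical helper, shown only to make the two stopping criteria explicit):

```java
public class StopCondition {
    // training continues while the epoch limit has not been reached
    // AND the overall error is still above the minimum
    static boolean shouldContinue(int epoch, int maxEpochs,
                                  double overallError, double minOverallError) {
        return epoch < maxEpochs && overallError > minOverallError;
    }
}
```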
The new weight is calculated using the calcNewWeight() method:
```java
@Override
public Double calcNewWeight(int layer, int input, int neuron)
        throws NeuralException {
    //…
    Double deltaWeight = LearningRate;
    Neuron currNeuron = neuralNet.getOutputLayer().getNeuron(neuron);
    switch (learningMode) {
        case BATCH: // batch mode
            ArrayList<Double> derivativeResult = currNeuron
                    .derivativeBatch(trainingDataSet.getArrayInputData());
            ArrayList<Double> _ithInput;
            if (input < currNeuron.getNumberOfInputs()) { // weights
                _ithInput = trainingDataSet.getIthInputArrayList(input);
            } else { // bias
                _ithInput = new ArrayList<>();
                for (int i = 0; i < trainingDataSet.numberOfRecords; i++) {
                    _ithInput.add(1.0);
                }
            }
            Double multDerivResultIthInput = 0.0; // dot product
            for (int i = 0; i < trainingDataSet.numberOfRecords; i++) {
                multDerivResultIthInput += error.get(i).get(neuron)
                        * derivativeResult.get(i) * _ithInput.get(i);
            }
            deltaWeight *= multDerivResultIthInput;
            break;
        case ONLINE:
            deltaWeight *= error.get(currentRecord).get(neuron);
            deltaWeight *= currNeuron.derivative(neuralNet.getInputs());
            if (input < currNeuron.getNumberOfInputs()) {
                deltaWeight *= neuralNet.getInput(input);
            }
            break;
    }
    return currNeuron.getWeight(input) + deltaWeight;
    //…
}
```
Note that the weight update includes a call to the derivative of the activation function of the given neuron; this is required by the delta rule. In the activation function interface, we've added the derivative() method, to be overridden in each of the implementing classes.
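A minimal sketch of such an interface and one implementation might look as follows; the names here are illustrative and may differ from the actual classes in the book's code:

```java
public class ActivationDemo {
    // hypothetical mirror of the activation function interface with derivative()
    interface ActivationFunction {
        double calc(double x);
        double derivative(double x);
    }

    // sigmoid: g(x) = 1/(1+e^-x); its derivative is g(x)*(1-g(x))
    static class Sigmoid implements ActivationFunction {
        public double calc(double x) { return 1.0 / (1.0 + Math.exp(-x)); }
        public double derivative(double x) {
            double y = calc(x);
            return y * (1.0 - y);
        }
    }

    public static void main(String[] args) {
        ActivationFunction g = new Sigmoid();
        System.out.println("g(0)=" + g.calc(0.0) + ", g'(0)=" + g.derivative(0.0));
    }
}
```

Expressing the derivative in terms of the already computed output is a common optimization, since the forward pass has usually evaluated g(x) already.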
As we've seen in the train() method, new weights are stored in the newWeights attribute so that they do not influence the current learning iteration, and they are only applied after the training iteration has finished.
In the 1940s, the neuropsychologist Donald Hebb postulated that the connections between neurons that activate or fire simultaneously, or in his words, repeatedly or persistently, should be strengthened. This is one approach to unsupervised learning, since no target output is specified for Hebbian learning.
In summary, the weight update rule for Hebbian learning takes into account only the inputs and outputs of the neuron. Given a neuron j whose connection to neuron i (weight wij) is to be updated, the update is given by the following equation:

Δwij = α oj oi

Here, α is the learning rate, oj is the output of neuron j, and oi is the output of neuron i, which is also the input i of neuron j. In the batch training case, oi and oj are vectors, and we need to perform a dot product.
Since Hebbian learning includes no error measurement, a stop condition can be determined by the maximum number of epochs or by the increase in the overall average of the neural outputs. Given N records, we compute the mean of all outputs produced by the neural network; when this mean increases beyond a certain level, it is time to stop training, to prevent the neural outputs from blowing up.
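This stop criterion can be sketched as follows; OutputMeanCheck and its threshold parameter are assumptions for illustration, not the chapter's API:

```java
import java.util.List;

public class OutputMeanCheck {
    // mean of the outputs produced over all records for one output neuron
    static double mean(List<Double> outputs) {
        double sum = 0.0;
        for (double o : outputs) sum += o;
        return sum / outputs.size();
    }

    // stop when the mean output grows beyond a tolerated increase,
    // to keep Hebbian weights from blowing up
    static boolean shouldStop(double lastMean, double currentMean,
                              double maxIncrease) {
        return (currentMean - lastMean) > maxIncrease;
    }
}
```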
We'll develop a new class for Hebbian learning, also inheriting from LearningAlgorithm:
```java
public class Hebbian extends LearningAlgorithm {
    //…
    private ArrayList<ArrayList<ArrayList<Double>>> newWeights;
    private ArrayList<Double> currentOutputMean;
    private ArrayList<Double> lastOutputMean;
}
```
Except for the absent error measures and the new output-mean attributes, all parameters are identical to those of the DeltaRule class. The methods are also quite similar, except for calcNewWeight():
```java
@Override
public Double calcNewWeight(int layer, int input, int neuron)
        throws NeuralException {
    //…
    Double deltaWeight = LearningRate;
    Neuron currNeuron = neuralNet.getOutputLayer().getNeuron(neuron);
    switch (learningMode) {
        case BATCH:
            //…
            // the batch case is analogous to the implementation in DeltaRule,
            // but with the neuron's output instead of the error;
            // we're suppressing it here to save space
            break;
        case ONLINE:
            deltaWeight *= currNeuron.getOutput();
            if (input < currNeuron.getNumberOfInputs()) {
                deltaWeight *= neuralNet.getInput(input);
            }
            break;
    }
    return currNeuron.getWeight(input) + deltaWeight;
}
```
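A self-contained numeric illustration of the online Hebbian rule (all names and values here are hypothetical) shows how repeated co-activation keeps strengthening a weight:

```java
public class HebbianStep {
    // online Hebbian update: w <- w + alpha * input * output
    static double newWeight(double w, double alpha, double input, double output) {
        return w + alpha * input * output;
    }

    public static void main(String[] args) {
        double w = 0.1, alpha = 0.1, x = 1.0;
        // each iteration the neuron fires on the same input,
        // so the connection keeps getting stronger
        for (int i = 0; i < 5; i++) {
            double o = w * x; // linear neuron, for illustration
            w = newWeight(w, alpha, x, o);
        }
        System.out.println("weight after 5 updates = " + w);
    }
}
```

Because nothing bounds this growth, the output-mean stop condition described above is needed in practice.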
Adaline is an architecture whose name stands for Adaptive Linear Neuron; it was developed by Bernard Widrow and Ted Hoff, based on the McCulloch-Pitts neuron. It has only one layer of neurons and can be trained similarly to the delta rule. The main difference is that the update is driven by the error between the target output and the weighted sum of inputs and bias, rather than by the neuron output after the activation function. This can be desirable when one wants continuous learning on classification problems, whose outputs tend to be discrete rather than continuous.
The following figure illustrates how Adaline learns:
So the weights are updated by the following equation:

Δwi = α e s xi

where s is the weighted sum of inputs and bias (the output before activation) and e is the error for the current record.
In order to implement Adaline, we create a class called Adaline with the overridden calcNewWeight() shown below. To save space, we present only the online case:
```java
@Override
public Double calcNewWeight(int layer, int input, int neuron)
        throws NeuralException {
    //…
    Double deltaWeight = LearningRate;
    Neuron currNeuron = neuralNet.getOutputLayer().getNeuron(neuron);
    switch (learningMode) {
        case BATCH:
            //…
            break;
        case ONLINE:
            deltaWeight *= error.get(currentRecord).get(neuron)
                    * currNeuron.getOutputBeforeActivation();
            if (input < currNeuron.getNumberOfInputs()) {
                deltaWeight *= neuralNet.getInput(input);
            }
            break;
    }
    return currNeuron.getWeight(input) + deltaWeight;
}
```
Note the call to the getOutputBeforeActivation() method; we mentioned in the last chapter that this property would prove useful.
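For comparison, here is a self-contained sketch of the classic Widrow-Hoff (LMS) form of the Adaline update, a simplification of the chapter's implementation; the class name and all values are illustrative assumptions:

```java
public class AdalineStep {
    // classic Widrow-Hoff (LMS) update on the linear output:
    // e = target - (w*x + b);  w <- w + alpha*e*x;  b <- b + alpha*e
    static double[] update(double w, double b, double alpha,
                           double x, double target) {
        double linear = w * x + b;  // weighted sum before activation
        double e = target - linear; // error on the linear output
        return new double[] { w + alpha * e * x, b + alpha * e };
    }

    public static void main(String[] args) {
        double w = 0.0, b = 0.0, alpha = 0.2;
        // repeated updates drive the linear output toward the target
        for (int i = 0; i < 50; i++) {
            double[] wb = update(w, b, alpha, 1.0, 1.0);
            w = wb[0];
            b = wb[1];
        }
        System.out.println("w=" + w + " b=" + b + " linear output=" + (w + b));
    }
}
```

Training on the error of the linear output, before any activation is applied, is exactly what distinguishes Adaline from the perceptron-style delta rule above.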