The multilayer perceptron (MLP)

The perceptron is a basic processing element that performs binary classification by mapping a scalar or a vector to a binary value {true, false} or {-1, +1}. The original perceptron algorithm was defined as a single layer of neurons for which each value xi of the feature vector is processed in parallel and generates a single output y. The perceptron was later extended to encompass the concept of an activation function.

A single-layer perceptron is limited to processing a single linear combination of weights and input values. Researchers found that adding intermediate layers between the input and output layers enables the network to solve more complex classification problems. These intermediate layers are known as hidden layers because they interface only with other perceptrons; hidden nodes can be accessed only through the input layer.

From now on, we will use a three-layered perceptron to investigate and illustrate the properties of neural networks, as shown here:

A three-layered perceptron

The three-layered perceptron requires two sets of weights: wij to process the output of the input layer into the hidden layer, and vij between the hidden layer and the output layer. The intercept value w0, the counterpart of the intercept in linear and logistic regression, is represented by the +1 node in the visualization of the neural network (w0.1 + w1.x1 + w2.x2 + …).

Tip

FFNN with no hidden layer

An FFNN without a hidden layer is similar to a linear statistical model: the only transformation between the input and output layer is a linear regression. A linear regression is a more efficient alternative to an FFNN without a hidden layer.

The description of the MLP components and their implementations rely on the following stages:

  1. Overview of the software design.
  2. Description of the MLP model components.
  3. Implementation of the five-step training cycle.
  4. Definition and implementation of the training strategy and the resulting classifier.

Tip

Terminology

Artificial neural networks encompass a large variety of learning algorithms, the multilayer perceptron being one of them. Perceptrons are indeed components of a neural network organized as input, output, and hidden layers. This chapter is dedicated to the multilayer perceptron with hidden layers. The terms "neural network" and "multilayer perceptron" are used interchangeably.

The activation function

The perceptron is represented as a linear combination of weights, wi, and input values, xi, processed by the output unit activation function h, as shown here:

y = h(w0.1 + w1.x1 + w2.x2 + … + wn.xn) = h(Σ wi.xi)

The output activation function h has to be continuous and differentiable for a range of values of the weights. It takes different forms depending on the problem to be solved, as mentioned here:

  • Identity for the output layer of the binary classification or regression problem
  • Sigmoid, σ(x) = 1/(1 + exp(-x)), for hidden layers
  • Softmax for the multinomial classification
  • Hyperbolic tangent, tanh, for classification using zero mean

The Softmax formula is described in the next section.
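
As a minimal illustration, these activation functions can be written as plain Scala functions of the type Double => Double, the type expected later by the MLPConfig configuration (the Activations object name is arbitrary):

object Activations {
   val identity: Double => Double = x => x                       // output layer of regression problems
   val sigmoid: Double => Double = x => 1.0/(1.0 + Math.exp(-x)) // hidden layers
   val tanh: Double => Double = x => Math.tanh(x)                // classification using zero mean
}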

The network architecture

The output layer and hidden layers have a computational capability (dot product of weights, inputs, and activation functions). The input layer does not transform data. An n-layer neural network is a network with n computational layers. Its architecture consists of the following components:

  • 1 input layer
  • (n-1) hidden layers
  • 1 output layer

A fully connected neural network has all its input nodes connected to hidden layer neurons. Networks are characterized as partially connected neural networks if one or more of their input variables are not processed. This chapter deals with a fully connected neural network.

Tip

Partially connected networks

Partially connected networks are not as complex as they seem. They can be generated from fully connected networks by setting some of the weights to zero.

The structure of the output layer is highly dependent on the type of problems (regression or classification) you need to solve, also known as the objective of the neural network. The type of problem at hand defines the number of output nodes [9:5], for example:

  • A univariate regression has one output node whose value is a real number in the range [0, 1]
  • A multivariate regression with n variables has n real output nodes
  • A binary classification has one binary output node {0, 1} or {-1, +1}
  • A multinomial or K-class classification has K binary output nodes

Software design

The implementation of the MLP classifier follows the same pattern as previous classifiers (refer to the Design template for classifiers section in Appendix A, Basic Concepts):

  • A model MLPModel of the type Model is initialized through training during the instantiation of the classifier. The model is composed of layers of neurons of the type MLPLayer, connected by synapses of the type MLPSynapse contained in a connector of the type MLPConnection.
  • All of the configuration parameters are encapsulated into a single configuration class, MLPConfig.
  • The predictive or classification routine is implemented as a data transformation, extending the PipeOperator trait.
  • The multilayer perceptron class, MLP, takes three parameters: configuration instance, a features set or time series of the XTSeries class, and a labeled dataset of the type DblMatrix.

The software components of the multilayer perceptron are described in the following UML class diagram:

A UML class diagram for the multilayer perceptron

The class diagram is a convenient navigation map to understand the role and relation of the Scala classes used to build an MLP. Let's start with the implementation of the MLP model and its components.

Model definition

The purpose of the model is to completely define the network architecture. It is implemented by the MLPModel parameterized class, which is responsible for creating and managing the different components of the network, layers, and connections as well as the topology.

Let's establish a simple naming convention for the layers of neurons as follows:

  • The input layer, inLayer, consists of nInputs neurons
  • A hidden layer, hidLayer, has nHiddens neurons
  • The output layer, outLayer, has nOutputs neurons

The instantiation of the class requires a minimum set of three parameters:

class MLPModel[T <% Double](config: MLPConfig, nInputs: Int, nOutputs: Int) extends Model {
   val layers: Array[MLPLayer]
   val connections: Array[MLPConnection]
   val topology: Array[Int]
}

Besides the config configuration, the model class has two parameters: the number of input features, {x}, nInputs; and the number of output values, {y}, nOutputs. These three parameters are all you need to initialize the topology of the network. A model has the following attributes:

  • Multiple layers of the type MLPLayer
  • Multiple connections of the type MLPConnection
  • A topology array that wires these layers and connections

The topology is defined as an array of number of nodes per layer, starting with the input nodes. The array indices follow the forward path within the network. The size of the input layer is automatically generated from the observations as the size of the features vector. The size of the output layer is automatically extracted from the size of the output vector:

val topology = Array[Int](nInputs) ++ config.hidLayers ++ Array[Int](nOutputs)

The sequence of hidden layers, hidLayers, is defined as an array of number of neurons (or nodes) per hidden layers:

val hidLayers: Array[Int]

This is an attribute of the MLPConfig class described in the next section. For instance, the topology of a neural network with three input variables, one output variable, and two hidden layers of three neurons each is specified as Array[Int](3, 3, 3, 1).
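
As a quick check of this convention, the topology for the example can be assembled explicitly as follows (the value names are only for illustration):

val nInputs = 3                   // three input variables
val nOutputs = 1                  // one output variable
val hidLayers = Array[Int](3, 3)  // two hidden layers of three neurons each
val topology = Array[Int](nInputs) ++ hidLayers ++ Array[Int](nOutputs)
// topology: Array(3, 3, 3, 1)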

The following diagram visualizes the interaction between the different components of a model: MLPLayer, MLPConnection, and MLPSynapse:

Components of the MLP model

Layers

First, let's start with the definition of the layer class, MLPLayer, which is completely specified by its position in the network and the number of nodes it contains:

class MLPLayer(val id: Int, val len: Int) {
   val output = new DblVector(len) //1
   val delta = new DblVector(len) //2
   output.update(0, 1.0) //3
   …
}

The id parameter is the order of the layer (0 for input, 1 for the first hidden layer,…, n-1 for the output layer) in the network. The len value is the number of elements or nodes, including the bias element, in this layer. The output vector for the layer (line 1) is an uninitialized vector of values updated during the forward propagation, except for the first value (bias element), which is set to 1 (line 3). The delta vector associated to the output vector (line 2) is updated through the error backpropagation algorithm, described in the next section.

The output values, except the bias element, are initialized using the set method:

def set(x: DblVector): Unit = x.copyToArray(output,1)

Synapses

A synapse is defined as a pair of real values:

  • The weight of the connection from the neuron i of the previous layer to the neuron j, wij
  • The weight adjustment (or gradient of the weight), Δwij

Its type is defined as MLPSynapse, as shown here:

type MLPSynapse = (Double, Double)

Connections

A connection between two consecutive layers implements the matrix of synapses, that is, the (wij, Δwij) pairs. The MLPConnection instance is created with the following parameters:

  • Configuration parameters, config
  • The source layer, sometimes known as the ingress layer, src
  • The destination (or egress) layer, dst

The MLPConnection class is defined as follows:

class MLPConnection(config: MLPConfig, src: MLPLayer, dst: MLPLayer)

The last step in the initialization of the MLP algorithm is the selection of the initial (usually random) values of the weights (synapse). The following code snippet initializes the weights for non-bias neurons as random values in the range [0, beta] with beta <= 1.0.

The weight for the bias is obviously defined as w0 = +1, and its weight adjustment is initialized as Δw0 = 0, as shown here:

import scala.util.Random

val beta = 0.1
val synapses = Array.tabulate(dst.len)(n => 
   if(n > 0) Array.fill(src.len)((beta*Random.nextDouble, 0.0))
   else Array.fill(src.len)((1.0, 0.0))
)

Tip

Random initialization of weights

The range [0, beta] of initial random values is domain specific. Some problems require a very small range, less than 1e-3, while others use the probability space [0, 1]. The initial values impact the number of epochs required to converge toward an optimal set of weights. [9:6]

Once the topology, synapses, layers, and connections of the MLP algorithm are defined, the initialization of the MLPModel model is straightforward:

val layers = topology.zipWithIndex
                     .map(t => new MLPLayer(t._2, t._1+1))
val connections = Range(0, layers.size-1).map(n =>
    new MLPConnection(config, layers(n), layers(n+1))).toArray

The layers are created by traversing the network topology and instantiating each layer with its proper index and number of elements. The connections are instantiated by selecting two consecutive layers, of index n as the source and index n+1 as the destination.
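
For instance, the topology Array[Int](3, 3, 3, 1) produces four MLPLayer instances of 4, 4, 4, and 2 nodes, respectively (the t._1+1 expression allocates one extra node per layer for the bias), and three MLPConnection instances linking consecutive layers.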

Tip

Encapsulation and the model factory

The model components, that is, connections, layers, and synapses, are implemented as top-level classes for clarity's sake. However, there is no need for the model to expose its inner workings to the client code. These components should be declared as inner classes of the model.

Moreover, the model is responsible for creating its topology. A factory design pattern would be perfectly appropriate to instantiate an MLPModel instance dynamically [9:7].

Once initialized, the MLP model is ready to be trained using a combination of forward propagation, output error back propagation, and iterative adjustment of weights and gradients of weights.

Training cycle/epoch

The training of the model processes the training observations multiple times. A training cycle or iteration is known as an epoch. The five steps of the training cycle are as follows:

  1. Forward propagation of the input value for a specific epoch.
  2. Computation of the sum of squared errors.
  3. Backpropagation of the output error.
  4. Recomputation of the synapse weights and gradients of weights.
  5. Evaluation of the convergence criterion; exit if the criterion is met.

The computation of the network weights during training could use the difference between labeled data and actual output for each layer. However, this solution is not feasible, because the output of the hidden layers is unknown. The solution is to propagate the error on the output values backward through the hidden layers. This approach is not that different from the beta (or backward) pass in the hidden Markov model, covered in the Beta class (the backward variable) section in Chapter 7, Sequential Data Models.

The error at the output layer for p neurons can be computed in either of the following ways:

  • Sum of squared errors (SSE): Calculated for each output, yk
  • Mean squared error (MSE): Calculated as MSE = SSE/p

We select the sum of the squared errors to initialize the error back-propagation algorithm.
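
As a reference, here is a minimal, self-contained sketch of the two error measures over p output values (Array[Double] stands in for the DblVector type, and the 0.5 factor anticipates the sse implementation shown later, as it simplifies the derivative):

def sseOf(labels: Array[Double], output: Array[Double]): Double = 
   0.5*labels.zip(output).map{ case (t, y) => (t - y)*(t - y) }.sum

def mseOf(labels: Array[Double], output: Array[Double]): Double = 
   sseOf(labels, output)/labels.size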

Step 1 – input forward propagation

As mentioned earlier, the output values of a hidden layer are computed as a logistic function (the activation function) of the dot product of the weights wij and the input values xi.

In the following diagram, the MLP algorithm computes the linear product of the weights wij and input xi for the hidden layer. The product is then processed by the activation function σ (sigmoid or hyperbolic tangent). The output values zj are then combined with the weights vij of the output layer. The output layer doesn't have an activation function.

Forward propagation of the input through the hidden and output layers

The mathematical formulation of the output of a neuron j is defined as a composition of the activation function and the dot product of the weights wij and input values xi.

Note

Computation of the output y for the output layer:

yk = Σj vkj.zj with zj = σ(Σi wij.xi)

Estimation of the output values for binary classification with an activation function σ:

ŷ = σ(Σj vj.zj)

As seen in the network architecture section, the output values for the multinomial (or multiclass) classification with more than two classes are normalized using an exponential function (softmax).

The computational model

The computation of the output values y from the input x is known as the input forward propagation. For the sake of simplicity, we represent the forward propagation between layers with the following block diagram. Such a representation will be quite convenient for the design and implementation of the MLP.

A computation model of input forward propagation

This diagram illustrates a computational model for the input forward propagation, as the programmatic relation between the source and destination layers and their connectivity. The input x is propagated forward through each connection.

The connectionForwardPropagation method computes the dot product of the weights and the input values, and applies the activation function in the case of hidden layers, for each connection. Therefore, it is a member of the MLPConnection class.

The forward propagation of input values across the entire network is managed by the MLP algorithm itself.

The forward propagation of the input values is used in the classification or prediction y = f(x). It depends on the weights wij and vij, which need to be estimated through training. As you may have guessed, the weights define the model of a neural network, similar to the regression models. Let's look at the connectionForwardPropagation method of the MLPConnection class:

def connectionForwardPropagation: Unit = {
  val synps= synapses.drop(1)
  val _output = synps.map(x => { //1
      val sum = x.zip(src.output)
                  .foldLeft(0.0)((s, xy) => s + xy._1._1*xy._2)
      if(!isOutLayer) config.activation(sum) //2
      else sum
  })
  val out = if(isOutLayer) mlpObjective(_output) else _output //3
  out.copyToArray(dst.output, 1)     
}

The first step is to compute the dot product, sum, of the output of the current source layer, src, for this connection, and the weights, w (line 1). The activation method, the implementation of which is described in the next paragraph, is applied to the dot product (line 2). If the destination layer of the connection is the output layer, then the output values are processed according to the mlpObjective objective of the algorithm (line 3).

Objective

In The network architecture section, you learned that the structure of the output layer depends on the type of problem that needs to be resolved, or the objective of the algorithm. Let's encapsulate the different objectives (binary classifier, multiclass classifier, and regression) into an MLPObjective hierarchy (nested in the MLP companion object), with the transformation of the output values, y, performed by a simple apply method:

trait MLPObjective { def apply(y: DblVector): DblVector }

The output of the apply method is used to compute the sum of squared errors during training, after the forward propagation of features. The binary (2 class) classifier requires a single output without any transformation because the values are either 0 or 1.

class MLPBinClassifier extends MLPObjective {
  override def apply(y: DblVector): DblVector = y
}

The MLPMultiClassifier multiclass classifier objective class uses the softmax method to boost the output with the highest value, as shown here:

class MLPMultiClassifier extends MLPObjective {
   override def apply(y:DblVector):DblVector = softmax(y.drop(1))
   def softmax(y: DblVector): DblVector = { …}
}

The softmax method is applied to the actual output value, not the bias. Therefore, the first node y(0)=+1 has to be dropped before applying the softmax normalization.

Softmax

In the case of a classification problem with K classes (K > 2), the output has to be converted into a probability in the range [0, 1]. For problems that require a large number of classes, there is a need to boost the output yk with the highest value (or probability). This process is known as exponential normalization or softmax [9:8].

Note

Softmax formula for multinomial (K > 2) classification is as follows:

pk = exp(yk) / Σj exp(yj)

Here is the simple implementation of the softmax method of the MLPMultiClassifier class:

def softmax(y: DblVector): DblVector = {
   val softmaxValues = new DblVector(y.size + 1) // slot 0 is reserved for the bias
   val expY = y.map( Math.exp(_)) //1
   val expYSum = expY.sum
   expY.map( _ /expYSum).copyToArray(softmaxValues, 1) //2
   softmaxValues
}

First, the output values are transformed to exponential, expY (line 1). The exponentially transformed outputs are then normalized by their sum, expYSum, to generate the array of softmaxValues output (line 2). Once again, you do not have to update the bias element y(0).
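
As a quick, illustrative sanity check of the exponential normalization (the numbers are arbitrary):

val y = Array[Double](1.0, 2.0, 0.5)
val expY = y.map(Math.exp(_))     // 2.718, 7.389, 1.649
val p = expY.map(_/expY.sum)      // 0.231, 0.629, 0.140 (sums to 1.0)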

The second step in the training cycle is the computation of the sum of squared errors on the output.

Step 2 – sum of squared errors

Once the input features are propagated across the neural network, the sum of squared errors, sse, for the output layer of the MLPLayer type is computed at each epoch, as follows:

def sse(labels: DblVector): Double = {
   var _sse = 0.0
   output.drop(1)  //1
        .zipWithIndex
        .foreach(on => {
     val err = labels(on._2) - on._1  //2
     delta.update(on._2+1, on._1* (1.0- on._1)*err) //3
     _sse += err*err
  })
  _sse*0.5  //4
}

As expected, the computation of the sum of squared errors requires the labeled values, labels, as the argument. The output vector, output, stripped of its bias node (line 1), is used to compute the difference, err, between the label and the actual output (line 2). The delta value (line 3), described in the next section, is used in the back-propagation algorithm to adjust the weights of the output and hidden layers. Note that the sum of squares, _sse, is divided by 2 (line 4), so its derivative is err.
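
For instance, with hypothetical output values of 0.8 and 0.3 (after the bias node) and labels of 1.0 and 0.0, the errors are 0.2 and -0.3, the deltas are 0.8*(1 - 0.8)*0.2 = 0.032 and 0.3*(1 - 0.3)*(-0.3) = -0.063, and sse returns 0.5*(0.04 + 0.09) = 0.065.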

Step 3 – error backpropagation

The error backpropagation is an algorithm that estimates the error for the hidden layer in order to compute the change in weights of the network. It takes the sum of squared errors on the output as input.

Note

Terminology

Some authors refer to the backpropagation as a training methodology for an MLP, which applies the gradient descent to the output error defined as either the sum of squared errors, or the mean squared error. In this chapter, we keep the narrower definition of backpropagation as the backward computation of the sum of squared errors.

Error propagation

The objective of the training of a perceptron is to minimize the sum of squared errors at the output layer. The error εk for each output neuron, yk, is computed as the difference between the predicted output value and the label value. This approach does not work for the hidden layers zj because the label value is unknown.


The partial derivative of the sum of squared output error over each weight of the output layer is computed as the composition of the derivative of the square function, and the derivative of the dot product of weights and the input z.

Note

Derivative of the output SSE over the weights of the output layer:

∂ε/∂vkj = -δk.zj with δk = yk(1 - yk).(tk - yk), where tk is the label value

As mentioned earlier, the computation of the partial derivative of the sum of squared error over the weights of the hidden layer is a bit tricky. Fortunately, the partial derivative can be broken down into the following three pieces using the output layer values and the output of the hidden layer:

  • Derivative of the sum of squared errors ε over the output value yk
  • Derivative of the output value yk over the hidden value zj, knowing that the derivative of a sigmoid σ is σ(1 - σ)
  • Derivative of the output of the hidden layer zj over the weights wij

Note

Derivative of error over the weights of the hidden layer:

∂ε/∂wij = -(Σk δk.vkj).zj(1 - zj).xi

The computational model

The computational model for the error backpropagation algorithm is very similar to the forward propagation of the input. The main difference is that the propagation of the error derivative δ is performed from the output layer to the input layer. The following diagram illustrates the computational model of the backpropagation in the case of two hidden layers, zs and zt:

A computational model of error backpropagation across two hidden layers

The connectionBackpropagation method propagates the error back to the previous layer. It is a member of the MLPConnection class. The backpropagation of the output error across the entire network is managed by the MLP class.

It implements the two sets of equations, where synapses(j)(i)._1 are the weights wji, dst.delta is the vector of error derivatives in the destination (or next) layer, and src.delta is the error derivative on the outputs of the source (or antecedent) layer, as shown here:

def connectionBackpropagation: Unit =  
  Range(1, src.len).foreach(i => {
    val dot = Range(1, dst.len).foldLeft(0.0)((s, j) => 
                         s + synapses(j)(i)._1*dst.delta(j)) //1
    src.delta(i) = src.output(i)*(1.0 - src.output(i))*dot//2
})

The dot product of the synapse weights and the errors of the destination layers (line 1) is used to compute the delta on the source (or previous layer) layers (line 2).

Step 4 – synapse/weights adjustment

The connection weights Δv and Δw are adjusted by computing the sum of the derivatives of the error over the weights, scaled with a learning factor. The gradients of the weights are then used to compute the error on the output of the source layer [9:9].

Momentum factor for gradient descent

The simplest algorithm to update the weights is the gradient descent [9:10].

The gradient descent is a very simple and robust algorithm. However, it is slower in converging toward a global minimum than the conjugate gradient or the quasi-Newton method (refer to the Summary of optimization techniques section in Appendix A, Basic Concepts).

There are several methods available to speed up the convergence of the gradient descent toward a minimum: momentum factor and adaptive learning coefficient [9:11].

Large variations of the weights (or large values of the gradient of weights) cause the gradient descent to require more training iterations in order to converge. This is particularly true for a training strategy known as online training. The training strategies are discussed in the next section. The momentum factor α is used for the remaining sections of the chapter.

Note

The computation of the neural network weights using gradient descent is as follows:

wij(t+1) = wij(t) - η.∂ε/∂wij(t)

The computation of the neural network weights using the gradient descent method with momentum coefficient α is as follows:

Δwij(t) = -η.∂ε/∂wij(t) + α.Δwij(t-1),  wij(t+1) = wij(t) + Δwij(t)

The basic gradient descent algorithm is selected by setting the momentum factor α to zero.

Implementation

The fourth step of the training phase is to adjust each connection's synapses (w, Δw). This task is performed by the connectionUpdate method of the MLPConnection class:

def connectionUpdate: Unit =  
  Range(1, dst.len).foreach(i => {  
    val delta = dst.delta(i) //1

    Range(0, src.len).foreach(j => { 
       val _output = src.output(j) //2
       val oldSynapse = synapses(i)(j)
       val grad = config.eta*delta*_output //3
       val deltaWeight = grad + config.alpha* oldSynapse._2 //4
       synapses(i)(j) = (oldSynapse._1 + deltaWeight, grad) //5
    })
  })

The connectionUpdate method retrieves the error, delta, of each destination neuron (line 1). The _output output of each source neuron (line 2) is used in the computation of the grad gradient (line 3). The weight adjustment is then augmented with a momentum term (line 4), as per the mathematical formulation. Finally, the synapses for the source and destination layers are updated (line 5).
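
As a hypothetical numeric illustration, with eta = 0.01, alpha = 0.9, a destination delta of 0.5, a source output of 0.2, and a previously stored gradient of 0.05, the new gradient is grad = 0.01*0.5*0.2 = 0.001, the weight change is deltaWeight = 0.001 + 0.9*0.05 = 0.046, and the synapse becomes (oldSynapse._1 + 0.046, 0.001).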

Tip

The adjustable learning rate

The computation of the new weights of a connection for each new epoch can be further improved by making the learning rate adjustable.

Step 5 – convergence criteria

The convergence criterion consists of evaluating the sum of squared errors against a predetermined threshold eps. It is common to normalize the sum of squared errors by the number of observations.
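
A minimal sketch of this exit test, assuming sse is the sum of squared errors cumulated over nObs observations (the names are only illustrative), could look as follows:

def hasConverged(sse: Double, nObs: Int, eps: Double): Boolean = 
   sse/nObs < eps   // normalized error below the predefined threshold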

Configuration

The MLPConfig configuration of the multilayer perceptron consists of the definition of the network configuration with hidden layers, the learning parameters, the training parameters, and the activation function:

class MLPConfig(
   val alpha: Double, 
   val eta: Double, 
   val hidLayers: Array[Int], 
   val numEpochs: Int, 
   val eps: Double, 
   val activation: Double => Double) extends Config

For the sake of readability, the name of the configuration parameters matches the symbols defined in the mathematical formulation:

  • alpha: This is the momentum factor.
  • eta: This is the learning rate (fixed or adaptive).
  • hidLayers: This is an array of size of hidden layers (for example, two hidden layers of two and four elements are specified as Array[Int](2,4)).
  • numEpochs: This is the maximum number of epochs allowed for training the neural network.
  • eps: This is the convergence criteria used as an exit condition for the training of the neural network, error < eps.
  • activation: This is the activation function used for nonlinear regression applied to hidden layers. The default function is the sigmoid.
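
For illustration, here is a hypothetical instantiation of the configuration; the numeric values are arbitrary and would have to be tuned for a real problem:

val sigmoid = (x: Double) => 1.0/(1.0 + Math.exp(-x))
val config = new MLPConfig(
   alpha = 0.8,                  // momentum factor
   eta = 0.01,                   // learning rate
   hidLayers = Array[Int](5, 4), // two hidden layers of 5 and 4 neurons
   numEpochs = 500,              // maximum number of epochs
   eps = 1e-4,                   // convergence criterion
   activation = sigmoid)         // activation applied to the hidden layers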

Putting all together

The five steps of the training cycle have been implemented for each connection or matrix of synapses (weights, gradient of weights). The management of the cycle is performed by the algorithm defined by the MLP class, as shown here:

class MLP[T <% Double](config: MLPConfig, xt: XTSeries[Array[T]], labels: DblMatrix)(implicit val mlpObjective: MLP.MLPObjective) 
extends PipeOperator[Array[T], DblVector] {
   val model: Option[MLPModel]
   def |> : PartialFunction[Array[T], DblVector]
}

The MLP algorithm takes the following parameters:

  • config: The configuration of the algorithm
  • xt: The time series of features used to train the model
  • labels: The labeled output values for training purpose
  • mlpObjective: The implicit objective of the algorithm (a type of problem)
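
As a hypothetical usage sketch (the variable names are illustrative and assume that config, xt, and labels have been created as described earlier, with the objectives nested in the MLP companion object), the instantiation of a binary classifier triggers the training:

implicit val mlpObjective: MLP.MLPObjective = new MLP.MLPBinClassifier
val classifier = new MLP[Double](config, xt, labels)  // the model is trained during instantiation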

The five steps of the training cycle or epoch are summarized in the following diagram:

The five steps of the MLP training cycle

Let's apply the steps of a training epoch in the trainEpoch method of the MLPModel class, using the foreach Scala iterator, as shown here:

def trainEpoch(x: DblVector, y: DblVector): Double = {
   inLayer.set(x)

   connections.foreach( _.connectionForwardPropagation) //1
   val _sse = sse(y) //2
   val bckIterator = connections.reverseIterator 
   bckIterator.foreach( _.connectionBackpropagation) //3
   connections.foreach( _.connectionUpdate) //4
  _sse
}

You can certainly recognize the first four stages of the training cycle: forward propagation of the input, x (line 1), computation of the sum of squared errors, _sse (line 2), the back propagation of the error (line 3), and the recomputation of the weight and gradient of weight associated with each synapse (line 4).

Training strategies and classification

Once the training cycle or epoch is defined, it is merely a matter of defining and implementing a strategy to create a model using a sequence of data or time series.

Online versus batch training

One important remaining issue is finding a strategy to conduct the training of time series, as ordered sequences of data. There are two strategies to create an MLP model for time series:

  • Batch training: The entire time series is processed at once as a single input to the neural network. The weights (synapses) are updated at each epoch using the sum of squared errors on the output of the time series. The training exits once the sum of the squared errors meets the convergence criteria.
  • Online training: The observations are fed to the neural network one at a time. Once the entire time series has been processed, the total sum of squared errors (sse) over all the observations is computed. If the exit condition is not met, the observations are reprocessed by the network.
    An illustration of online and batch training

Online training is faster than batch training because the convergence criterion has to be met for each data point, possibly resulting in a smaller number of epochs [9:12]. Techniques such as the momentum factor, described earlier, or any adaptive learning scheme improve the performance of the online training process further.

The online training strategy is applied to a financial time series for the remainder of this chapter.

Regularization

There are two approaches to find the most appropriate network architecture for a given classification or regression problem; they are:

  • Destructive tuning: Starting with a large network, then removing nodes, synapses, and hidden layers that have no impact on the sum of squared errors
  • Constructive tuning: Starting with a small network, then incrementally adding the nodes, synapses, and hidden layers that reduce the output error

The destructive tuning strategy removes the synapses by zeroing out their weights. This is commonly accomplished by using regularization.

You have seen that regularization is a powerful technique to address overfitting in the case of linear and logistic regression in The ridge regression section in Chapter 6, Regression and Regularization. Neural networks can benefit from adding a regularization term to the sum of squared errors. The larger the regularization factor is, the more likely some weights will be reduced to zero, thus reducing the scale of the network [9:13].
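
As an illustration only (this penalty is not part of the implementation in this chapter), an L2 or weight decay regularization adds a term proportional to the squared weights to the error; a minimal sketch, with lambda as the regularization factor and weights as the flattened synapse weights, follows:

def regularizedSSE(sse: Double, weights: Array[Double], lambda: Double): Double = 
   sse + lambda*weights.map(w => w*w).sum  // a larger lambda drives more weights toward zero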

Model instantiation

The model instance is created (trained) during the instantiation of the multilayer perceptron. The model is created by iterating the training cycle over all the data points of the time series xt, and through multiple epochs until the total sum of squared errors is smaller than the threshold eps, as in the following code:

var converged = false
val model: Option[MLPModel] = {
   val _model = new MLPModel(config, xt(0).size, labels(0).size)(mlpObjective)  //1
   val errScale = 1.0/(labels(0).size*xt.size)  //4

   converged = Range(0, config.numEpochs).find( _ => {
     xt.toArray.zip(labels)
               .foldLeft(0.0)((s, xtlbl) => 
                   s + _model.trainEpoch(xtlbl._1, xtlbl._2) //2
                )*errScale < config.eps  //3
   }) != None
   _model
}

The model is first initialized (line 1). The first four stages of the MLP training cycle are executed by the MLPModel.trainEpoch method described in the previous section (line 2). The method returns the sum of squared errors for each observation in the time series. The sums of squared errors for the observations are added, then evaluated against the convergence criterion, eps (line 3). The sum of squared errors is normalized by the size of the time series and the size of the output vector (line 4). The implementation uses the Scala method, find, to exit from the iterative loop before the maximum number of epochs, config.numEpochs, is reached.

Tip

The exit condition

In this implementation, the converged flag indicates whether the training converged before the maximum number of epochs was reached; the model is instantiated regardless. This allows the client code to evaluate the pattern of the sum of squared errors with regard to a local minimum.

Once the model is created during the instantiation of the multilayer perceptron, it is available to predict the class of a new observation.

Prediction

The prediction method of the MLPModel class, getOutput, takes a new observation (feature vector) as argument and returns the output by using the forward propagation algorithm:

def getOutput(x: DblVector): DblVector = {
  inLayer.set(x)
  connections.foreach( _.connectionForwardPropagation)
  outLayer.output
}

The classification method is implemented as the data transformation |>. It returns the predicted value, normalized as a probability if the model was successfully trained; None, otherwise:

def |> : PartialFunction[Array[T], DblVector] = {
  case x: Array[T] if(model != None && x.size == dimension(xt)) =>
    model.get.getOutput(x)
}

Our MLP class is now ready to tackle some classification challenges.
