The backpropagation learning algorithm is used to train a multilayer perceptron ANN from a given set of sample values. In brief, this algorithm first calculates the output value for a set of given input values and then calculates the amount of error in the output of the ANN. The amount of error is determined by comparing the output value predicted by the ANN to the expected output value for the given input values, as provided in the training data. The calculated error is then used to modify the weights of the ANN. Thus, after training the ANN with a reasonable number of samples, the ANN will be able to predict the output value for a set of input values. The algorithm comprises three distinct phases, which are described in turn below.
The weights of the synapses in the ANN are first initialized to random values within the range $-\epsilon$ to $+\epsilon$. We initialize the weights to values within this range to avoid symmetry in the weight matrices. This avoidance of symmetry is called symmetry breaking, and it is performed so that each iteration of the backpropagation algorithm produces a noticeable change in the weights of the synapses in the ANN. This is desirable in an ANN as each of its nodes should learn independently of the other nodes. If all the nodes were to have identical weights, every node in a layer would compute the same activation and receive the same update, and the estimated learning model would end up either overfit or underfit.
Also, the backpropagation learning algorithm requires two additional parameters, which are the learning rate and the learning momentum. We will see the effects of these parameters in the example later in this section.
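The symmetry-breaking argument can be illustrated with a small sketch. It uses plain Clojure vectors rather than Incanter matrices, and every name in it (layer-activations, symmetric-weights, and so on) is hypothetical and chosen only for this illustration. It shows that two nodes with identical weights produce identical activations for the same input, whereas nodes with distinct weights do not.

```clojure
;; Plain-Clojure sketch (no Incanter); all names here are hypothetical.
(defn sigmoid [z]
  (/ 1.0 (+ 1.0 (Math/exp (- (double z))))))

(defn dot [u v]
  (reduce + (map * u v)))

(defn layer-activations
  "Activations of a layer given one row of weights per node and an input vector."
  [weight-rows inputs]
  (mapv #(sigmoid (dot % inputs)) weight-rows))

;; Two nodes with identical weights (bias weight first).
(def symmetric-weights [[0.5 0.5 0.5]
                        [0.5 0.5 0.5]])

;; Two nodes with distinct weights.
(def asymmetric-weights [[0.5 -0.3 0.8]
                         [-0.2 0.7 0.1]])

;; Bias input 1 followed by the input values 0 and 1.
(def input [1 0 1])

(layer-activations symmetric-weights input)  ; both entries are equal
(layer-activations asymmetric-weights input) ; the two entries differ
```

Because identical nodes also receive identical deltas, they would remain identical after every update, which is why the weights are drawn at random.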
The forward propagation phase of the algorithm simply calculates the activation values of all nodes in the various layers of the ANN. As we mentioned earlier, the activation values of the nodes in the input layer are the input values and the bias input of the ANN. Writing $x$ for the input values (together with the bias input) and $a^{(1)}$ for the activations of the input layer, this can be formally defined by using the following equation:
$$a^{(1)} = x$$
Using these activation values from the input layer of the ANN, the activations of the nodes in the other layers of the ANN are determined. This is done by applying the activation function to the product of the weight matrix feeding a given layer and the activation values from the previous layer in the ANN. Writing $W^{(l)}$ for the weight matrix of the synapse between layers $l$ and $l+1$, and $g$ for the activation function, this can be formally expressed as follows:
$$a^{(l)} = g\left(W^{(l-1)}\,a^{(l-1)}\right)$$
The preceding equation explains that the activation value of a layer $l$ is equal to the activation function applied to the product of the weight matrix feeding that layer and the output (or activation) values of the previous layer. Next, the activation values of the output layer are backpropagated. By this, we mean that the activation values are traversed from the output layer through the hidden layers to the input layer of the ANN. During this phase, we determine the amount of error, or delta, in each node in the ANN. The delta values of the output layer are determined by calculating the difference between the expected output values, $y$, and the activation values of the output layer, $a^{(L)}$, where $L$ denotes the output layer. This difference calculation can be summarized by the following equation:
$$\delta^{(L)} = a^{(L)} - y$$
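The term $\delta^{(l)}$ of a layer $l$ is a matrix of size $j \times 1$, where $j$ is the number of nodes in layer $l$. This term can be formally defined as follows:
$$\delta^{(l)} = \begin{bmatrix} \delta^{(l)}_1 \\ \delta^{(l)}_2 \\ \vdots \\ \delta^{(l)}_j \end{bmatrix}$$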
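The delta terms of the layers other than the output layer of the ANN are determined by the following equality, in which $W^{(l)}$ is the weight matrix of the synapse between layers $l$ and $l+1$, $g$ is the activation function, and $z^{(l)}$ is the weighted input of layer $l$, so that $a^{(l)} = g(z^{(l)})$:
$$\delta^{(l)} = \left(\left(W^{(l)}\right)^{T}\delta^{(l+1)}\right) \circ g'\left(z^{(l)}\right)$$
In the preceding equation, the binary operation $\circ$ is used to represent an element-wise multiplication of two matrices of equal size. Note that this operation is different from matrix multiplication, and an element-wise multiplication will return a matrix composed of the products of the elements with the same position in two matrices of equal size. The term $g'(z^{(l)})$ represents the derivative of the activation function used in the ANN. As we are using the sigmoid function as our activation function, the term $g'(z^{(l)})$ has the value $a^{(l)} \circ (1 - a^{(l)})$.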
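The value of the derivative term for the sigmoid function follows from a short derivation. The sketch below uses $g$ for the sigmoid function and $a = g(z)$ for an activation value; these symbols are assumed here for illustration.

```latex
\begin{aligned}
g(z)  &= \frac{1}{1 + e^{-z}} \\
g'(z) &= \frac{e^{-z}}{\left(1 + e^{-z}\right)^{2}}
       = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}}
       = g(z)\,\bigl(1 - g(z)\bigr) \\
      &= a\,(1 - a) \qquad \text{since } a = g(z)
\end{aligned}
```

This is why the delta terms can be computed from the activation values alone, without storing the weighted inputs separately.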
Thus, we can calculate the delta values of all nodes in the ANN. We can use these delta values to determine the gradients of the synapses of the ANN. We now move on to the final weight update phase of the backpropagation algorithm.
The gradients of the various synapses are first initialized to matrices with all the elements as 0. The gradient matrix of a given synapse has the same size as the weight matrix of the synapse. The gradient term $\Delta^{(l)}$ represents the gradients of the synapse layer that is present immediately after layer $l$ in the ANN. The initialization of the gradients of the synapses in the ANN is formally expressed as follows:
$$\Delta^{(l)} = [0]$$
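For each sample value in the training data, we calculate the deltas and activation values of all nodes in the ANN. The product of the deltas $\delta^{(l+1)}$ of the next layer and the activations $a^{(l)}$ of the given layer is added to the gradients of the synapses using the following equation:
$$\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)}\left(a^{(l)}\right)^{T}$$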
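We then calculate the average of the gradients over all $m$ sample values and use the delta and gradient values of a given layer to update the weight matrix as follows, where $\rho$ is the learning rate and $\Lambda$ is the learning momentum:
$$W^{(l)} := W^{(l)} - \left(\rho\,\frac{1}{m}\,\Delta^{(l)} + \Lambda\,\delta^{(l)}\right)$$
In the implementation later in this section, the delta term used in this update is carried between iterations as a matrix with the same dimensions as $W^{(l)}$.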
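Thus, the learning rate and learning momentum parameters of the algorithm come into play only in the weight update phase. The preceding three equations represent a single iteration of the backpropagation algorithm. A large number of iterations must be performed until the overall error in the ANN converges to a small value. We can now summarize the backpropagation learning algorithm using the following steps:
1. Initialize the weights of the synapses of the ANN to random values.
2. For each sample, forward propagate the input values to calculate the activations of all nodes in the ANN.
3. Backpropagate from the output layer to calculate the deltas of all nodes, and add the resulting terms to the gradients of the synapses.
4. Average the gradients over all samples and update the weights using the learning rate and learning momentum.
5. Repeat steps 2 to 4 until the overall error in the ANN converges to a small value.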
There are several distinct parts in the backpropagation learning algorithm, and we will now implement each part and combine them into a complete implementation. As the deltas, the weights of the synapses, and the activations in an ANN can all be represented by matrices, we can write a vectorized implementation of this algorithm.
Let's assume that we need to implement an ANN to model a logical XOR gate. The sample data is simply the truth table of the XOR gate and can be represented as a vector, shown as follows:
;; truth table for XOR logic gate
(def sample-data [[[0 0] [0]]
                  [[0 1] [1]]
                  [[1 0] [1]]
                  [[1 1] [0]]])
Each element defined in the preceding vector sample-data is itself a vector comprising other vectors for the input and output values of an XOR gate. We will use this vector as our training data for building an ANN. This is essentially a classification problem, and we will use ANNs to model it. In abstract terms, an ANN should be capable of performing both binary and multiclass classifications. We can define the protocol of an ANN as follows:
(defprotocol NeuralNetwork
  (run [network inputs])
  (run-binary [network inputs])
  (train-ann [network samples]))
The NeuralNetwork protocol defined in the preceding code has three functions. The train-ann function can be used to train the ANN and requires some sample data. The run and run-binary functions can be used on this ANN to perform multiclass and binary classifications, respectively. Both the run and run-binary functions require a set of input values.
The first step of the backpropagation algorithm is the initialization of the weights of the synapses of the ANN. We can use the rand and matrix functions to generate these weights as a matrix, shown as follows:
(defn rand-list
  "Create a list of random doubles between
  -epsilon and +epsilon."
  [len epsilon]
  (map (fn [x] (- (rand (* 2 epsilon)) epsilon))
       (range 0 len)))

(defn random-initial-weights
  "Generate random initial weight matrices for given layers.
  layers must be a vector of the sizes of the layers."
  [layers epsilon]
  (for [i (range 0 (dec (length layers)))]
    (let [cols (inc (get layers i))
          rows (get layers (inc i))]
      (matrix (rand-list (* rows cols) epsilon) cols))))
The rand-list function shown in the preceding code creates a list of random elements in the positive and negative range of epsilon. As we described earlier, we choose this range to break the symmetry of the weight matrix.
The random-initial-weights function generates several weight matrices for different layers of the ANN. As defined in the preceding code, the layers argument must be a vector of the sizes of the layers of the ANN. For an ANN with two nodes in the input layer, three nodes in the hidden layer, and one node in the output layer, we pass layers as [2 3 1] to the random-initial-weights function. Each weight matrix has a number of rows equal to the number of nodes in the next layer of the ANN, and a number of columns equal to the number of inputs plus an extra input for the bias of the neural layer. Note that we use a slightly different form of the matrix function. This form takes a single vector and partitions this vector into a matrix that has the number of columns specified by the second argument to this function. Thus, the vector passed to this form of the matrix function must have (* rows cols) elements, where rows and cols are the number of rows and columns, respectively, in the weight matrix.
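To make the dimensions concrete, the following sketch computes only the shapes of the weight matrices for a given layers vector. It is written in plain Clojure, and weight-matrix-shapes is a hypothetical helper introduced for this illustration; it is not part of the implementation.

```clojure
;; Hypothetical helper: returns [rows cols] for each weight matrix.
(defn weight-matrix-shapes
  "Shapes of the weight matrices for the given layer sizes. Each matrix has
  one row per node in the next layer and one column per input plus the bias."
  [layers]
  (mapv (fn [[n-in n-out]] [n-out (inc n-in)])
        (partition 2 1 layers)))

(weight-matrix-shapes [2 3 1])
;; => [[3 3] [1 4]]
```

For the layers [2 3 1], the first weight matrix therefore has 3 rows and 3 columns, and the second has 1 row and 4 columns.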
As we will need to apply the sigmoid function to all the activations of a layer in the ANN, we must define a function that applies the sigmoid function on all the elements in a given matrix. We can use the div, plus, exp, and minus functions from the incanter.core namespace to implement such a function, as shown in the following code:
(defn sigmoid
  "Apply the sigmoid function 1/(1+exp(-z)) to all
  elements in the matrix z."
  [z]
  (div 1 (plus 1 (exp (minus z)))))
We will also need to implicitly add a bias node to each layer in an ANN. This can be done by wrapping around the bind-rows function, which adds a row of elements to a matrix, as shown in the following code:
(defn bind-bias
  "Add the bias input to a vector of inputs."
  [v]
  (bind-rows [1] v))
Since the bias value is always 1, we specify the row of elements as [1] to the bind-rows function.
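The behavior of these two helpers can be checked on small values without Incanter. The sketch below is a plain-Clojure analogue that works on a single number and an ordinary vector; sigmoid-scalar and with-bias are hypothetical names used only for this illustration.

```clojure
;; Plain-Clojure analogues of the helpers; names are hypothetical.
(defn sigmoid-scalar
  "Sigmoid of a single number."
  [z]
  (/ 1.0 (+ 1.0 (Math/exp (- (double z))))))

(defn with-bias
  "Prepend the constant bias input 1 to a vector of inputs."
  [v]
  (into [1] v))

(sigmoid-scalar 0)   ; => 0.5
(with-bias [0 1])    ; => [1 0 1]
```

The sigmoid of 0 is exactly 0.5, and the bias input always occupies the first position of the resulting vector.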
Using the functions defined earlier, we can implement forward propagation. We essentially have to multiply the weight matrix of a given synapse between two layers in the ANN by the input activations and then apply the sigmoid function to each of the resulting values, as shown in the following code:
(defn matrix-mult
  "Multiply two matrices and ensure the result is also a matrix."
  [a b]
  (let [result (mmult a b)]
    (if (matrix? result)
      result
      (matrix [result]))))

(defn forward-propagate-layer
  "Calculate activations for layer l+1 given weight matrix
  of the synapse between layer l and l+1 and layer l activations."
  [weights activations]
  (sigmoid (matrix-mult weights activations)))

(defn forward-propagate
  "Propagate activation values through a network's
  weight matrix and return output layer activation values."
  [weights input-activations]
  (reduce #(forward-propagate-layer %2 (bind-bias %1))
          input-activations weights))
In the preceding code, we first define a matrix-mult function, which performs matrix multiplication and ensures that the result is a matrix. Note that to define matrix-mult, we use the mmult function instead of the mult function that multiplies the corresponding elements in two matrices of the same size.
Using the matrix-mult and sigmoid functions, we can implement the forward propagation step between two layers in the ANN. This is done in the forward-propagate-layer function, which simply multiplies the matrices representing the weights of the synapse between two layers in the ANN and the input activation values while ensuring that the returned value is always a matrix. To propagate a given set of values through all the layers of an ANN, we must add a bias input and apply the forward-propagate-layer function for each layer. This can be done concisely using the reduce function over a closure of the forward-propagate-layer function as shown in the forward-propagate function defined in the preceding code.
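The same reduce-based structure can be sketched in plain Clojure, which may help in following how the activations flow from layer to layer. This is not the Incanter implementation: weights are represented as nested vectors with one row per node, and the names propagate-layer, propagate, and example-weights, as well as the weight values, are made up for this illustration.

```clojure
;; Plain-Clojure sketch of forward propagation; all names are hypothetical.
(defn sigmoid [z]
  (/ 1.0 (+ 1.0 (Math/exp (- (double z))))))

(defn dot [u v]
  (reduce + (map * u v)))

(defn propagate-layer
  "Activations of the next layer from weight rows and current activations."
  [weight-rows activations]
  (mapv #(sigmoid (dot % activations)) weight-rows))

(defn propagate
  "Fold the input through every weight matrix, prepending the bias input 1
  to the activations before each layer."
  [weights inputs]
  (reduce (fn [activations weight-rows]
            (propagate-layer weight-rows (into [1] activations)))
          inputs
          weights))

;; Arbitrary example weights for a [2 3 1] network.
(def example-weights
  [[[0.1 0.4 -0.2] [-0.3 0.2 0.5] [0.6 -0.1 0.3]] ; 3 x 3
   [[0.2 -0.5 0.7 0.1]]])                         ; 1 x 4

;; Returns a vector with a single output activation between 0 and 1.
(propagate example-weights [0 1])
```

As in the reduce call of the forward-propagate function, the accumulated value is the activation vector and each element of the reduced collection is one weight matrix.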
Although the forward-propagate function can determine the output activations of the ANN, we actually require the activations of all the nodes in the ANN to use backpropagation. We can do this by translating the reduce function to a recursive function and introducing an accumulator variable to store the activations of every layer in the ANN. The forward-propagate-all-activations function, which is defined in the following code, implements this idea and uses the loop form to recursively apply the forward-propagate-layer function:
(defn forward-propagate-all-activations
  "Propagate activation values through the network
  and return all activation values for all nodes."
  [weights input-activations]
  (loop [all-weights     weights
         activations     (bind-bias input-activations)
         all-activations [activations]]
    (let [[weights & all-weights'] all-weights
          last-iter?       (empty? all-weights')
          out-activations  (forward-propagate-layer
                            weights activations)
          activations'     (if last-iter? out-activations
                               (bind-bias out-activations))
          all-activations' (conj all-activations activations')]
      (if last-iter? all-activations'
          (recur all-weights' activations' all-activations')))))
The forward-propagate-all-activations function defined in the preceding code requires all the weights of the nodes in the ANN and the input values to pass through the ANN as activation values. We first use the bind-bias function to add the bias input to the input activations of the ANN. We then store this value in an accumulator, that is, the variable all-activations, as a vector of all the activations in the ANN. The forward-propagate-layer function is then applied over the weight matrices of the various layers of the ANN, and each iteration adds a bias input to the input activations of the corresponding layer in the ANN.
Note that we do not add the bias input in the last iteration as it computes the output layer of the ANN. Thus, the forward-propagate-all-activations function applies forward propagation of input values through an ANN and returns the activations of every node in the ANN. Note that the activation values in this vector are in the order of the layers of the ANN.
We will now implement the backpropagation phase of the backpropagation learning algorithm. First, we have to implement a function to calculate the error term from the equation $\delta^{(l)} = \left(\left(W^{(l)}\right)^{T}\delta^{(l+1)}\right) \circ \left(a^{(l)} \circ (1 - a^{(l)})\right)$. We will do this with the help of the following code:
(defn back-propagate-layer
  "Back propagate deltas (from layer l+1) and
  return layer l deltas."
  [deltas weights layer-activations]
  (mult (matrix-mult (trans weights) deltas)
        (mult layer-activations (minus 1 layer-activations))))
The back-propagate-layer function defined in the preceding code calculates the errors, or deltas, of a synapse layer l in the ANN from the weights of the layer and the deltas of the next layer in the ANN.
Essentially, we have to apply this function from the output layer to the input layer through the various hidden layers of an ANN to produce the delta values of every node in the ANN. These delta values can then be combined with the activations of the nodes, thus producing the gradient values by which we must adjust the weights of the nodes in the ANN. We can do this in a manner similar to the forward-propagate-all-activations function, that is, by recursively applying the back-propagate-layer function over the various layers of the ANN. Of course, we have to traverse the layers of the ANN in the reverse order, that is, starting from the output layer, through the hidden layers, to the input layer. We will do this with the help of the following code:
(defn calc-deltas
  "Calculate hidden deltas for back propagation.
  Returns all deltas including output-deltas."
  [weights activations output-deltas]
  (let [hidden-weights     (reverse (rest weights))
        hidden-activations (rest (reverse (rest activations)))]
    (loop [deltas          output-deltas
           all-weights     hidden-weights
           all-activations hidden-activations
           all-deltas      (list output-deltas)]
      (if (empty? all-weights) all-deltas
          (let [[weights & all-weights'] all-weights
                [activations & all-activations'] all-activations
                deltas'     (back-propagate-layer
                             deltas weights activations)
                all-deltas' (cons (rest deltas') all-deltas)]
            (recur deltas' all-weights'
                   all-activations' all-deltas'))))))
The calc-deltas function determines the delta values of all the perceptron nodes in the ANN. For this calculation, the input and output activations are not needed. Only the hidden activations, bound to the hidden-activations variable, are needed to calculate the delta values. Also, the weight matrix of the first synapse layer is skipped, and only the remaining weight matrices are bound to the hidden-weights variable. The calc-deltas function then applies the back-propagate-layer function to each of these weight matrices, thus determining the deltas of all the nodes in the ANN. Note that we don't add the delta of the bias nodes to a computed set of deltas. This is done using the rest function, (rest deltas'), on the calculated deltas of a given synapse layer, as the first delta is that of a bias input in a given layer.
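A small numeric example may make the delta calculation and the removal of the bias delta easier to follow. The sketch below is plain Clojure rather than Incanter, and the names layer-deltas, output-weights, and hidden-activations, as well as the numbers, are assumptions made purely for illustration.

```clojure
;; Plain-Clojure sketch of one backpropagation step; names are hypothetical.
(defn dot [u v]
  (reduce + (map * u v)))

(defn transpose [rows]
  (apply mapv vector rows))

(defn layer-deltas
  "Deltas of layer l from the deltas of layer l+1, the weight rows of the
  synapse between them, and the activations of layer l (bias first)."
  [next-deltas weight-rows activations]
  (let [weighted (map #(dot % next-deltas) (transpose weight-rows))]
    (mapv (fn [w a] (* w a (- 1 a))) weighted activations)))

(def output-deltas [0.5])
(def output-weights [[0.2 -0.5 0.7 0.1]]) ; 1 x 4, bias weight first
(def hidden-activations [1 0.5 0.5 0.5])  ; bias activation first

;; Four deltas, the first of which belongs to the bias input.
(def deltas-with-bias
  (layer-deltas output-deltas output-weights hidden-activations))

;; Dropping the bias delta leaves one delta per hidden node,
;; approximately [-0.0625 0.0875 0.0125].
(def hidden-deltas (vec (rest deltas-with-bias)))
```

The bias entry evaluates to 0 here because the term a(1 - a) vanishes when the activation a is 1, and it is discarded in any case, just as (rest deltas') does in the calc-deltas function.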
By definition, the gradient vector terms for a given synapse layer are determined by multiplying the matrices $\delta^{(l+1)}$ and $a^{(l)}$, which represent the deltas of the next layer and the activations of the given layer, respectively. We will do this with the help of the following code:
(defn calc-gradients
  "Calculate gradients from deltas and activations."
  [deltas activations]
  (map #(mmult %1 (trans %2)) deltas activations))
The calc-gradients function shown in the preceding code is a concise implementation of the term $\delta^{(l+1)}\left(a^{(l)}\right)^{T}$. As we will be dealing with a sequence of delta and activation terms, we use the map function to apply the preceding equality to the corresponding deltas and activations in the ANN. Using the calc-deltas and calc-gradients functions, we can determine the total error in the weights of all nodes in the ANN for a given training sample. We will do this with the help of the following code:
(defn calc-error
  "Calculate deltas and squared error for given weights."
  [weights [input expected-output]]
  (let [activations   (forward-propagate-all-activations
                       weights (matrix input))
        output        (last activations)
        output-deltas (minus output expected-output)
        all-deltas    (calc-deltas
                       weights activations output-deltas)
        gradients     (calc-gradients all-deltas activations)]
    (list gradients
          (sum (pow output-deltas 2)))))
The calc-error function defined in the preceding code requires two parameters: the weight matrices of the synapse layers in the ANN and a sample training value, which is shown as [input expected-output]. The activations of all the nodes in the ANN are first calculated using the forward-propagate-all-activations function, and the delta value of the last layer is calculated as the difference between the actual output value produced by the ANN and the expected output value. The output value calculated by the ANN is simply the last activation value produced by the ANN, shown as (last activations) in the preceding code. Using the calculated activations, the deltas of all the perceptron nodes are determined via the calc-deltas function. These delta values are in turn used to determine the gradients of weights in the various layers of the ANN using the calc-gradients function. The squared error of the ANN for the given sample value, which is later averaged over all samples to give the Mean Square Error (MSE), is also calculated by adding the squares of the delta values of the output layer of the ANN.
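The matrix product computed by calc-gradients for each layer is an outer product of a column of deltas and a row of activations. The following plain-Clojure sketch, in which outer-product is a hypothetical name and the numbers are arbitrary, shows why the result has the same shape as the weight matrix of the synapse: one row per delta and one column per activation.

```clojure
;; Plain-Clojure sketch of the per-layer gradient term; the name is hypothetical.
(defn outer-product
  "Product of a column of deltas and a row of activations, as nested vectors."
  [deltas activations]
  (mapv (fn [d] (mapv #(* d %) activations)) deltas))

;; Two deltas and three activations (bias first) give a 2 x 3 matrix.
(outer-product [1 2] [1 0 1])
;; => [[1 0 1] [2 0 2]]
```

Each element of the result pairs the delta of one node in the next layer with the activation of one node in the given layer.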
For a given weight matrix of a layer in the ANN, we must initialize the gradients for the layer as a matrix with the same dimensions as the weight matrix, and all the elements in the gradient matrix must be set to 0. This can be implemented using a composition of the dim function, which returns the size of a matrix as a vector, and a variant form of the matrix function, as shown in the following code:
(defn new-gradient-matrix
  "Create accumulator matrix of gradients with the
  same structure as the given weight matrix
  with all elements set to 0."
  [weight-matrix]
  (let [[rows cols] (dim weight-matrix)]
    (matrix 0 rows cols)))
In the new-gradient-matrix function defined in the preceding code, the matrix function expects a value, the number of rows, and the number of columns to initialize a matrix. This function produces an initialized gradient matrix with the same structure as the supplied weight matrix.
We now implement the calc-gradients-and-error function to apply the calc-error function on a set of weight matrices and sample values. We must basically apply the calc-error function to each sample and accumulate the sum of the gradient and the MSE values. We then calculate the average of these accumulated values to return the gradient matrices and total MSE for the given sample values and weight matrices. We will do this with the help of the following code:
(defn calc-gradients-and-error' [weights samples]
  (loop [gradients   (map new-gradient-matrix weights)
         total-error 0 ; start the accumulated squared error at zero
         samples     samples]
    (let [[sample & samples'] samples
          [new-gradients squared-error] (calc-error weights sample)
          gradients'   (map plus new-gradients gradients)
          total-error' (+ total-error squared-error)]
      (if (empty? samples')
        (list gradients' total-error')
        (recur gradients' total-error' samples')))))

(defn calc-gradients-and-error
  "Calculate gradients and MSE for sample
  set and weight matrix."
  [weights samples]
  (let [num-samples (length samples)
        [gradients total-error] (calc-gradients-and-error'
                                 weights samples)]
    (list
     (map #(div % num-samples) gradients) ; gradients
     (/ total-error num-samples))))       ; MSE
The calc-gradients-and-error function defined in the preceding code relies on the calc-gradients-and-error' helper function. The calc-gradients-and-error' function initializes the gradient matrices, performs the application of the calc-error function, and accumulates the calculated gradient values and MSE. The calc-gradients-and-error function simply calculates the average of the accumulated gradient matrices and MSE returned from the calc-gradients-and-error' function.
Now, the only missing piece in our implementation is modifying the weights of the nodes in the ANN using calculated gradients. In brief, we must repeatedly update the weights until a convergence in the MSE is observed. This is actually a form of gradient descent applied to the nodes of an ANN. We will now implement this variant of gradient descent in order to train the ANN by repeatedly modifying the weights of the nodes in the ANN, as shown in the following code:
(defn gradient-descent-complete?
  "Returns true if gradient descent is complete."
  [network iter mse]
  (let [options (:options network)]
    (or (>= iter (:max-iters options))
        (< mse (:desired-error options)))))
The gradient-descent-complete? function defined in the preceding code simply checks for the termination condition of gradient descent. This function assumes that the ANN, represented as a network, is a map or record that contains the :options keyword. The value of this key is in turn another map that contains the various configuration options of the ANN. The gradient-descent-complete? function checks whether the total MSE of the ANN is less than the desired MSE, which is specified by the :desired-error option. Also, we add another condition to check whether the number of iterations performed has reached the maximum number of iterations specified by the :max-iters option.
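The termination check can be tried out with a plain map standing in for the network. In the sketch below, the function is repeated from the preceding code so that the example is self-contained, while example-network and its option values are assumptions made for this illustration.

```clojure
;; Repeated from the text so that this example is self-contained.
(defn gradient-descent-complete?
  "Returns true if gradient descent is complete."
  [network iter mse]
  (let [options (:options network)]
    (or (>= iter (:max-iters options))
        (< mse (:desired-error options)))))

;; Hypothetical network: a plain map with only the options that are read.
(def example-network
  {:options {:max-iters 100 :desired-error 0.20}})

(gradient-descent-complete? example-network 10 0.35)  ; => false
(gradient-descent-complete? example-network 10 0.15)  ; => true, error is low enough
(gradient-descent-complete? example-network 100 0.35) ; => true, iterations exhausted
```

Either condition on its own is enough to stop the descent, so a run can end with an MSE that is still above the desired error.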
Now, we will implement a gradient-descent function for multilayer perceptron ANNs. In this implementation, the changes in weights are calculated by the step function provided by the gradient descent algorithm. These calculated changes are then simply added to the existing weights of the synapse layers of the ANN. We will implement the gradient-descent function for multilayer perceptron ANNs with the help of the following code:
(defn apply-weight-changes
  "Applies changes to corresponding weights."
  [weights changes]
  (map plus weights changes))

(defn gradient-descent
  "Perform gradient descent to adjust network weights."
  [step-fn init-state network samples]
  (loop [network network
         state   init-state
         iter    0]
    (let [iter    (inc iter)
          weights (:weights network)
          [gradients mse] (calc-gradients-and-error
                           weights samples)]
      (if (gradient-descent-complete? network iter mse)
        network
        (let [[changes state] (step-fn network gradients state)
              new-weights (apply-weight-changes weights changes)
              network     (assoc network :weights new-weights)]
          (recur network state iter))))))
The apply-weight-changes function defined in the preceding code simply adds the weights and the calculated changes in the weights of the ANN. The gradient-descent function requires a step function (specified as step-fn), the initial state of the ANN, the ANN itself, and the sample data to train the ANN. The step function must calculate the weight changes from the ANN, the current gradient matrices, and the current state of the ANN. The step-fn function also returns the changed state of the ANN. The weights of the ANN are then updated using the apply-weight-changes function, and this iteration is repeatedly performed until the gradient-descent-complete? function returns true. The weights of the ANN are specified by the :weights keyword in the network map. These weights are then updated by simply overwriting the value on the network specified by the :weights keyword.
In the context of the backpropagation algorithm, we need to specify the learning rate and learning momentum by which the ANN must be trained. These parameters are needed to determine the changes in the weights of the nodes in the ANN. A function implementing this calculation must then be specified as the step-fn parameter to the gradient-descent function, as shown in the following code:
(defn calc-weight-changes
  "Calculate weight changes:
  changes = learning rate * gradients +
            learning momentum * deltas."
  [gradients deltas learning-rate learning-momentum]
  (map #(plus (mult learning-rate %1)
              (mult learning-momentum %2))
       gradients deltas))

(defn bprop-step-fn [network gradients deltas]
  (let [options           (:options network)
        learning-rate     (:learning-rate options)
        learning-momentum (:learning-momentum options)
        changes (calc-weight-changes
                 gradients deltas
                 learning-rate learning-momentum)]
    [(map minus changes) changes]))

(defn gradient-descent-bprop [network samples]
  (let [gradients (map new-gradient-matrix (:weights network))]
    (gradient-descent bprop-step-fn gradients
                      network samples)))
The calc-weight-changes function defined in the preceding code calculates the changes in the weights from the gradient values and deltas of a given layer in the ANN. The bprop-step-fn function extracts the learning rate and learning momentum parameters from the ANN that is represented by network and uses the calc-weight-changes function. As the gradient-descent function adds the changes to the weights, we return the changes in weights as negative values using the minus function.
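The effect of the learning momentum can be seen by applying the same formula over two consecutive steps. The sketch below uses flat vectors of numbers instead of Incanter matrices; weight-changes is a hypothetical stand-in for calc-weight-changes, and the gradient values, learning rate, and momentum are made up for this illustration.

```clojure
;; Plain-Clojure sketch of the weight-change formula; the name is hypothetical.
(defn weight-changes
  "changes = learning rate * gradients + learning momentum * previous changes"
  [gradients previous-changes learning-rate learning-momentum]
  (mapv #(+ (* learning-rate %1) (* learning-momentum %2))
        gradients previous-changes))

;; First step: there are no previous changes, so only the gradient contributes.
(def step-1 (weight-changes [1.0 -2.0] [0.0 0.0] 0.5 0.25))
;; step-1 => [0.5 -1.0]

;; Second step with the same gradients: the previous changes add momentum.
(def step-2 (weight-changes [1.0 -2.0] step-1 0.5 0.25))
;; step-2 => [0.625 -1.25]
```

As bprop-step-fn returns the unnegated changes as the new state, each step feeds its own changes into the next one in this way, and the negated changes are what get added to the weights.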
The gradient-descent-bprop function simply initializes the gradient matrices for the given weights of the ANN and calls the gradient-descent function by specifying bprop-step-fn as the step function to be used. Using the gradient-descent-bprop function, we can implement the abstract NeuralNetwork protocol we had defined earlier, as follows:
(defn round-output
  "Round outputs to nearest integer."
  [output]
  (mapv #(Math/round ^Double %) output))

(defrecord MultiLayerPerceptron [options]
  NeuralNetwork

  ;; Calculates the output values for the given inputs.
  (run [network inputs]
    (let [weights (:weights network)
          input-activations (matrix inputs)]
      (forward-propagate weights input-activations)))

  ;; Rounds the output values to binary values for
  ;; the given inputs.
  (run-binary [network inputs]
    (round-output (run network inputs)))

  ;; Trains a multilayer perceptron ANN from sample data.
  (train-ann [network samples]
    (let [options        (:options network)
          hidden-neurons (:hidden-neurons options)
          epsilon        (:weight-epsilon options)
          [first-in first-out] (first samples)
          num-inputs     (length first-in)
          num-outputs    (length first-out)
          sample-matrix  (map #(list (matrix (first %))
                                     (matrix (second %)))
                              samples)
          layer-sizes    (conj (vec (cons num-inputs
                                          hidden-neurons))
                               num-outputs)
          new-weights    (random-initial-weights
                          layer-sizes epsilon)
          network        (assoc network :weights new-weights)]
      (gradient-descent-bprop network sample-matrix))))
The MultiLayerPerceptron record defined in the preceding code trains a multilayer perceptron ANN using the gradient-descent-bprop function. The train-ann function first extracts the values for the number of hidden neurons and the constant epsilon from the options map specified to the ANN. The sizes of the various layers in the ANN are then determined from the sample data and bound to the layer-sizes variable. The weights of the ANN are then initialized using the random-initial-weights function and updated in the record network using the assoc function. Finally, the gradient-descent-bprop function is called to train the ANN using the backpropagation learning algorithm.
The ANN defined by the MultiLayerPerceptron record also implements two other functions, run and run-binary, from the NeuralNetwork protocol. The run function uses the forward-propagate function to determine the output values of a trained MultiLayerPerceptron ANN. The run-binary function simply rounds the value of the output returned by the run function for the given set of input values.
An ANN created using the MultiLayerPerceptron record requires a single options parameter containing the various options we can specify for the ANN. We can define the default options for such an ANN as follows:
(def default-options
  {:max-iters 100
   :desired-error 0.20
   :hidden-neurons [3]
   :learning-rate 0.3
   :learning-momentum 0.01
   :weight-epsilon 50})

(defn train [samples]
  (let [network (MultiLayerPerceptron. default-options)]
    (train-ann network samples)))
The map defined by the default-options variable contains the following keys that specify the options for the MultiLayerPerceptron ANN:
:max-iters: This key specifies the maximum number of iterations to run the gradient-descent function.
:desired-error: This key specifies the expected or acceptable MSE in the ANN.
:hidden-neurons: This key specifies the number of hidden neural nodes in the network. The value [3] represents a single hidden layer with three neurons.
:learning-rate and :learning-momentum: These keys specify the learning rate and learning momentum for the weight update phase of the backpropagation learning algorithm.
:weight-epsilon: This key specifies the constant used by the random-initial-weights function to initialize the weights of the ANN.
We also define a simple helper function train to create an ANN of the MultiLayerPerceptron type and train the ANN using the train-ann function and the sample data specified by the samples parameter. We can now create a trained ANN from the training data specified by the sample-data variable as follows:
user> (def MLP (train sample-data))
#'user/MLP
We can then use the trained ANN to predict the output of some input values. The output generated by the ANN defined by MLP closely matches the output of an XOR gate as follows:
user> (run-binary MLP [0 1])
[1]
user> (run-binary MLP [1 0])
[1]
However, the trained ANN produces an incorrect output for some of the inputs, as follows:
user> (run-binary MLP [0 0])
[0]
user> (run-binary MLP [1 1]) ;; incorrect output generated
[1]
There are several measures we can implement in order to improve the accuracy of the trained ANN. First, we can regularize the calculated gradients using the weight matrices of the ANN. This modification will produce a noticeable improvement in the preceding implementation. We can also increase the maximum number of iterations to be performed, and we can tune the algorithm to perform better by tweaking the learning rate, the learning momentum, and the number of hidden nodes in the ANN. These modifications are left as an exercise for the reader.
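As one possible starting point for the regularization mentioned above, the following sketch adds a fraction of each weight to the corresponding gradient while leaving the bias column untouched. This is an assumption about the kind of regularization intended (a common weight-decay scheme), not a method taken from the preceding implementation. The name regularize-gradient and the lambda parameter are hypothetical, and the matrices are plain nested vectors rather than Incanter matrices.

```clojure
;; Hypothetical sketch of weight-decay regularization on one gradient matrix.
(defn regularize-gradient
  "Add lambda * weight to every gradient element except those in the
  first column, which corresponds to the bias input."
  [gradient-rows weight-rows lambda]
  (mapv (fn [g-row w-row]
          (vec (map-indexed
                (fn [j g]
                  (if (zero? j)
                    g                               ; bias column unchanged
                    (+ g (* lambda (nth w-row j)))))
                g-row)))
        gradient-rows weight-rows))

(regularize-gradient [[0.5 0.5 0.5]] [[1.0 2.0 -2.0]] 0.25)
;; => [[0.5 1.0 0.0]]
```

Large weights then contribute a larger term to the gradient and are pulled toward zero on each update, and the value of lambda would need to be tuned along with the other options.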
The Enclog library (http://github.com/jimpil/enclog) is a Clojure wrapper library for the Encog library for machine learning algorithms and ANNs. The Encog library (http://github.com/encog) has two primary implementations: one in Java and one in .NET. We can use the Enclog library to easily generate customized ANNs to model both supervised and unsupervised machine learning problems.
The Enclog library can be added to a Leiningen project by adding the following dependency to the project.clj file:
[org.encog/encog-core "3.1.0"]
[enclog "0.6.3"]
Note that the Enclog library requires the Encog Java library as a dependency.
For the example that will follow, the namespace declaration should look similar to the following declaration:
(ns my-namespace
  (:use [enclog nnets training]))
We can create an ANN from the Enclog library using the neural-pattern and network functions from the enclog.nnets namespace. The neural-pattern function is used to specify a neural network model for the ANN. The network function accepts a neural network model returned from the neural-pattern function and creates a new ANN. We can provide several options to the network function depending on the specified neural network model. A feed-forward multilayer perceptron network is defined as follows:
(def mlp (network (neural-pattern :feed-forward)
                  :activation :sigmoid
                  :input 2
                  :output 1
                  :hidden [3]))
For a feed-forward neural network, we can specify the activation function by passing the :activation key to the network function. For our example, we used the sigmoid function, which is specified as :sigmoid, as the activation function for the ANN's nodes. We also specified the number of nodes in the input, output, and hidden layers of the ANN using the :input, :output, and :hidden keys.
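To make the layer sizes and the activation function concrete, the following plain-Clojure sketch computes a forward pass through a network with two inputs, three hidden nodes, and one output node. It does not use Enclog and is not how the library works internally; the function names and the weight layout (one row per node, bias weight first) are assumptions made for illustration.

```clojure
;; Hypothetical sketch of a forward pass through a 2-3-1 network.
;; Plain Clojure, not Enclog's API; each weight row is assumed to be
;; [bias w1 w2 ...] for one node.
(defn sigmoid [z]
  (/ 1.0 (+ 1.0 (Math/exp (- z)))))

(defn layer-output
  "Activations of one layer, given one weight row per node."
  [weights inputs]
  (mapv (fn [[bias & ws]]
          (sigmoid (+ bias (reduce + (map * ws inputs)))))
        weights))

(defn forward [hidden-weights output-weights inputs]
  (->> inputs
       (layer-output hidden-weights)
       (layer-output output-weights)))

;; Three hidden nodes with two inputs each; one output node with three inputs.
(def hidden-weights [[0.0 0.0 0.0] [0.0 0.0 0.0] [0.0 0.0 0.0]])
(def output-weights [[0.0 0.0 0.0 0.0]])

(forward hidden-weights output-weights [1.0 0.0]) ;; => [0.5]
```

With all weights set to zero, every node outputs sigmoid(0) = 0.5 regardless of the input, which is why training has to move the weights away from such a symmetric starting point.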
To train an ANN created by the network function with some sample data, we use the trainer and train functions from the enclog.training namespace. The learning algorithm to be used to train the ANN must be specified as the first parameter to the trainer function. For the backpropagation algorithm, this parameter is the :back-prop keyword. The value returned by the trainer function represents an ANN as well as the learning algorithm to be used to train the ANN. The train function is then used to actually run the specified training algorithm on the ANN. We will do this with the help of the following code:
(defn train-network [network data trainer-algo]
  (let [trainer (trainer trainer-algo
                         :network network
                         :training-set data)]
    (train trainer 0.01 1000 []))) ;; 0.01 is the expected error
The train-network function defined in the preceding code takes three parameters. The first parameter is an ANN created by the network function, the second parameter is the training data to be used to train the ANN, and the third parameter specifies the learning algorithm by which the ANN must be trained. As shown in the preceding code, we pass the ANN and the training data to the trainer function using the keyword parameters :network and :training-set. The train function is then used to run the training algorithm on the ANN using the sample data. We specify the expected error in the ANN and the maximum number of iterations to run the training algorithm as the first and second parameters after the trainer passed to the train function. In the preceding example, the expected error is 0.01, and the maximum number of iterations is 1000. The last parameter passed to the train function is a vector specifying the behaviors of the ANN, and we ignore it by passing an empty vector.
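The two stopping conditions described above, an expected error and a maximum number of iterations, can be sketched in plain Clojure as follows. This illustrates the idea only and is not Enclog's implementation; train-until, step-fn, and halving-error are hypothetical names, and step-fn stands in for one training iteration that reports the current error.

```clojure
;; Hypothetical sketch of the stopping rule, in plain Clojure.
;; step-fn stands in for one training iteration and returns the
;; current error; it is not part of Enclog.
(defn train-until
  "Calls step-fn until its error is at most target-error or
  max-iterations is reached. Returns the iteration count and last error."
  [step-fn target-error max-iterations]
  (loop [iteration 0
         error Double/MAX_VALUE]
    (if (or (<= error target-error)
            (>= iteration max-iterations))
      {:iterations iteration :error error}
      (recur (inc iteration) (step-fn (inc iteration))))))

;; A toy step function whose error halves on every iteration.
(defn halving-error [iteration]
  (/ 1.0 (Math/pow 2 iteration)))

(train-until halving-error 0.01 1000)
;; => {:iterations 7, :error 0.0078125}
```

When the error never reaches the target, the loop ends at the iteration limit instead; for example, (train-until halving-error 0.01 3) returns {:iterations 3, :error 0.125}.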
The training data to be used to run the training algorithm on the ANN can be created using Enclog's data function. For example, we can create training data for a logical XOR gate using the data function as follows:
(def dataset
  (let [xor-input [[0.0 0.0] [1.0 0.0] [0.0 1.0] [1.0 1.0]]
        xor-ideal [[0.0] [1.0] [1.0] [0.0]]]
    (data :basic-dataset xor-input xor-ideal)))
The data function takes the type of data as its first parameter, followed by the input and output values of the training data as vectors. In our examples, we use the :basic-dataset and :basic types. The :basic-dataset keyword is used to create training data, and the :basic keyword is used to specify a single set of input values.
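Because the input and ideal vectors are matched by position, it is easy to get them out of step. The following plain-Clojure sketch, which does not use Enclog, derives the ideal XOR outputs from the inputs instead of typing them by hand; xor-ideal-for is a hypothetical helper name.

```clojure
;; Plain-Clojure sketch; does not use Enclog. It derives the ideal XOR
;; outputs from the inputs so the two vectors cannot drift apart.
(def xor-input [[0.0 0.0] [1.0 0.0] [0.0 1.0] [1.0 1.0]])

(defn xor-ideal-for
  "Ideal output for one input pair: 1.0 when the two inputs differ."
  [[a b]]
  [(if (not= a b) 1.0 0.0)])

(def xor-ideal (mapv xor-ideal-for xor-input))
;; xor-ideal is [[0.0] [1.0] [1.0] [0.0]], matching the vector in the text.
```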
Using the data defined by the dataset variable and the train-network function, we can train the ANN defined by mlp to model the output of an XOR gate as follows:
user> (def MLP (train-network mlp dataset :back-prop))
Iteration # 1 Error: 26.461526% Target-Error: 1.000000%
Iteration # 2 Error: 25.198031% Target-Error: 1.000000%
Iteration # 3 Error: 25.122343% Target-Error: 1.000000%
Iteration # 4 Error: 25.179218% Target-Error: 1.000000%
...
...
Iteration # 999 Error: 3.182540% Target-Error: 1.000000%
Iteration # 1,000 Error: 3.166906% Target-Error: 1.000000%
#'user/MLP
As shown by the preceding output, the trained ANN has an error of about 3.17 percent after 1,000 iterations. This is still above the target error of 1 percent, so training stopped at the iteration limit. We can now use the trained ANN to predict the output of a set of input values. To do this, we use the Java compute and getData methods, which are called as .compute and .getData respectively. We can define a simple helper function to call the .compute method for a vector of input values and round the output to a binary value as follows:
(defn run-network [network input]
  (let [input-data (data :basic input)
        output     (.compute network input-data)
        output-vec (.getData output)]
    (round-output output-vec)))
We can now use the run-network function to test the trained ANN using a vector of input values, as follows:
user> (run-network MLP [1 1])
[0]
user> (run-network MLP [1 0])
[1]
user> (run-network MLP [0 1])
[1]
user> (run-network MLP [0 0])
[0]
As shown in the preceding output, the trained ANN represented by MLP reproduces the behavior of an XOR gate for all four input combinations.
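Checking all four inputs by hand at the REPL can be automated. The following plain-Clojure sketch compares a prediction function against the XOR truth table. All names in it are hypothetical: predict-fn stands in for something like (partial run-network MLP), and round-to-binary is an assumed stand-in for a rounding helper that thresholds at 0.5, not necessarily the round-output function used in the text.

```clojure
;; Hypothetical sketch in plain Clojure. predict-fn stands in for a
;; function such as (partial run-network MLP); round-to-binary is an
;; assumed rounding helper that thresholds each value at 0.5.
(defn round-to-binary [outputs]
  (mapv #(if (>= % 0.5) 1 0) outputs))

(def xor-table {[0 0] [0], [0 1] [1], [1 0] [1], [1 1] [0]})

(defn matches-xor?
  "True when predict-fn returns the expected output for every input."
  [predict-fn]
  (every? (fn [[input expected]] (= expected (predict-fn input)))
          xor-table))

;; A toy predictor that produces raw activations and rounds them.
(defn toy-predict [[a b]]
  (round-to-binary [(if (= a b) 0.1 0.9)]))

(matches-xor? toy-predict) ;; => true
```

A predictor that gets any of the four rows wrong, such as (constantly [1]), makes matches-xor? return false.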
In conclusion, the Enclog library gives us a small set of powerful functions that can be used to build ANNs. In the preceding example, we explored a feed-forward multilayer perceptron model. The library provides several other ANN models, such as Adaptive Resonance Theory (ART), Self-Organizing Maps (SOM), and Elman networks. The Enclog library also allows us to customize the activation function of the nodes in a particular neural network model. For the feed-forward network in our example, we used the sigmoid function. Several other mathematical functions, such as sine, hyperbolic tangent, logarithmic, and linear functions, are also supported by the library. The Enclog library also supports several other machine learning algorithms that can be used to train an ANN.