Chapter 9. Artificial Neural Networks

The popularity of neural networks surged in the 90s, when they were seen as the silver bullet for a vast number of problems. At its core, a neural network is a nonlinear statistical model that leverages logistic regression to create a nonlinear, distributed model. The concept of artificial neural networks is rooted in biology: the desire to simulate key functions of the brain and to replicate its structure in terms of neurons, activation, and synapses.

In this chapter, you will move beyond the hype and learn:

  • The concept and elements of the multilayer perceptron (MLP)
  • How to train a neural network using error backpropagation
  • The evaluation and tuning of MLP configuration parameters
  • Full Scala implementation of the MLP classifier
  • How to apply MLP to extract correlation models for currency exchange rates

Feed-forward neural networks (FFNN)

The idea behind artificial neural networks was to build mathematical and computational models of the natural neural network in the brain. After all, the brain is a very powerful information-processing engine that surpasses computers in domains such as learning, inductive reasoning, prediction, vision, and speech recognition.

The biological background

In biology, a neural network is composed of groups of neurons interconnected through synapses [9:1], as shown in the following image:

[Figure: biological neurons interconnected through synapses]

Neuroscientists have been especially interested in understanding how the billions of neurons in the brain interact to provide human beings with parallel processing capabilities. The 60s saw the emergence of a new field of study known as connectionism, which marries cognitive psychology, artificial intelligence, and neuroscience. Its goal was to create models of mental phenomena. Although connectionism takes many forms, neural network models have become the most popular and the most widely taught of all connectionist models [9:2].

Biological neurons communicate through electrical charges known as stimuli. This network of neurons can be represented as a simple schematic, as follows:

[Figure: a layered schematic of a network of biological neurons]

This representation categorizes groups of neurons as layers. The terminology used to describe natural neural networks has a corresponding nomenclature for artificial neural networks, as summarized in the following table:

The biological neural network | The artificial neural network
----------------------------- | -----------------------------
Axon                          | Connection
Dendrite                      | Connection
Synapse                       | Weight
Potential                     | Weighted sum
Threshold                     | Bias weight
Signal, Stimulus              | Activation
Group of neurons              | Layer of neurons

In the biological world, stimuli do not propagate in any specific direction between neurons, and an artificial neural network can have the same degree of freedom. However, the artificial neural networks most commonly used by data scientists have a predefined direction: from the input layer to the output layer. These neural networks are known as FFNNs.

The mathematical background

In the previous chapter, you learned that support vector machines formulate the training of a model as a nonlinear optimization problem whose objective function is convex. A convex objective function is fairly straightforward to implement. The drawback is that the kernelization of an SVM may result in a large number of basis functions (or model dimensions). Refer to The Kernel trick section under The support vector machine (SVM) in Chapter 8, Kernel Models and Support Vector Machines.

One solution is to reduce the number of basis functions through parameterization, so that these functions can adapt to different training sets. Such an approach can be modeled as an FFNN, known as the multilayer perceptron [9:3].

Linear regression can be visualized as a simple connectivity model of neurons and synapses, as follows:

[Figure: a two-layer neural network]

The feature x0 = +1 is known as the bias input (or bias element); it corresponds to the intercept in classic linear regression.

As with support vector machines, linear regression is appropriate for observations that are linearly separable. The real world, however, is usually driven by nonlinear phenomena. Therefore, logistic regression is the natural choice for computing the output of the perceptron. For a set of input variables x = {xi}0,n with the bias input x0 = +1 and weights w = {wi}0,n, the output y is computed as:

$$ y = \sigma\left(\sum_{i=0}^{n} w_i x_i\right), \qquad \sigma(t) = \frac{1}{1 + e^{-t}} $$
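A minimal Scala sketch of this computation is shown below; the Neuron object and its output method are illustrative names, under the assumption that the bias weight occupies index 0 of the weight vector:

  // Output of a single artificial neuron: the sigmoid of the
  // weighted sum of its inputs. w(0) is the bias weight, paired
  // with the implicit bias input x0 = +1.
  object Neuron {
    def sigmoid(t: Double): Double = 1.0 / (1.0 + math.exp(-t))

    def output(x: Array[Double], w: Array[Double]): Double = {
      require(w.length == x.length + 1, "one weight per input plus the bias")
      val weightedSum = w(0) + x.indices.map(i => w(i + 1) * x(i)).sum
      sigmoid(weightedSum)
    }
  }

For example, Neuron.output(Array(0.5, -1.2), Array(0.1, 0.7, -0.3)) computes the activation of a neuron with two inputs and one bias weight.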

An FFNN can be regarded as a stack of layers of logistic regressions, with the output layer performing a linear regression.

The value of the variables in each hidden layer is computed as the sigmoid of the dot product of the connection weights and the output of the previous layer. Although interesting, the theory behind artificial neural networks is beyond the scope of this book [9:4].
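To make the forward pass concrete, here is a short Scala sketch; the weight layout (one weight vector per neuron, bias weight at index 0) and the names ForwardPass and forward are assumptions of this sketch, not the book's implementation:

  // Forward propagation through an FFNN: each hidden layer applies
  // the sigmoid to the weighted sum of the previous layer's output;
  // the output layer remains linear, as described above.
  object ForwardPass {
    def sigmoid(t: Double): Double = 1.0 / (1.0 + math.exp(-t))

    private def weightedSum(w: Array[Double], in: Array[Double]): Double =
      w(0) + in.indices.map(i => w(i + 1) * in(i)).sum

    // weights(l)(j): weight vector of neuron j in layer l
    def forward(input: Array[Double],
                weights: Array[Array[Array[Double]]]): Array[Double] = {
      val hidden = weights.dropRight(1).foldLeft(input) { (prev, layer) =>
        layer.map(w => sigmoid(weightedSum(w, prev)))
      }
      weights.last.map(w => weightedSum(w, hidden))
    }
  }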
