Chapter 9. Artificial Neural Networks

The popularity of neural networks surged in the 1990s, when they were seen as a silver bullet for a vast number of problems. At its core, a neural network is a nonlinear statistical model that stacks logistic regression units to form a distributed, nonlinear model. The concept of artificial neural networks is rooted in biology, with the desire to simulate key functions of the brain and replicate its structure in terms of neurons, activation, and synapses.

In this chapter, you will move beyond the hype and learn the following topics:

  • The concepts and elements of the multilayer perceptron (MLP)
  • How to train a neural network using error backpropagation
  • The evaluation and tuning of MLP configuration parameters
  • A full Scala implementation of the MLP classifier
  • How to apply MLP to extract correlation models for currency exchange rates
  • A brief introduction to convolutional neural network (CNN)

Feed-forward neural networks

The idea behind artificial neural networks was to build mathematical and computational models of the natural neural network in the brain. After all, the brain is a very powerful information-processing engine that surpasses computers in domains such as learning, inductive reasoning, prediction, vision, and speech recognition.

The biological background

In biology, a neural network is composed of groups of neurons interconnected through synapses [9:1], as shown in the following diagram:

The visualization of biological neurons and synapses

Neuroscientists have been especially interested in understanding how the billions of neurons in the brain interact to provide human beings with parallel processing capabilities. The 1960s saw the emergence of a new field of study known as connectionism, which marries cognitive psychology, artificial intelligence, and neuroscience. Its goal was to create models of mental phenomena. Although connectionism takes many forms, neural network models have become the most popular and the most widely taught of them all [9:2].

Biological neurons communicate through electrical charges known as stimuli. This network of neurons can be represented as a simple schematic, as follows:

The representation of neuron layers, connections, and synapses

This representation categorizes groups of neurons as layers. Each term used to describe the natural neural network has a counterpart in the nomenclature of the artificial neural network, as shown in the following table:

The biological neural network     The artificial neural network
Axon                              Connection
Dendrite                          Connection
Synapse                           Weight
Potential                         Weighted sum
Threshold                         Bias weight
Signal, Stimulus                  Activation
Group of neurons                  Layer of neurons

In the biological world, stimuli do not propagate in any specific direction between neurons, and an artificial neural network can have the same degree of freedom. However, the artificial neural networks most commonly used by data scientists have a predefined direction, from the input layer to the output layer. These networks are known as feed-forward neural networks (FFNN).

Mathematical background

In the previous chapter, you learned that support vector machines formulate the training of a model as a nonlinear optimization problem whose objective function is convex. A convex objective function is fairly straightforward to implement. The drawback is that the kernelization of the SVM may result in a large number of basis functions (or model dimensions); refer to the The kernel trick section under Support vector machines in Chapter 8, Kernel Models and Support Vector Machines. One solution is to reduce the number of basis functions through parameterization, so that these functions can adapt to different training sets. Such an approach can be modeled as an FFNN, known as the multilayer perceptron [9:3].

Linear regression can be visualized as a simple connectivity model using neurons and synapses, as follows:

A two-layer neural network

The feature x0=+1 is known as the bias input (or the bias element), which corresponds to the intercept in the classic linear regression.

As with support vector machines, linear regression is appropriate for observations that are linearly separable. The real world, however, is usually driven by nonlinear phenomena. Therefore, logistic regression is naturally used to compute the output of the perceptron. For a set of input variables $x = \{x_i\}_{0,n}$ with the bias input $x_0 = +1$ and weights $w = \{w_i\}_{0,n}$, the output $y$ is computed as follows (M1):

$$y = \frac{1}{1 + \exp\left(-\sum_{i=0}^{n} w_i x_i\right)} \tag{M1}$$
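The following is a minimal Scala sketch of formula M1: the output of a single perceptron computed as the logistic function of the weighted sum of its inputs. The names Perceptron, sigmoid, and output are illustrative and are not taken from the book's library.

// A minimal sketch of formula M1, assuming weights(0) is the bias
// weight w0 associated with the bias input x0 = +1.
object Perceptron {
  // Logistic (sigmoid) activation
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Prepend the bias input x0 = +1, then apply the sigmoid to the
  // dot product of the weights and the biased input vector
  def output(weights: Array[Double], x: Array[Double]): Double = {
    val xBiased = 1.0 +: x
    sigmoid(weights.zip(xBiased).map { case (w, xi) => w * xi }.sum)
  }
}

For example, Perceptron.output(Array(0.5, -1.0, 2.0), Array(0.8, 0.3)) computes the logistic output for two input features with a bias weight of 0.5.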

A FFNN can be regarded as a stack of layers of logistic regression with the output layer as a linear regression.

The value of the variables in each hidden layer is computed as the sigmoid of the dot product of the connection weights and the output of the previous layer. Although interesting, the theory behind artificial neural networks is beyond the scope of this book [9:4].
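To make the layered computation concrete, here is a minimal Scala sketch of a feed-forward pass. It assumes one weight matrix per layer, in which each row holds the weights of one neuron with the bias weight in column 0; the hidden layers apply the sigmoid, and the output layer is linear, as described above. All names are illustrative, not part of the book's implementation.

// A sketch of a feed-forward pass through hidden layers followed
// by a linear output layer, under the assumptions stated above.
object FeedForward {
  // One row of weights per neuron, bias weight first
  type Layer = Array[Array[Double]]

  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Dot product of a neuron's weights with the biased input vector
  def weightedSum(w: Array[Double], x: Array[Double]): Double =
    w.zip(1.0 +: x).map { case (wi, xi) => wi * xi }.sum

  // Hidden layers apply the sigmoid; the output layer is linear
  def forward(hiddenLayers: Seq[Layer], outputLayer: Layer, x: Array[Double]): Array[Double] = {
    val hidden = hiddenLayers.foldLeft(x) { (in, layer) =>
      layer.map(neuron => sigmoid(weightedSum(neuron, in)))
    }
    outputLayer.map(neuron => weightedSum(neuron, hidden))
  }
}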
