A basic neural architecture – perceptrons

The perceptron is the simplest neural network architecture. Proposed by Frank Rosenblatt in 1957, it has just one layer of neurons that receives a set of inputs and produces a set of outputs. This was one of the first representations of neural networks to gain attention, largely because of its simplicity:

[Figure: A basic neural architecture – perceptrons]

In our Java implementation, this is illustrated with one neural layer (the output layer). The following code creates a perceptron with three inputs and two outputs, using a linear activation function at the output layer:

int numberOfInputs = 3;
int numberOfOutputs = 2;

Linear outputAcFnc = new Linear(1.0);
NeuralNet perceptron = new NeuralNet(numberOfInputs, numberOfOutputs,
        outputAcFnc);
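
To make clear what such an object computes, here is a minimal, standalone sketch of the forward pass: each of the two output neurons takes a weighted sum of the three inputs plus a bias and applies the linear activation. The weight values and the class name below are illustrative assumptions, not the internals of the book's NeuralNet class:

// Illustrative sketch only; not the internals of the book's NeuralNet class.
public class PerceptronForwardSketch {
    public static void main(String[] args) {
        double[] x = { 1.0, 0.5, -0.5 };            // three example inputs (arbitrary)
        double[][] w = { { 0.2, -0.1, 0.4 },         // weights of output neuron 1 (assumed)
                         { 0.3,  0.7, -0.2 } };      // weights of output neuron 2 (assumed)
        double[] bias = { 0.1, -0.3 };               // one bias per output neuron (assumed)

        for (int j = 0; j < 2; j++) {
            double sum = bias[j];
            for (int i = 0; i < 3; i++) {
                sum += w[j][i] * x[i];               // weighted sum of the inputs
            }
            double y = 1.0 * sum;                    // assuming Linear(1.0) means f(u) = 1.0 * u
            System.out.printf("output %d = %.3f%n", j + 1, y);
        }
    }
}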

Applications and limitations

However, it did not take scientists long to conclude that a perceptron could only be applied to equally simple tasks. At that time, neural networks were being used for simple classification problems, but perceptrons usually failed when faced with more complex datasets. Let's illustrate this with a very basic example (an AND function) to better understand the issue.

Linear separation

The example consists of an AND function that takes two inputs, x1 and x2. That function can be plotted in a two-dimensional chart as follows:

[Figure: Linear separation]

Now let's examine how the neural network evolves during training using the perceptron rule, considering two weights, w1 and w2, both initialized to 0.5, and a bias also valued at 0.5. Assume the learning rate η equals 0.2:

Epoch | x1 | x2 | w1    | w2    | b      | y      | t | E      | Δw1    | Δw2    | Δb
1     | 0  | 0  | 0.5   | 0.5   | 0.5    | 0.5    | 0 | -0.5   | 0      | 0      | -0.1
1     | 0  | 1  | 0.5   | 0.5   | 0.4    | 0.9    | 0 | -0.9   | 0      | -0.18  | -0.18
1     | 1  | 0  | 0.5   | 0.32  | 0.22   | 0.72   | 0 | -0.72  | -0.144 | 0      | -0.144
1     | 1  | 1  | 0.356 | 0.32  | 0.076  | 0.752  | 1 | 0.248  | 0.0496 | 0.0496 | 0.0496
2     | 0  | 0  | 0.406 | 0.370 | 0.126  | 0.126  | 0 | -0.126 | 0.000  | 0.000  | -0.025
2     | 0  | 1  | 0.406 | 0.370 | 0.100  | 0.470  | 0 | -0.470 | 0.000  | -0.094 | -0.094
2     | 1  | 0  | 0.406 | 0.276 | 0.006  | 0.412  | 0 | -0.412 | -0.082 | 0.000  | -0.082
2     | 1  | 1  | 0.323 | 0.276 | -0.076 | 0.523  | 1 | 0.477  | 0.095  | 0.095  | 0.095
...   |    |    |       |       |        |        |   |        |        |        |
89    | 0  | 0  | 0.625 | 0.562 | -0.312 | -0.312 | 0 | 0.312  | 0      | 0      | 0.062
89    | 0  | 1  | 0.625 | 0.562 | -0.25  | 0.313  | 0 | -0.313 | 0      | -0.063 | -0.063
89    | 1  | 0  | 0.625 | 0.500 | -0.312 | 0.313  | 0 | -0.313 | -0.063 | 0      | -0.063
89    | 1  | 1  | 0.562 | 0.500 | -0.375 | 0.687  | 1 | 0.313  | 0.063  | 0.063  | 0.063
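
To make the arithmetic behind this table concrete, the following standalone sketch applies the same perceptron (delta) rule: a linear output, weights and bias initialized to 0.5, learning rate 0.2, and the updates Δw = η·E·x and Δb = η·E after each pattern. The class name is an illustrative assumption; the book's NeuralNet class is not used here:

// Standalone sketch of the perceptron (delta) rule used in the table above.
public class PerceptronAndTraining {
    public static void main(String[] args) {
        double[][] inputs = { { 0, 0 }, { 0, 1 }, { 1, 0 }, { 1, 1 } };
        double[] targets = { 0, 0, 0, 1 };      // AND function
        double w1 = 0.5, w2 = 0.5, b = 0.5;     // initial weights and bias
        double eta = 0.2;                       // learning rate

        for (int epoch = 1; epoch <= 89; epoch++) {
            for (int i = 0; i < inputs.length; i++) {
                double x1 = inputs[i][0], x2 = inputs[i][1];
                double y = w1 * x1 + w2 * x2 + b;   // linear output
                double error = targets[i] - y;      // E = t - y
                w1 += eta * error * x1;             // Δw1 = η·E·x1
                w2 += eta * error * x2;             // Δw2 = η·E·x2
                b  += eta * error;                  // Δb  = η·E
            }
        }
        System.out.printf("w1=%.3f, w2=%.3f, b=%.3f%n", w1, w2, b);
    }
}

Running the inner loop by hand for the first few patterns reproduces the rows of epoch 1 above.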

After 89 epochs, we find that the network produces values close to the desired outputs. Since in this example the outputs are binary (zero or one), we can assume that any value produced by the network below 0.5 is considered 0 and any value above 0.5 is considered 1. So, with the final weights and bias found by the learning algorithm, w1=0.562, w2=0.5, and b=-0.375, we can draw the boundary line w1·x1 + w2·x2 + b = 0.5 in the chart:

[Figure: Linear separation]

This boundary defines all the classifications given by the network. You can see that the boundary is linear, given that the activation function is also linear. Thus, the perceptron network is only suitable for problems whose patterns are linearly separable.
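
As a quick check with the final values from the table, w1=0.562, w2=0.5, b=-0.375, and the 0.5 threshold: y(0,0) = -0.375 → 0, y(0,1) = 0.125 → 0, y(1,0) = 0.187 → 0, and y(1,1) = 0.687 → 1, which reproduces the AND truth table.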

The XOR case

Now let's analyze the XOR case:

[Figure: The XOR case]

We can see that in two dimensions it is impossible to draw a line that separates the two patterns. What would happen if we tried to train a single-layer perceptron to learn this function? Suppose we did; the following table shows what happens:

Epoch | x1 | x2 | w1     | w2     | b     | y     | t | E      | Δw1    | Δw2    | Δb
1     | 0  | 0  | 0.5    | 0.5    | 0.5   | 0.5   | 0 | -0.5   | 0      | 0      | -0.1
1     | 0  | 1  | 0.5    | 0.5    | 0.4   | 0.9   | 1 | 0.1    | 0      | 0.02   | 0.02
1     | 1  | 0  | 0.5    | 0.52   | 0.42  | 0.92  | 1 | 0.08   | 0.016  | 0      | 0.016
1     | 1  | 1  | 0.516  | 0.52   | 0.436 | 1.472 | 0 | -1.472 | -0.294 | -0.294 | -0.294
2     | 0  | 0  | 0.222  | 0.226  | 0.142 | 0.142 | 0 | -0.142 | 0.000  | 0.000  | -0.028
2     | 0  | 1  | 0.222  | 0.226  | 0.113 | 0.339 | 1 | 0.661  | 0.000  | 0.132  | 0.132
2     | 1  | 0  | 0.222  | 0.358  | 0.246 | 0.467 | 1 | 0.533  | 0.107  | 0.000  | 0.107
2     | 1  | 1  | 0.328  | 0.358  | 0.352 | 1.038 | 0 | -1.038 | -0.208 | -0.208 | -0.208
...   |    |    |        |        |       |       |   |        |        |        |
127   | 0  | 0  | -0.250 | -0.125 | 0.625 | 0.625 | 0 | -0.625 | 0.000  | 0.000  | -0.125
127   | 0  | 1  | -0.250 | -0.125 | 0.500 | 0.375 | 1 | 0.625  | 0.000  | 0.125  | 0.125
127   | 1  | 0  | -0.250 | 0.000  | 0.625 | 0.375 | 1 | 0.625  | 0.125  | 0.000  | 0.125
127   | 1  | 1  | -0.125 | 0.000  | 0.750 | 0.625 | 0 | -0.625 | -0.125 | -0.125 | -0.125
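
To reproduce this behavior with the training sketch shown after the AND table, only the target array changes; the weights, bias, and learning rate stay the same:

double[] targets = { 0, 1, 1, 0 };      // XOR instead of AND

No matter how many epochs we let it run, the per-pattern error stops shrinking and the weights keep cycling, which is what the rows for epoch 127 show.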

The perceptron simply could not find any pair of weights that would drive the error below 0.625. This can be explained mathematically: as we already noticed in the chart, this function is not linearly separable in two dimensions. So what if we added another dimension? Let's see the chart in three dimensions:

[Figure: The XOR case]

In three dimensions, it is possible to draw a plane that separates the patterns, provided that this additional dimension properly transforms the input data. Okay, but now there is an additional problem: how could we derive this additional dimension, since we have only two input variables? One obvious, if somewhat contrived, answer would be to add a third variable derived from the two original ones. With this third variable being a derivation of the inputs, our neural network would take the following shape:

[Figure: The XOR case]

Okay, now the perceptron has three inputs, one of them being a composition of the other two. This also leads to a new question: how should that composition be computed? We can see that this component could act as a neuron, giving the neural network a nested architecture. If so, there would be another new question: how would the weights of this new neuron be trained, since the error is measured at the output neuron?
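
One concrete choice for such a composition, used here purely as an illustration and not necessarily the one depicted in the figure, is the product x3 = x1·x2. With this third input, XOR becomes linearly separable, since XOR(x1, x2) = x1 + x2 - 2·x1·x2, and a single linear output neuron can represent it exactly:

// Illustration: adding x3 = x1*x2 as a third input makes XOR linearly separable.
public class XorWithDerivedInput {
    public static void main(String[] args) {
        double[][] inputs = { { 0, 0 }, { 0, 1 }, { 1, 0 }, { 1, 1 } };
        double w1 = 1.0, w2 = 1.0, w3 = -2.0, b = 0.0;   // hand-picked weights

        for (double[] in : inputs) {
            double x1 = in[0], x2 = in[1];
            double x3 = x1 * x2;                          // derived third variable
            double y = w1 * x1 + w2 * x2 + w3 * x3 + b;   // linear output
            System.out.printf("XOR(%.0f, %.0f) = %.0f%n", x1, x2, y);
        }
    }
}

In this sketch the weights were chosen by hand; the open question above, how to train such an intermediate component from the output error, is precisely what motivates multi-layer networks.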
