Multilayered neural networks

A single layer of non-linear units still has a limited capacity for the input-output transformations it can learn. This can be illustrated with the XOR problem, in which we want a neural network model to learn the XOR function. The XOR function takes two Boolean inputs and outputs 1 if they differ, and 0 if they are identical.

We can think of it as a pattern-classification problem with the input patterns X = {(0, 0), (0, 1), (1, 0), (1, 1)}. The first and fourth patterns belong to class 0 and the other two to class 1. Let's treat this as a regression problem with Mean Square Error (MSE) loss and try to model it with a single linear unit. Solving it analytically, we arrive at the weights w = [0, 0] and bias b = 0.5. This model outputs 0.5 for all input values, so the XOR function cannot be learned by a simple linear neuron.
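We can verify this with a few lines of NumPy. The snippet below is a minimal check, separate from the Keras example that follows: it solves the MSE problem in closed form with np.linalg.lstsq, appending a constant column to the inputs so the bias is learned as an extra weight.

import numpy as np

# XOR inputs and targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Append a column of ones so the bias becomes the last weight
X_aug = np.hstack([X, np.ones((4, 1))])

# Closed-form least-squares (MSE) solution of the linear model
w_full, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print(w_full)          # approximately [0, 0, 0.5] -> w = [0, 0], b = 0.5
print(X_aug @ w_full)  # every prediction is 0.5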

One way to solve the XOR problem is to use a different representation of the input, such that a linear model is able to find a solution. This can be achieved by adding a non-linear hidden layer to the network. We will use a ReLU layer with two hidden units. Since the output is Boolean, the most suitable output neuron is a logistic (sigmoid) unit, and we can use the binary cross-entropy loss to learn the weights.
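In one common notation (the symbols here, h for the hidden activations, σ for the logistic sigmoid, and W, c, w, b for the two layers' weights and biases, are labels chosen for this sketch), the network and its loss are:

h = max(0, Wx + c)
ŷ = σ(wᵀh + b)
L(y, ŷ) = −[y log ŷ + (1 − y) log(1 − ŷ)]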

Let's learn the weights of this network using SGD. The following is the Keras code for the XOR function learning problem:

import numpy as np
import pandas as pd
from keras.models import Model
from keras.layers import Input, Dense, Activation
from keras.optimizers import SGD

# XOR training data: four input patterns and their class labels
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype='float32')
y = np.array([0, 1, 1, 0], dtype='float32')

model_input = Input(shape=(2,), dtype='float32')
z = Dense(2, name='HiddenLayer', kernel_initializer='ones')(model_input)
z = Activation('relu')(z)  # hidden activation: ReLU
z = Dense(1, name='OutputLayer')(z)
model_output = Activation('sigmoid')(z)  # output activation: sigmoid
model = Model(model_input, model_output)
model.summary()

# Compile the model with SGD optimization, learning rate = 0.5
sgd = SGD(lr=0.5)
model.compile(loss='binary_crossentropy', optimizer=sgd)

# The dataset is very small - use the full batch by setting batch size = 4
model.fit(X, y, batch_size=4, epochs=300, verbose=0)

# Model predictions on the four input patterns
preds = np.round(model.predict(X), decimals=3)
pd.DataFrame({'Y_actual': list(y), 'Predictions': list(preds)})

The output of the preceding code is as follows:

(Left) The original space of the four points: clearly, no line can separate the class 0 points {(0, 0), (1, 1)} from the class 1 points. (Center) The transformed space learned by the hidden ReLU layer. (Right) The table of predicted values produced by the learned function.

The neural network with one hidden layer is able to learn the XOR function. This example shows why non-linear hidden layers are needed for neural networks to do something meaningful. Let's take a closer look at the input transformation the hidden layer has learned that makes it possible for the output logistic neuron to learn the function. In Keras, we can extract an intermediate hidden layer from a learned model and use it to obtain the transformation it applies to the input before passing it on to the output layer. The preceding figure shows how the input space of the four points is transformed: post-transformation, the class 1 and class 0 points can easily be separated by a line. Here is the code to generate the plots of the original space and the transformed space:

import matplotlib.pyplot as plt

# Extract the intermediate hidden layer as a sub-model
hidden_layer_output = Model(inputs=model.input, outputs=model.get_layer('HiddenLayer').output)
# Use predict to extract the hidden layer's transformation of the input
projection = hidden_layer_output.predict(X)

# Plot the original input space and the transformed space, colored by class
fig = plt.figure(figsize=(5, 10))
ax = fig.add_subplot(211)
ax.set_title('Original input space')
ax.scatter(x=X[:, 0], y=X[:, 1], c=y)
ax = fig.add_subplot(212)
ax.set_title('Space transformed by the hidden ReLU layer')
ax.scatter(x=projection[:, 0], y=projection[:, 1], c=y)
plt.show()

By stacking multiple non-linear hidden layers, we can build networks that are capable of learning very complex non-linear input-output transformations. 
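As a sketch of what such stacking looks like in Keras (the layer sizes and names below are arbitrary choices for illustration, reusing the imports from the earlier listing, not a model from this example):

deep_input = Input(shape=(2,), dtype='float32')
z = Dense(8, name='Hidden1')(deep_input)  # first non-linear hidden layer
z = Activation('relu')(z)
z = Dense(8, name='Hidden2')(z)  # second hidden layer stacked on top
z = Activation('relu')(z)
z = Dense(1, name='Output')(z)
deep_output = Activation('sigmoid')(z)
deep_model = Model(deep_input, deep_output)
deep_model.compile(loss='binary_crossentropy', optimizer=SGD(lr=0.5))

Each additional Dense plus Activation pair composes another non-linear transformation of the previous layer's representation.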
