To perform the XOR gate operation, we build a simple two-layer neural network, as shown in the following diagram. As you can see, we have an input layer with two nodes, a hidden layer with five nodes, and an output layer comprising one node:
We will understand, step by step, how a neural network learns the XOR logic:
- First, import the libraries:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
- Prepare the data as shown in the preceding XOR table:
X = np.array([[0, 1], [1, 0], [1, 1], [0, 0]])
y = np.array([[1], [1], [0], [0]])
- Define the number of nodes in each layer:
num_input = 2
num_hidden = 5
num_output = 1
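Since the weights in the next step are initialized randomly, the results will differ slightly between runs. If you want reproducible numbers, you can optionally seed NumPy's random number generator first (the seed value 42 here is an arbitrary choice, not part of the original recipe):

# Optional: fix the random seed so that weight initialization
# (and hence the whole training run) is reproducible
np.random.seed(42)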
- Initialize the weights randomly and the biases to zeros. First, we initialize the input-to-hidden layer weights:
Wxh = np.random.randn(num_input, num_hidden)
bh = np.zeros((1, num_hidden))
- Now, we initialize the hidden-to-output layer weights:
Why = np.random.randn(num_hidden, num_output)
by = np.zeros((1, num_output))
- Define the sigmoid activation function:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
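As a quick illustration (the test values here are arbitrary), the sigmoid function squashes any real input into the range (0, 1), with sigmoid(0) = 0.5:

# Sigmoid maps large negative inputs toward 0 and large positive inputs toward 1
print(sigmoid(np.array([-10.0, 0.0, 10.0])))
# approximately [4.54e-05, 0.5, 0.99995]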
- Define the derivative of the sigmoid function:
def sigmoid_derivative(z):
    return np.exp(-z) / ((1 + np.exp(-z)) ** 2)
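Note that this expression is algebraically equal to sigmoid(z) * (1 - sigmoid(z)), a form that is often preferred because it reuses the forward-pass activation. A quick sanity check (the test values are arbitrary):

# Sanity check: the two forms of the sigmoid derivative agree
z = np.array([-2.0, 0.0, 3.0])
assert np.allclose(sigmoid_derivative(z), sigmoid(z) * (1 - sigmoid(z)))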
- Define the forward propagation:
def forward_prop(X, Wxh, Why):
    # input layer to hidden layer
    z1 = np.dot(X, Wxh) + bh
    a1 = sigmoid(z1)
    # hidden layer to output layer
    z2 = np.dot(a1, Why) + by
    y_hat = sigmoid(z2)
    return z1, a1, z2, y_hat
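Before training, you can sanity-check the shapes flowing through the network; with our four XOR samples as input, y_hat should have shape (4, 1). This check is not part of the original recipe, just a quick verification:

# Quick shape check: four XOR samples in, four predictions out
z1, a1, z2, y_hat = forward_prop(X, Wxh, Why)
print(y_hat.shape)   # (4, 1)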
- Define the backward propagation:
def backward_prop(y_hat, z1, a1, z2):
    # error at the output layer
    delta2 = np.multiply(-(y - y_hat), sigmoid_derivative(z2))
    dJ_dWhy = np.dot(a1.T, delta2)
    # error propagated back to the hidden layer
    delta1 = np.dot(delta2, Why.T) * sigmoid_derivative(z1)
    dJ_dWxh = np.dot(X.T, delta1)
    return dJ_dWxh, dJ_dWhy
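For reference, these lines implement the chain rule for the squared-error cost J defined in the next step. In standard notation, with $\odot$ denoting element-wise multiplication, the gradients are:

$$\delta_2 = -(y - \hat{y}) \odot \sigma'(z_2), \qquad \frac{\partial J}{\partial W_{hy}} = a_1^{T} \delta_2$$

$$\delta_1 = (\delta_2 W_{hy}^{T}) \odot \sigma'(z_1), \qquad \frac{\partial J}{\partial W_{xh}} = X^{T} \delta_1$$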
- Define the cost function:
def cost_function(y, y_hat):
    # half the sum of squared errors over all training samples
    J = 0.5 * np.sum((y - y_hat) ** 2)
    return J
- Set the learning rate and the number of training iterations:
alpha = 0.01
num_iterations = 5000
- Now, let's start training the network with the following code:
cost = []
for i in range(num_iterations):
    z1, a1, z2, y_hat = forward_prop(X, Wxh, Why)
    dJ_dWxh, dJ_dWhy = backward_prop(y_hat, z1, a1, z2)
    # update the weights by gradient descent
    Wxh = Wxh - alpha * dJ_dWxh
    Why = Why - alpha * dJ_dWhy
    # compute and store the cost
    c = cost_function(y, y_hat)
    cost.append(c)
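Once training finishes, you can verify that the network has learned XOR by running one more forward pass and rounding the outputs. This is a quick check, not part of the original recipe, and it assumes the cost has converged low enough; with this learning rate, you may need more iterations before all four predictions round correctly:

# Verify the learned mapping: predictions should round to the XOR targets y
_, _, _, y_hat = forward_prop(X, Wxh, Why)
print(np.round(y_hat))   # expected: [[1.], [1.], [0.], [0.]]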
- Plot the cost function:
plt.grid()
plt.plot(range(num_iterations), cost)
plt.title('Cost Function')
plt.xlabel('Training Iterations')
plt.ylabel('Cost')
As you can observe in the following plot, the cost decreases over the training iterations:
Thus, in this chapter, we gained an overall understanding of artificial neural networks and how they learn.