Optimization algorithms

Choosing an optimization algorithm is a recurring question when implementing a neural network. An optimizer improves the network's output by iteratively adjusting its key parameters, such as the weights and bias values.

These algorithms minimize (or maximize) an error function, E(x), which depends on the model's internal parameters. Those parameters are what the model uses to compute the target results (Y) from the set of predictors (x).
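To make the idea of minimizing an error by adjusting weights and biases concrete, here is a minimal sketch of plain gradient descent in pure Python. It is not part of the chapter's example: the linear model, the helper names `mse` and `gradient_step`, the toy data, and the learning rate are all assumptions chosen for illustration.

```python
# Illustrative sketch only: gradient descent on a one-weight, one-bias model.
# Model: y_hat = w * x + b; error E(w, b) = mean squared error over the data.

def mse(w, b, xs, ys):
    """Mean squared error of the linear model on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_step(w, b, xs, ys, lr):
    """Move w and b one step against the gradient of the error."""
    n = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return w - lr * grad_w, b - lr * grad_b

# Hypothetical toy data generated by y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0                      # initial parameters
initial_error = mse(w, b, xs, ys)
for _ in range(2000):
    w, b = gradient_step(w, b, xs, ys, lr=0.05)
final_error = mse(w, b, xs, ys)

print(round(w, 3), round(b, 3), final_error)
```

With these settings the parameters approach w = 2 and b = 1, the values that generated the toy data, and the error shrinks accordingly. The optimizers compared in the rest of this section build on the same gradient-based update.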

Now, let's look at the different types of algorithms by using the following example:

%matplotlib inline

import torch
import torch.utils.data as Data
import torch.nn.functional as F
import matplotlib.pyplot as plt

# torch.manual_seed(1) # reproducible

LR = 0.01
BATCH_SIZE = 32
EPOCH = 12

# dummy dataset
x = torch.unsqueeze(torch.linspace(-1, 1, 1000), dim=1)
y = x.pow(2) + 0.1*torch.normal(torch.zeros(*x.size()))

# plot dataset
plt.scatter(x.numpy(), y.numpy())
plt.show()

Put the dataset into a torch dataset and wrap it in a data loader:

torch_dataset = Data.TensorDataset(x, y)
loader = Data.DataLoader(dataset=torch_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=2,)

# default network
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(1, 20)   # hidden layer
        self.predict = torch.nn.Linear(20, 1)  # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))  # activation function for hidden layer
        x = self.predict(x)         # linear output
        return x

if __name__ == '__main__':
    # different nets
    net_SGD = Net()
    net_Momentum = Net()
    net_RMSprop = Net()
    net_Adam = Net()
    nets = [net_SGD, net_Momentum, net_RMSprop, net_Adam]

    # different optimizers
    opt_SGD = torch.optim.SGD(net_SGD.parameters(), lr=LR)
    opt_Momentum = torch.optim.SGD(net_Momentum.parameters(), lr=LR, momentum=0.8)
    opt_RMSprop = torch.optim.RMSprop(net_RMSprop.parameters(), lr=LR, alpha=0.9)
    opt_Adam = torch.optim.Adam(net_Adam.parameters(), lr=LR, betas=(0.9, 0.99))
    optimizers = [opt_SGD, opt_Momentum, opt_RMSprop, opt_Adam]

    loss_func = torch.nn.MSELoss()
    losses_his = [[], [], [], []]  # record loss

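Train each network for EPOCH epochs. With 1,000 samples and a batch size of 32, every epoch runs 32 training steps (31 full batches and a final batch of 8), so each loss history ends up with 384 values: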


    for epoch in range(EPOCH):
        print('Epoch: ', epoch)
        for step, (b_x, b_y) in enumerate(loader):  # for each training step
            for net, opt, l_his in zip(nets, optimizers, losses_his):
                output = net(b_x)              # get output for every net
                loss = loss_func(output, b_y)  # compute loss for every net
                opt.zero_grad()                # clear gradients for this training step
                loss.backward()                # backpropagation, compute gradients
                opt.step()                     # apply gradients
                l_his.append(loss.item())      # record loss as a Python float

    labels = ['SGD', 'Momentum', 'RMSprop', 'Adam']
    for i, l_his in enumerate(losses_his):
        plt.plot(l_his, label=labels[i])
    plt.legend(loc='best')
    plt.xlabel('Steps')
    plt.ylabel('Loss')
    plt.ylim((0, 0.2))
    plt.show()

The output of executing the preceding code block is displayed in the following plot:

The epoch counter printed during training looks like this:

Epoch:  0
Epoch:  1
Epoch:  2
Epoch:  3
Epoch:  4
Epoch:  5
Epoch:  6
Epoch:  7
Epoch:  8
Epoch:  9
Epoch:  10
Epoch:  11
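The four optimizers in the example differ only in how they turn a gradient into a parameter update. The following pure-Python sketch writes out commonly documented forms of those update rules for a single parameter, reusing the hyperparameters from the example (learning rate 0.01, momentum 0.8, alpha 0.9, betas 0.9 and 0.99). It is a simplified illustration, not PyTorch's implementation: the toy error E(p) = p**2, the helper names `grad` and `run`, the `eps` value, the step count, and the starting point are all assumptions.

```python
import math

LR = 0.01  # same learning rate as the example above

def grad(p):
    """Gradient of the toy error E(p) = p**2, whose minimum is at p = 0."""
    return 2.0 * p

def sgd(p, g, state, t):
    # Plain gradient descent: step against the gradient.
    return p - LR * g

def momentum(p, g, state, t, mu=0.8):
    # Keep a running velocity that accumulates past gradients.
    v = mu * state.get("v", 0.0) + g
    state["v"] = v
    return p - LR * v

def rmsprop(p, g, state, t, alpha=0.9, eps=1e-8):
    # Scale the step by a running average of squared gradients.
    sq = alpha * state.get("sq", 0.0) + (1 - alpha) * g * g
    state["sq"] = sq
    return p - LR * g / (math.sqrt(sq) + eps)

def adam(p, g, state, t, b1=0.9, b2=0.99, eps=1e-8):
    # Combine a running mean of gradients with a running mean of
    # squared gradients, each corrected for its zero initialization.
    m = b1 * state.get("m", 0.0) + (1 - b1) * g
    v = b2 * state.get("v", 0.0) + (1 - b2) * g * g
    state["m"], state["v"] = m, v
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return p - LR * m_hat / (math.sqrt(v_hat) + eps)

def run(update, steps=500, start=1.0):
    """Apply one update rule repeatedly, starting from the same point."""
    p, state = start, {}
    for t in range(1, steps + 1):
        p = update(p, grad(p), state, t)
    return p

results = {rule.__name__: run(rule) for rule in (sgd, momentum, rmsprop, adam)}
for name, p in results.items():
    print(name, abs(p))
```

Each rule starts from the same point and ends close to the minimum at 0, but reaches it along a different path. On a one-dimensional toy problem like this the final distances say little about how the optimizers rank on a real network, so the loss curves from the actual training run remain the meaningful comparison.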

The loss curves of all four optimizers are plotted on the same graph, as follows:

In the next section, we will look at RNNs.
