We use a single-layer neural network to predict the output:
a = np.matmul(X, theta)
YHat = sigmoid(a)
We use ADML to find an optimal parameter value θ that generalizes across tasks. Starting from this θ, we can learn a new task from only a few data points, and in less time, because fewer gradient steps are needed.
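To make the last point concrete, here is a minimal sketch of adapting the single-layer network to a new task with a handful of gradient steps. It is not the ADML procedure itself: it assumes θ has already been found, and uses a small random vector as a stand-in for it. The helper names `predict`, `loss`, and `adapt`, the binary cross-entropy loss, the learning rate, the step count, and the data shapes are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(a):
    # Logistic function, maps real values into (0, 1).
    return 1.0 / (1.0 + np.exp(-a))

def predict(X, theta):
    # Single-layer network: linear map followed by a sigmoid.
    a = np.matmul(X, theta)
    return sigmoid(a)

def loss(X, Y, theta):
    # Binary cross-entropy (assumed loss); eps guards against log(0).
    eps = 1e-12
    YHat = predict(X, theta)
    return float(-np.mean(Y * np.log(YHat + eps) + (1 - Y) * np.log(1 - YHat + eps)))

def adapt(X, Y, theta, lr=0.1, steps=5):
    # A few plain gradient-descent steps starting from the given theta.
    for _ in range(steps):
        grad = np.matmul(X.T, predict(X, theta) - Y) / X.shape[0]
        theta = theta - lr * grad
    return theta

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))                        # hypothetical new task: 10 points, 50 features
Y = rng.integers(0, 2, size=(10, 1)).astype(float)   # hypothetical binary labels
theta = 0.1 * rng.normal(size=(50, 1))               # stand-in for the meta-learned theta
theta_new = adapt(X, Y, theta)
```

Comparing `loss(X, Y, theta)` with `loss(X, Y, theta_new)` shows how far the few steps moved the model on the new task. With a well-chosen initial θ, that small number of steps should be enough.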