For simplicity and ease of understanding, we use a neural network with only a single layer to predict the output:
a = np.matmul(X, theta)   # pre-activation: X is the input, theta the weight matrix
YHat = sigmoid(a)         # predicted output
So, we use gradient agreement with MAML to find the optimal parameter theta that generalizes across tasks. Then, for a new task, we can learn from just a few data points in less time, by taking fewer gradient steps.
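As a rough sketch of the idea, the snippet below shows how per-task weights could be computed from the agreement between each task's gradient and the summed (consensus) gradient, and then used in the meta update. The function name `gradient_agreement_weights`, the toy gradients, and the meta learning rate `beta` are all illustrative assumptions, not part of the text above:

```python
import numpy as np

def gradient_agreement_weights(task_grads):
    # Flatten each task's gradient and measure its agreement with the
    # sum of all task gradients (the consensus direction).
    g = np.stack([t.ravel() for t in task_grads])
    g_sum = g.sum(axis=0)
    scores = g @ g_sum                    # inner product with the consensus
    return scores / np.abs(scores).sum()  # normalized weights (may be negative)

# Hypothetical toy setup: 3 tasks, each producing a gradient for theta (4 x 1).
rng = np.random.default_rng(0)
task_grads = [rng.standard_normal((4, 1)) for _ in range(3)]
w = gradient_agreement_weights(task_grads)

# Meta update: a weighted combination of task gradients instead of a plain average,
# so tasks whose gradients agree with the consensus contribute more.
theta = np.zeros((4, 1))
beta = 0.1
theta -= beta * sum(wi * gi for wi, gi in zip(w, task_grads))
```

Tasks whose gradients point in a similar direction to the consensus receive larger weights, so conflicting tasks pull the meta update around less.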