We use a neural network with only a single layer to predict the output:
# X, theta, and sigmoid are assumed to be defined earlier
a = np.matmul(X, theta)   # single linear layer: pre-activations
YHat = sigmoid(a)         # sigmoid activation gives the predicted output
So, we use Meta-SGD to find this optimal parameter value theta, along with an optimal learning rate and gradient update direction that generalize across tasks. For a new task, we can then learn from a few data points in less time by taking fewer gradient steps.
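To see how Meta-SGD encodes both the learning rate and the update direction, note that the learnable coefficient alpha has the same shape as theta, so the element-wise product of alpha and the gradient gives each parameter its own step size, and the sign of each entry of alpha sets that parameter's update direction. Here is a minimal NumPy sketch of this inner update, assuming a sigmoid output with a cross-entropy loss; the data shapes and the initialization of alpha are hypothetical, chosen only for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adapt(theta, alpha, X, Y):
    """One Meta-SGD inner update: theta' = theta - alpha * grad.

    alpha has the same shape as theta, so it acts as a per-parameter
    learning rate whose sign also sets the update direction.
    """
    YHat = sigmoid(np.matmul(X, theta))        # forward pass through the single layer
    grad = np.matmul(X.T, YHat - Y) / len(X)   # gradient of the cross-entropy loss
    return theta - alpha * grad                # element-wise learned step

# hypothetical shapes for illustration
X = np.random.randn(10, 5)                          # 10 samples, 5 features
Y = np.random.randint(0, 2, (10, 1)).astype(float)  # binary targets
theta = np.random.randn(5, 1)
alpha = np.full_like(theta, 0.1)                    # starts out like a scalar learning rate
theta_adapted = adapt(theta, alpha, X, Y)
```

In the full algorithm, alpha is not fixed at 0.1: it is meta-learned across tasks together with the initial theta, which is what lets the adapted parameters generalize after only a few gradient steps.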