Learning to learn by gradient descent by gradient descent: https://arxiv.org/pdf/1606.04474.pdf
Optimization as a Model for Few-Shot Learning: https://openreview.net/pdf?id=rJY0-Kcll