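We first need to compute the style matrix, also called the Gram matrix. To do so, we unroll the layer's activations into a matrix with one row per filter (each row holds that filter's activations flattened over all spatial positions) and take the dot product of every pair of rows.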
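As a minimal sketch of this step, the NumPy snippet below unrolls a toy activation volume and builds its Gram matrix. The function name `gram_matrix`, the `(height, width, channels)` layout, and the toy values are assumptions made for illustration, not something the text prescribes.

```python
import numpy as np

def gram_matrix(activations):
    """Gram matrix of an activation volume of shape (n_H, n_W, n_C) (assumed layout)."""
    n_h, n_w, n_c = activations.shape
    # Unroll: one row per filter (channel), one column per spatial position.
    unrolled = activations.reshape(n_h * n_w, n_c).T
    # Entry (i, j) is the dot product of the unrolled activations of filters i and j.
    return unrolled @ unrolled.T

# Toy 2x2 activation volume with 2 filters (made-up values).
a = np.array([[[1.0, 0.0], [2.0, 1.0]],
              [[0.0, 1.0], [1.0, 1.0]]])
G = gram_matrix(a)
print(G)  # [[6. 3.]
          #  [3. 3.]]
```

The result is always a symmetric n_C x n_C matrix, whatever the spatial size of the layer, which is why it can describe style independently of where features occur in the image.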
The style loss for the hidden layer a can be represented as the following:
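One common way to write this loss follows the normalization of Gatys et al. The layer index [l], the layer dimensions n_H, n_W, n_C (height, width, number of filters), and the symbols G^{[l](S)} and G^{[l](G)} for the Gram matrices of the two images are notation assumed here for illustration:

```latex
J_{\text{style}}^{[l]}(S, G)
  = \frac{1}{4\, n_C^{2}\, (n_H n_W)^{2}}
    \sum_{i=1}^{n_C} \sum_{j=1}^{n_C}
    \left( G^{[l](S)}_{ij} - G^{[l](G)}_{ij} \right)^{2}
```

Other normalization constants appear in practice; they only rescale the loss and do not change which image minimizes it.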
We want to minimize the distance between the Gram matrices for the images S and G. The overall weighted style loss (which we want to minimize) is represented as the following:
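Under the same assumed notation, with J_style^{[l]} denoting the style loss of a single layer l, a weighted sum over the chosen layers can be written as:

```latex
J_{\text{style}}(S, G) = \sum_{l} \lambda^{[l]} \, J_{\text{style}}^{[l]}(S, G)
```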
Here, λ represents the weights for different layers. Bear the following in mind:
- The style of an image can be represented using the Gram matrix of a hidden layer’s activations. However, we get even better results by combining this representation from several different layers. This is in contrast to the content representation, where a single hidden layer is usually sufficient.
- Minimizing the style cost will cause the image G to follow the style of the image S.
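The points above can be put together in a short NumPy sketch of a weighted, multi-layer style loss. It repeats a small Gram matrix helper so that it runs on its own. The function names, the `(height, width, channels)` layout, the normalization constant (borrowed from Gatys et al.), the equal layer weights, and the random stand-in activations are all assumptions for illustration. In a real setup the activations would come from a pretrained network.

```python
import numpy as np

def gram_matrix(a):
    """Gram matrix of activations with shape (n_H, n_W, n_C) (assumed layout)."""
    n_h, n_w, n_c = a.shape
    unrolled = a.reshape(n_h * n_w, n_c).T  # one row per filter
    return unrolled @ unrolled.T

def layer_style_loss(a_s, a_g):
    """Squared distance between the Gram matrices of S and G for one layer."""
    n_h, n_w, n_c = a_s.shape
    diff = gram_matrix(a_s) - gram_matrix(a_g)
    # Normalization constant is an assumption (Gatys et al. style).
    return np.sum(diff ** 2) / (4.0 * n_c ** 2 * (n_h * n_w) ** 2)

def style_loss(acts_s, acts_g, lambdas):
    """Weighted sum of per-layer style losses; lambdas holds one weight per layer."""
    return sum(lam * layer_style_loss(a_s, a_g)
               for lam, a_s, a_g in zip(lambdas, acts_s, acts_g))

# Random stand-ins for the activations of two hidden layers (hypothetical shapes).
rng = np.random.default_rng(0)
acts_s = [rng.normal(size=(4, 4, 3)), rng.normal(size=(2, 2, 5))]
acts_g = [rng.normal(size=(4, 4, 3)), rng.normal(size=(2, 2, 5))]
lambdas = [0.5, 0.5]  # equal layer weights, chosen arbitrarily

loss = style_loss(acts_s, acts_g, lambdas)
print(loss)
```

The loss is zero when G produces the same Gram matrices as S at every chosen layer, so driving it down pulls the style of G toward that of S.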