The Adam optimizer (https://arxiv.org/abs/1412.6980) is used with a piecewise-constant learning rate schedule: the learning rate starts at 0.001 and is reduced to 0.0005, 0.0003, and 0.0001 after 500,000, 1 million, and 2 million global steps, respectively. A step is one gradient update; it shouldn't be confused with an epoch, which is a full pass of gradient updates over the entire training dataset.
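The schedule above can be sketched as a plain piecewise-constant function of the global step (the helper name here is our own, not from the source):

```python
def learning_rate(step):
    """Piecewise-constant schedule: 0.001 until 500k steps,
    then 0.0005, 0.0003, and finally 0.0001 after 2M steps."""
    if step < 500_000:
        return 0.001
    elif step < 1_000_000:
        return 0.0005
    elif step < 2_000_000:
        return 0.0003
    return 0.0001
```

In practice this would be queried once per gradient update and fed to the Adam optimizer of whichever framework is in use.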
L1 loss is used for both the encoder-decoder that predicts the mel-spectrograms and the postprocessing block that predicts the spectrograms.
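A minimal sketch of the combined objective, assuming the two L1 terms are simply summed with equal weight (the exact weighting is an assumption, not stated in the text):

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error between predicted and target frames."""
    return np.mean(np.abs(pred - target))

def total_loss(mel_pred, mel_target, spec_pred, spec_target):
    """Sum of the encoder-decoder's mel-spectrogram L1 loss and the
    postprocessing block's spectrogram L1 loss (equal weights assumed)."""
    return l1_loss(mel_pred, mel_target) + l1_loss(spec_pred, spec_target)
```

L1 (mean absolute error) is less sensitive to outlier frames than L2 and tends to produce sharper spectrogram predictions, which is one common motivation for choosing it here.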