Details of the architecture

The Adam optimizer (https://arxiv.org/abs/1412.6980) is used with a specific learning rate schedule: the learning rate is initialized at 0.001 and is then reduced to 0.0005, 0.0003, and 0.0001 after 500,000, 1 million, and 2 million global steps, respectively. A step is one gradient update; it should not be confused with an epoch, which is a full cycle of gradient updates across the entire training dataset.
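The following is a minimal sketch of how such a stepwise schedule could be applied with Adam. The framework (PyTorch), the placeholder model, and the function name `learning_rate_for_step` are assumptions for illustration; the source does not prescribe a particular implementation.

```python
import torch


def learning_rate_for_step(global_step: int) -> float:
    """Return the learning rate for a given global step (one gradient update)."""
    if global_step < 500_000:
        return 1e-3
    elif global_step < 1_000_000:
        return 5e-4
    elif global_step < 2_000_000:
        return 3e-4
    else:
        return 1e-4


# Placeholder model for illustration only.
model = torch.nn.Linear(80, 80)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Inside the training loop, the learning rate would be refreshed at every step.
global_step = 0
for param_group in optimizer.param_groups:
    param_group["lr"] = learning_rate_for_step(global_step)
```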

L1 loss is used for both the encoder-decoder, which predicts the mel-spectrograms, and the post-processing block, which predicts the spectrograms.
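A minimal sketch of this combined objective is shown below, assuming PyTorch tensors for the predicted and target spectrograms; the tensor names and the equal weighting of the two terms are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def total_loss(mel_pred, mel_target, spec_pred, spec_target):
    """Sum of L1 losses for the decoder (mel) and post-processing (spectrogram) outputs."""
    mel_loss = F.l1_loss(mel_pred, mel_target)
    spec_loss = F.l1_loss(spec_pred, spec_target)
    return mel_loss + spec_loss
```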
