The Griffin-Lim-based postprocessing module

The purpose of the postprocessing block is to convert the predicted mel-spectrogram frame into the corresponding waveform. 

A CBHG module is applied directly on top of the predicted mel-spectrogram frame, both to extract backward and forward features (thanks to the bidirectional GRU at its end) and to correct errors in the predicted frame. The output of this module is the predicted raw spectrogram.

Although a spectrogram is a good way to represent speech, it carries no information about the phase. Fortunately, signal-processing algorithms such as Griffin-Lim (https://ieeexplore.ieee.org/document/1164317/) can infer a likely speech waveform by estimating the phase from the spectrogram. Griffin-Lim iteratively searches for the waveform whose STFT magnitude is closest to the generated spectrogram.
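To make the iteration concrete, here is a minimal NumPy sketch of the Griffin-Lim loop. It is an illustration under stated assumptions rather than the implementation used by the model: the helper names (stft, istft, griffin_lim, spectral_error), the Hann window, the frame size of 256 samples, the hop of 64 samples, and the synthetic two-tone test signal are all choices made for this example.

```python
import numpy as np

def stft(signal, n_fft=256, hop=64):
    """Short-time Fourier transform with a Hann window; returns (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=256, hop=64):
    """Inverse STFT by windowed overlap-add, normalized by the summed squared window."""
    window = np.hanning(n_fft)
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    signal = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        start = i * hop
        signal[start:start + n_fft] += frames[i] * window
        norm[start:start + n_fft] += window ** 2
    # The floor avoids dividing by near-zero window energy at the signal edges.
    return signal / np.maximum(norm, 1e-3)

def griffin_lim(magnitude, n_iter=50, n_fft=256, hop=64, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # Start from a random phase, since the spectrogram provides none.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        # Build a waveform from the known magnitude and the current phase guess.
        signal = istft(magnitude * phase, n_fft, hop)
        # Re-analyze it and keep only the phase; the magnitude is reset next pass.
        phase = np.exp(1j * np.angle(stft(signal, n_fft, hop)))
    return istft(magnitude * phase, n_fft, hop)

def spectral_error(signal, magnitude, n_fft=256, hop=64):
    """Relative distance between the STFT magnitude of `signal` and `magnitude`."""
    diff = np.abs(stft(signal, n_fft, hop)) - magnitude
    return np.linalg.norm(diff) / np.linalg.norm(magnitude)

# Synthetic stand-in for a predicted spectrogram: two sine tones sampled at 8 kHz.
t = np.arange(4096) / 8000.0
target = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
magnitude = np.abs(stft(target))

recovered = griffin_lim(magnitude, n_iter=50)
print("relative spectral error:", spectral_error(recovered, magnitude))
```

Each pass keeps the magnitude fixed and only updates the phase, which is why the result is a waveform consistent with the spectrogram rather than an exact copy of the original signal. In practice, a library routine such as librosa.griffinlim would normally be used instead of a hand-written loop.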
