Data preparation

To be able to train Tacotron, we need to apply several preprocessing steps to this dataset. We have to prepare the normalized text data in metadata.csv so that it has the proper shape to be used as input to the encoder. We also need to extract the mel and magnitude spectrograms that will be output by the decoder and the postprocessing CBHG module, respectively.
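To make the spectrogram extraction concrete, here is a minimal NumPy sketch of the two targets: the magnitude spectrogram is the absolute value of a windowed short-time Fourier transform, and the mel spectrogram is that magnitude projected through a bank of triangular mel-scale filters. The frame sizes and filter count below are illustrative assumptions (in practice a library such as librosa is typically used, with parameters matching the Tacotron paper):

```python
import numpy as np

def stft_magnitude(y, n_fft=1024, hop=256):
    """Frame the signal, apply a Hann window, and take the FFT magnitude."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Shape: (n_frames, n_fft // 2 + 1)
    return np.abs(np.fft.rfft(frames, axis=1))

def mel_filterbank(sr=22050, n_fft=1024, n_mels=80):
    """Triangular filters spaced evenly on the mel scale (HTK formula)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):            # rising slope
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):           # falling slope
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

# Stand-in for one second of audio at LJSpeech's 22,050 Hz sample rate.
y = np.random.randn(22050).astype(np.float32)
mag = stft_magnitude(y)            # magnitude spectrogram: (frames, freq bins)
mel = mag @ mel_filterbank().T     # mel spectrogram: (frames, n_mels)
```

The decoder is trained against the lower-dimensional mel target, while the CBHG module learns to recover the full magnitude spectrogram from it.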

The data can be loaded with the pandas read_csv function. We need to take into account the fact that the CSV file does not contain a header row, uses the pipe character to separate the columns, and contains quotation marks that are not always closed (the transcripts are not always full sentences):

import pandas as pd

# quoting=3 (csv.QUOTE_NONE) keeps unmatched quotation marks as plain text.
metadata = pd.read_csv('data/LJSpeech-1.1/metadata.csv',
                       dtype='object', quoting=3, sep='|',
                       header=None)

We decided to use 90% of the data (11,790 items) for training, and keep the remaining 10% (1,310 items) for testing. This is an arbitrary choice, and we will define a variable, TRAIN_SET_RATIO, that can be tuned by the reader:

TRAIN_SET_RATIO = 0.9
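A sketch of how this ratio might be applied to split the metadata (the DataFrame below is a stand-in with the same 13,100-row length as LJSpeech's metadata.csv; the variable names other than TRAIN_SET_RATIO are illustrative):

```python
import pandas as pd

TRAIN_SET_RATIO = 0.9

# Stand-in for the metadata DataFrame loaded from metadata.csv (13,100 rows).
metadata = pd.DataFrame({'id': range(13100)})

# Take the first 90% of rows for training and the rest for testing.
train_size = int(len(metadata) * TRAIN_SET_RATIO)
train_metadata = metadata[:train_size]   # 11,790 items
test_metadata = metadata[train_size:]    # 1,310 items
```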