Preprocessing the dataset

The TED-LIUM release 2 dataset provides its audio in SPH format, so we first need to convert it into a format that the librosa library can read. To do this, run the following command in the asset/data directory; it converts every SPH file into a WAV file:

find . -type f -name '*.sph' | awk '{printf "sox -t sph %s -b 16 -t wav %s\n", $0, $0".wav" }' | bash
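For readers who prefer to drive the conversion from Python, the following is a minimal sketch of the same loop, assuming `sox` is on the `PATH`. The function names `sox_commands` and `convert_all` are hypothetical. The sketch produces the same `<name>.sph.wav` output names as the shell pipeline, but it passes each command to `subprocess.run` as an argument list instead of piping generated text into `bash`, so file names containing spaces or shell metacharacters are handled safely.

```python
import subprocess
from pathlib import Path


def sox_commands(root):
    """Build one sox argument list per .sph file found under root.

    Output names mirror the shell pipeline: foo.sph -> foo.sph.wav.
    """
    commands = []
    for sph in sorted(Path(root).rglob("*.sph")):
        if not sph.is_file():  # like `find -type f`, skip directories
            continue
        wav = sph.with_name(sph.name + ".wav")
        # -t sph: read NIST SPHERE input; -b 16 -t wav: write 16-bit WAV
        commands.append(
            ["sox", "-t", "sph", str(sph), "-b", "16", "-t", "wav", str(wav)]
        )
    return commands


def convert_all(root, dry_run=False):
    """Run each sox command (raising on the first failure) and return the list.

    With dry_run=True, the commands are only returned, not executed.
    """
    commands = sox_commands(root)
    if not dry_run:
        for argv in commands:
            subprocess.run(argv, check=True)
    return commands
```

Calling `convert_all("asset/data", dry_run=True)` only lists the commands, which is a cheap way to check which files would be converted before actually running sox.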

If you don't have sox installed, please install it first.

Because the audio files are large, we found that disk read time was the main bottleneck during training, so reading smaller files speeds up training. We therefore preprocess the whole audio dataset into MFCC (mel-frequency cepstral coefficient) feature files, which are much smaller than the WAV files. We also strongly recommend storing the data on a solid-state drive (SSD) rather than a hard disk drive.
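To put "much smaller" into rough numbers, the sketch below estimates the storage needed per clip. It assumes 16 kHz mono audio, 16-bit samples (matching `-b 16` in the sox command), librosa's default MFCC settings (`n_mfcc=20`, `hop_length=512`, centered frames), and float32 storage. The settings actually used by preprocess.py may differ, so treat the resulting ratio as an illustration rather than a measurement. The helper names are hypothetical.

```python
def wav_payload_bytes(seconds, sample_rate=16000, bits=16, channels=1):
    """Bytes of raw PCM audio for a clip (the WAV header is ignored)."""
    return int(seconds * sample_rate) * channels * bits // 8


def mfcc_bytes(seconds, sample_rate=16000, hop_length=512, n_mfcc=20, itemsize=4):
    """Bytes of an (n_mfcc x frames) float32 MFCC matrix for a clip.

    The frame count follows centered framing: 1 + n_samples // hop_length.
    """
    n_samples = int(seconds * sample_rate)
    frames = 1 + n_samples // hop_length
    return n_mfcc * frames * itemsize
```

Under these assumptions, a 10-second clip takes 320,000 bytes as raw 16-bit PCM but only 25,040 bytes as MFCCs, roughly 12.8 times less data to read from disk each time the clip is used.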

Run the following command in the console to preprocess the whole dataset:

python preprocess.py
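The contents of preprocess.py are not reproduced in this section. As a rough illustration only, a script of this kind could look like the following hypothetical sketch. It assumes librosa and NumPy are installed, 16 kHz audio, 20 MFCCs per frame, and one `.npy` file per WAV file in a mirrored directory tree; `pending_wavs`, `extract_mfcc`, and `preprocess` are illustrative names, not the book's code.

```python
from pathlib import Path


def pending_wavs(src_root, dst_root):
    """Yield (wav_path, npy_path) pairs whose .npy output does not exist yet.

    The output tree mirrors the input tree, e.g.
    src/talk/a.sph.wav -> dst/talk/a.sph.npy.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    for wav in sorted(src_root.rglob("*.wav")):
        if not wav.is_file():
            continue
        npy = (dst_root / wav.relative_to(src_root)).with_suffix(".npy")
        if not npy.exists():  # already processed files are skipped
            yield wav, npy


def extract_mfcc(wav_path, sample_rate=16000, n_mfcc=20):
    """Load one WAV file (resampled to sample_rate) and return float32 MFCCs
    with shape (n_mfcc, frames)."""
    import librosa  # deferred: only needed when features are computed
    import numpy as np

    audio, sr = librosa.load(str(wav_path), sr=sample_rate)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)
    return mfcc.astype(np.float32)


def preprocess(src_root, dst_root):
    """Write one .npy feature file per pending WAV file; return how many."""
    import numpy as np

    count = 0
    for wav, npy in pending_wavs(src_root, dst_root):
        npy.parent.mkdir(parents=True, exist_ok=True)
        np.save(npy, extract_mfcc(wav))
        count += 1
    return count
```

Skipping files whose `.npy` output already exists means an interrupted run can simply be restarted. The heavy imports are deferred so that the path-handling helper can be used without librosa installed.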

With the audio preprocessed into MFCC feature files, we can now train the network.
