Preprocessing the dataset

The TED-LIUM release 2 dataset provides audio data in the SPH format, so we should convert it into a format that the librosa library can handle. To do this, run the following command in the asset/data directory to convert the SPH format into the WAV format:

find -type f -name '*.sph' | awk '{printf "sox -t sph %s -b 16 -t wav %s\n", $0, $0".wav" }' | bash

If you don't have sox installed, please install it first.
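For readers who prefer to drive the conversion from Python, the same step could be sketched roughly as follows. This is only an illustrative equivalent of the shell pipeline above, not part of the book's code: the function names `build_sox_command` and `convert_all` are hypothetical, and the sketch assumes sox is on the PATH.

```python
import subprocess
from pathlib import Path


def build_sox_command(sph_path: Path) -> list[str]:
    """Build the same sox invocation as the shell pipeline for one SPH file."""
    # The output keeps the original name and appends ".wav": talk.sph -> talk.sph.wav
    wav_path = Path(str(sph_path) + ".wav")
    return ["sox", "-t", "sph", str(sph_path), "-b", "16", "-t", "wav", str(wav_path)]


def convert_all(root: Path) -> int:
    """Convert every *.sph file under root and return how many were converted."""
    count = 0
    for sph_path in sorted(root.rglob("*.sph")):
        if not sph_path.is_file():
            continue
        # check=True raises CalledProcessError if sox fails on a file.
        subprocess.run(build_sox_command(sph_path), check=True)
        count += 1
    return count
```

Calling `convert_all(Path("."))` from the asset/data directory would then mirror the command above, with the difference that a failed conversion stops the run instead of being silently skipped.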

We found that disk read time is the main bottleneck during training, because the audio files are large. Smaller input files load faster, so we preprocess all of the audio data into MFCC (mel-frequency cepstral coefficient) feature files, which are much smaller. We also strongly recommend storing the data on a solid-state drive (SSD) rather than a hard disk drive.
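To get a rough feel for how much smaller the feature files can be, the following back-of-the-envelope calculation may help. The numbers are assumptions, not measurements from this project: mono audio at 16 kHz, 16-bit samples (as in the sox command above), librosa's default of 20 coefficients per frame with a hop of 512 samples, and float32 storage.

```python
SAMPLE_RATE = 16_000      # samples per second (assumed)
BYTES_PER_SAMPLE = 2      # 16-bit PCM, matching "-b 16" in the sox command
N_MFCC = 20               # coefficients per frame (librosa default)
HOP_LENGTH = 512          # samples between frames (librosa default)
BYTES_PER_COEFF = 4       # float32 storage (assumed)

# Raw audio: one 2-byte sample for every tick of the sample clock.
wav_bytes_per_sec = SAMPLE_RATE * BYTES_PER_SAMPLE

# MFCC features: one frame of N_MFCC values every HOP_LENGTH samples.
frames_per_sec = SAMPLE_RATE / HOP_LENGTH
mfcc_bytes_per_sec = frames_per_sec * N_MFCC * BYTES_PER_COEFF

ratio = wav_bytes_per_sec / mfcc_bytes_per_sec

print(f"WAV:  {wav_bytes_per_sec} bytes/s")
print(f"MFCC: {mfcc_bytes_per_sec:.0f} bytes/s")
print(f"MFCC features are about {ratio:.1f}x smaller")
```

Under these assumptions the features take roughly an order of magnitude less space than the raw audio; the actual ratio depends on the settings used in preprocess.py.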

Run the following command in the console to preprocess the whole dataset:

python preprocess.py
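The contents of preprocess.py are not reproduced here. As a minimal sketch of what turning one WAV file into an MFCC feature file might look like with librosa, consider the code below. The function names, the 16 kHz sample rate, the 20 coefficients, and the `.npy` output format are all assumptions made for illustration and may differ from what the script actually does.

```python
from pathlib import Path


def feature_path(wav_path: Path, out_dir: Path) -> Path:
    """Map an input WAV path to the .npy file that will hold its MFCCs."""
    return out_dir / (wav_path.stem + ".npy")


def wav_to_mfcc(wav_path: Path, out_dir: Path, sr: int = 16000, n_mfcc: int = 20) -> Path:
    """Load one WAV file, compute its MFCCs, and save them as a NumPy array."""
    # Imported here so the path helper above can be used without the audio stack.
    import librosa
    import numpy as np

    # librosa.load resamples to `sr` and returns a float32 mono signal.
    signal, sr = librosa.load(str(wav_path), sr=sr, mono=True)
    # The result has shape (n_mfcc, number_of_frames).
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)

    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = feature_path(wav_path, out_dir)
    np.save(out_path, mfcc)
    return out_path
```

A training loop can then read the small `.npy` files with `numpy.load` instead of decoding the full audio on every epoch, which is the point of preprocessing once up front.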

With the processed audio files, we can now train the network.
