Speech recordings dataset

We will use speech recordings from the Linguistic Data Consortium (LDC), which are available on Kaggle. You can download the dataset from https://www.kaggle.com/nltkdata/timitcorpus with a Kaggle account. The data consists of free speech recordings from different speakers. The original dataset is large (several gigabytes), but the Kaggle version is a small subset that we can train on within a reasonable time. Note that speech to text requires a large amount of transcribed audio, and a model that produces good, meaningful transcriptions may take several hours or days to train. You can train the same model we build here on a larger dataset to achieve better speech-to-text accuracy. The complete code for this section is in the Jupyter Notebook at Chapter11/02_example.ipynb in this book's code repository.
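Before training, it can help to check that every recording has a transcription. The following is a minimal sketch, not the notebook's code. It assumes that each audio file sits next to a transcript file with the same base name, and that the extensions are .wav and .txt; both are assumptions about the downloaded layout that you should verify against your copy. The function name pair_audio_with_transcripts is hypothetical.

```python
from pathlib import Path


def pair_audio_with_transcripts(root, audio_ext=".wav", text_ext=".txt"):
    """Return sorted (audio_path, transcript_path) pairs found under root.

    Assumption: a recording and its transcript share a directory and a
    base name, differing only in extension (matched case-insensitively).
    """
    files = [p for p in Path(root).rglob("*") if p.is_file()]

    # Index transcripts by (directory, lowercased base name).
    transcripts = {
        (p.parent, p.stem.lower()): p
        for p in files
        if p.suffix.lower() == text_ext
    }

    pairs = []
    for audio_path in sorted(files):
        if audio_path.suffix.lower() != audio_ext:
            continue
        transcript_path = transcripts.get(
            (audio_path.parent, audio_path.stem.lower())
        )
        # Skip recordings that have no matching transcript.
        if transcript_path is not None:
            pairs.append((audio_path, transcript_path))
    return pairs
```

Calling pair_audio_with_transcripts on the directory where you extracted the download returns the list of usable training pairs. Comparing its length with the number of audio files shows how many recordings lack a transcript.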
