Implementation of Tacotron with Keras

In this section, we will present an implementation of Tacotron using Keras on top of TensorFlow. The advantage of Keras over vanilla TensorFlow is that its high modularity allows for faster prototyping. In terms of flexibility, however, TensorFlow has an edge over Keras, even though it takes more effort to master. At the time of writing, TensorFlow also offers more built-in functionality (for example, attention mechanisms), some of which will have to be re-implemented here.

We will use Keras 2.1.5 with TensorFlow 1.6.0 as a backend.

The code base is organized as follows:

- The /data folder is meant to contain the raw dataset; it will be enriched through several processing steps.
- The /model folder contains the following:
  - building_blocks.py, which defines the essential building blocks of the Tacotron model
  - tacotron_model.py, where the Tacotron model is created
- The /processing folder contains the following:
  - proc_audio.py, which provides the audio processing functions used to transform the waveforms into spectrograms
  - proc_text.py, which transforms the raw transcripts into a format better suited to deep learning
- The /results folder will contain the trained models, along with their recorded losses.
- 1_create_audio_dataset.py generates the training and testing audio data (the model target).
- 2_create_text_dataset.py generates the training and testing text data (the model input).
- 3_train.py trains a model on the training data, then saves it along with its recorded loss history.
- 4_test.py tests the most recently trained model on a chosen item of the testing dataset.
- constants.py contains all of the necessary constants.
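To give a concrete idea of what "a more suitable format for deep learning" can mean for the transcripts handled by proc_text.py, here is a minimal sketch of one common approach: since Tacotron consumes character sequences, each transcript is lowercased, mapped character by character to integer ids, and padded to a fixed length. This is not the project's code; the vocabulary, the reserved ids (0 for padding, 1 for unknown characters), and the function names encode_transcript and encode_batch are all hypothetical.

```python
import numpy as np

# Hypothetical character vocabulary (assumption, not the project's):
# id 0 is reserved for padding, id 1 for characters outside the vocabulary.
VOCAB = "abcdefghijklmnopqrstuvwxyz '.,?!"
PAD_ID, UNK_ID = 0, 1
CHAR_TO_ID = {c: i + 2 for i, c in enumerate(VOCAB)}


def encode_transcript(text):
    """Lowercase a transcript and map each character to an integer id."""
    return [CHAR_TO_ID.get(c, UNK_ID) for c in text.lower()]


def encode_batch(transcripts, max_len):
    """Encode transcripts into a (batch, max_len) int array, padding or truncating each one."""
    batch = np.full((len(transcripts), max_len), PAD_ID, dtype=np.int32)
    for row, text in enumerate(transcripts):
        ids = encode_transcript(text)[:max_len]  # truncate overly long transcripts
        batch[row, :len(ids)] = ids              # remaining positions stay PAD_ID
    return batch


if __name__ == "__main__":
    x = encode_batch(["Hello world.", "Hi!"], max_len=16)
    print(x.shape)  # (2, 16)
```

A fixed-length, zero-padded integer matrix like this can be fed directly to a Keras Embedding layer, which is why padding with a dedicated id is a convenient design choice.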

The following table lists the most important Python modules used in this project. Their own dependencies are not listed, since pip should install them automatically:

Module Name	Description
pandas	Used for data reading, processing, and analysis
NumPy	Provides data structures and methods for scientific computing
Scikit-learn	Contains many machine-learning-related processing methods
TensorFlow	Deep learning framework, used as the Keras backend in this chapter
Keras	A simple, modular high-level API for designing neural networks
Librosa	Provides processing functions for audio analysis
tqdm	Displays a progress bar to track the progress of for loops
Matplotlib	Can be used to visualize estimated and ground truth spectrograms and waveforms
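proc_audio.py turns waveforms into spectrograms, and Librosa is the audio library listed for that kind of processing. To show what this step computes, here is a minimal NumPy-only sketch of a magnitude spectrogram: the signal is cut into overlapping frames, each frame is multiplied by a Hann window, and the magnitude of its real FFT is kept. It is not the project's implementation; the function name magnitude_spectrogram and the n_fft and hop_length values are assumptions, and no centering or padding is applied.

```python
import numpy as np


def magnitude_spectrogram(wave, n_fft=1024, hop_length=256):
    """Return |STFT| of a 1-D waveform, shaped (n_frames, n_fft // 2 + 1).

    Hypothetical helper: frames are taken without centering or padding,
    so the waveform must contain at least n_fft samples.
    """
    if len(wave) < n_fft:
        raise ValueError("waveform is shorter than n_fft")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop_length
    # Row t holds samples [t * hop_length, t * hop_length + n_fft), windowed
    frames = np.stack([
        wave[t * hop_length: t * hop_length + n_fft] * window
        for t in range(n_frames)
    ])
    # Real FFT of every frame; only the magnitude is kept
    return np.abs(np.fft.rfft(frames, axis=1))


if __name__ == "__main__":
    sr = 16000  # assumed sampling rate
    wave = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s, 440 Hz tone
    spec = magnitude_spectrogram(wave)
    print(spec.shape)  # (59, 513)
```

In practice, Tacotron-style pipelines usually log-compress these magnitudes and also derive a mel-scaled version; Librosa offers librosa.stft and librosa.feature.melspectrogram for these operations.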
