Implementation of Tacotron with Keras

In this section, we present an implementation of Tacotron using Keras on top of TensorFlow. The advantage of Keras over vanilla TensorFlow is faster prototyping, which its high modularity makes possible. TensorFlow, however, is more flexible, although it takes more effort to master. At the moment, TensorFlow also offers more built-in functionality (for example, attention mechanisms), some of which will have to be re-implemented here.

We will use Keras 2.1.5 with TensorFlow 1.6.0 as a backend. 
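
Since behavior can differ between library releases, it may be worth confirming the installed versions before running any of the scripts. The following is a minimal sketch and not part of the original code base: the helper names installed_version and check_versions are hypothetical, and the sketch assumes that each package exposes a __version__ attribute (which Keras and TensorFlow do).

```python
import importlib

# Versions stated in the text; adjust them if your setup differs.
EXPECTED_VERSIONS = {"keras": "2.1.5", "tensorflow": "1.6.0"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Some modules do not define __version__; report that explicitly.
    return getattr(module, "__version__", "unknown")


def check_versions(expected):
    """Map each module name to a tuple (expected, found, matches)."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        report[name] = (wanted, found, found == wanted)
    return report
```

Calling check_versions(EXPECTED_VERSIONS) returns one entry per package, so a mismatch or a missing installation can be spotted before training starts.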

The code base is organized as follows:

  • The /data folder holds the raw dataset, along with the processed versions produced by the successive processing steps.
  • The /model folder contains the following:
    • building_blocks.py, which defines all of the essential units of the Tacotron model
    • tacotron_model.py, where the creation of the Tacotron model occurs
  • The /processing folder contains the following:
    • proc_audio.py, which provides the audio processing functions, allowing us to transform the waveforms into spectrograms
    • proc_text.py, which allows for the transformation of the raw transcripts into a more suitable format for deep learning
  • The /results folder will contain trained models, with their recorded losses.
  • 1_create_audio_dataset.py generates the training and testing audio data (model target).
  • 2_create_text_dataset.py generates the training and testing text data (model input).
  • 3_train.py trains a model by using the training data, and saves it as well as its recorded loss history.
  • 4_test.py tests the last trained model on a chosen item of the testing dataset.
  • constants.py contains all of the necessary constants.
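
The layout above can be checked programmatically before the numbered scripts are run. The following is a sketch under stated assumptions: the function name missing_paths is hypothetical, and the paths are simply the ones listed above, taken relative to a project root that you supply.

```python
from pathlib import Path

# Paths taken from the layout described above, relative to the project root.
EXPECTED_DIRS = ["data", "model", "processing", "results"]
EXPECTED_FILES = [
    "model/building_blocks.py",
    "model/tacotron_model.py",
    "processing/proc_audio.py",
    "processing/proc_text.py",
    "1_create_audio_dataset.py",
    "2_create_text_dataset.py",
    "3_train.py",
    "4_test.py",
    "constants.py",
]


def missing_paths(root):
    """Return the expected directories and files that are absent under root."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

An empty list from missing_paths(".") suggests that the working directory matches the structure described here; any returned entries name what still has to be created or downloaded.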

The following are the most important Python modules used in this project. Their own dependencies are not listed, since they should be installed automatically when pip install is run:

Module name     Description
pandas          Used for data reading, processing, and analysis
NumPy           Provides data structures and methods for scientific computing
Scikit-learn    Contains many machine learning related processing methods
TensorFlow      Deep learning framework used as the backend for Keras in this chapter
Keras           A simple and modular high-level API for the design of neural networks
Librosa         Provides processing functions for audio analysis
tqdm            Displays a progress bar to track the progress of for loops
Matplotlib      Can be used to visualize estimated and ground truth spectrograms and waveforms
