Text-to-Speech Using Tacotron

Text-to-speech (TTS) is the act of converting text into intelligible and natural speech. Before we delve into deep learning approaches to handle TTS, we should ask ourselves the following questions: what are TTS systems for? And why do we need them in the first place?

Well, there are many use cases for TTS. One of the most obvious is that it allows blind people to listen to written content. Indeed, Braille-based books, devices, or signs are not always available, and blind people can't always have someone read to them. In the near future, there might be smart glasses that can describe the surrounding environment and read urban signs and text-based indications to their users.

Many people struggle from childhood with learning disabilities like dyslexia. Robust TTS systems can help them on a daily basis, increasing their productivity at school or work, for instance.

Also, related to the area of learning, it is commonly proposed that different individuals have different preferred styles of absorbing knowledge. For instance, there are those who have great visual memory, those who more easily retain information they have heard, and those who rely more on their kinesthetic memory (memory associated with physical movements). TTS systems can help auditory learners take advantage of that particular way of learning.

In our increasingly fast-paced world, multitasking often becomes a necessity. It is not rare to see a person walking in the street and reading some content displayed on their smartphone at the same time. Someone might also be cooking and following recipe instructions on a touchscreen device. But what if the lack of visual attention leads to an accident (in the first scenario), and what if dirty and sticky fingers prevent an aspiring chef from scrolling down to read the rest of the recipe (in the second scenario)? Again, TTS is a natural solution to avoid these inconveniences.

As you can see, TTS applications have the potential to enhance many aspects of our everyday lives.

In this chapter, we will cover the following topics:

A quick overview of the field
A few recent deep learning approaches for TTS
A step-by-step implementation of Tacotron—an end-to-end deep learning model
