Processing speech request messages

Once we receive a message, specifically a speech request message, we need to process it. This message tells us to use the text-to-speech (TTS) engine to speak aloud the text it contains. We assume that the sender has already validated the message contents, although our handler still checks the volume and rate ranges before applying them.

First, we select a male or female voice based on the maleSpeaker flag in the message. We then set the volume and rate, pass in the text of the message, and let the TTS engine play back the audio asynchronously:

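// At the top of the file; voice is a SpeechSynthesizer field.
using System;
using System.Speech.Synthesis;

bool ProcessSpeechRequestMessage(SpeechRequestMessage msg)
{
    // Nothing to do without a message or an initialized synthesizer.
    if (msg == null || voice == null)
        return false;

    WriteLineInColor("Received Speech Bot Request", ConsoleColor.Red);
    WriteLineInColor("Text to speak: " + msg.text, ConsoleColor.Yellow);

    // Pick an installed voice that matches the requested gender.
    voice.SelectVoiceByHints(msg.maleSpeaker == 1 ? VoiceGender.Male : VoiceGender.Female);

    // Guard the ranges accepted for volume (0..100) and rate (-10..10).
    Ensure.Argument(msg.volume).GreaterThanOrEqualTo(0);
    Ensure.Argument(msg.volume).LessThanOrEqualTo(100);
    Ensure.Argument(msg.rate).GreaterThanOrEqualTo(-10);
    Ensure.Argument(msg.rate).LessThanOrEqualTo(10);
    voice.Volume = msg.volume;
    voice.Rate = msg.rate;

    // Wrap the text in a single sentence and speak it without blocking.
    PromptBuilder builder = new PromptBuilder();
    builder.StartSentence();
    builder.AppendText(msg.text);
    builder.EndSentence();
    voice.SpeakAsync(builder);
    return true;
}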
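If you would rather not depend on the Ensure helper used in the method above, the same four range checks can be written with the standard ArgumentOutOfRangeException. The following is a minimal sketch; the class name SpeechRequestValidator and its Validate method are hypothetical, and it assumes msg.volume and msg.rate are plain int fields.

```csharp
using System;

// Hypothetical helper: the same range checks as the Ensure calls,
// using only the standard ArgumentOutOfRangeException.
static class SpeechRequestValidator
{
    public const int MinVolume = 0, MaxVolume = 100;
    public const int MinRate = -10, MaxRate = 10;

    // Throws if volume or rate falls outside the documented ranges.
    public static void Validate(int volume, int rate)
    {
        if (volume < MinVolume || volume > MaxVolume)
            throw new ArgumentOutOfRangeException(nameof(volume), volume,
                $"Volume must be between {MinVolume} and {MaxVolume}.");
        if (rate < MinRate || rate > MaxRate)
            throw new ArgumentOutOfRangeException(nameof(rate), rate,
                $"Rate must be between {MinRate} and {MaxRate}.");
    }
}
```

In the handler, the four Ensure lines would then collapse into a single call such as SpeechRequestValidator.Validate(msg.volume, msg.rate);, which fails before the synthesizer's properties are touched.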

The volume ranges from 0 to 100 inclusive, where 0 is the minimum and 100 is the maximum.

The rate ranges from -10 to 10. A value of 0 makes the voice speak at its default rate, -10 at one-third of that rate, and 10 at three times that rate. The steps are distributed logarithmically: incrementing or decrementing the rate by one multiplies or divides the speaking speed by the tenth root of three (about 1.12). Values outside this range may be passed to a TTS engine, but their behavior is undefined and varies by voice; engines that comply with the speech platform may not support such extremes and may clip the rate to the minimum or maximum they support. Our handler avoids the question entirely by rejecting out-of-range rates with the Ensure checks.
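The logarithmic scale described above amounts to a relative speed of 3^(rate / 10). The following sketch makes that arithmetic concrete; SpeechRateMath, RelativeSpeed, and StepFactor are hypothetical names used only for illustration, and the formula is derived from the one-third, default, and three-times endpoints stated above rather than taken from any engine's source.

```csharp
using System;

// Hypothetical helper illustrating the logarithmic rate scale.
static class SpeechRateMath
{
    // Factor applied per single step of rate: the tenth root of three (about 1.116).
    public static readonly double StepFactor = Math.Pow(3.0, 0.1);

    // Speed relative to the voice's default rate: 3^(rate / 10).
    // rate = 0 gives 1.0, rate = 10 gives 3.0, rate = -10 gives 1/3.
    public static double RelativeSpeed(int rate) => Math.Pow(3.0, rate / 10.0);
}
```

For example, a rate of 5 corresponds to roughly 1.73 times the default speed (the square root of three), and a rate of -5 to roughly 0.58 times.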

The TTS API offers two events that we need to subscribe to: SpeakStarted and SpeakCompleted. After we call voice.SpeakAsync, the synthesizer raises SpeakStarted when it begins speaking and SpeakCompleted when it finishes. You can use these two events to track when your microservice is speaking. For example, we could send status messages so that other services do not send us requests until we are able to respond. The events are also useful inside the microservice itself, for instance to show and hide a dialog while speech is in progress. For the purposes of this demonstration, we will just set a Boolean flag that indicates whether we are currently speaking:

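// Subscribe once, for example where the synthesizer is created:
// voice.SpeakStarted += Voice_SpeakStarted;
// voice.SpeakCompleted += Voice_SpeakCompleted;

// volatile: the events may be raised on a different thread than the one reading the flag.
private volatile bool speaking;

private void Voice_SpeakStarted(object sender, SpeakStartedEventArgs e)
{
    speaking = true;
}

private void Voice_SpeakCompleted(object sender, SpeakCompletedEventArgs e)
{
    speaking = false;
}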
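If the flag is later used to turn away requests while we are busy, as suggested above, a check-then-set on a plain bool can let two concurrent requests both see "idle". The sketch below is one way to make that atomic; the SpeakingState class and its Started, Completed, and TryBegin members are hypothetical, and the assumption is that Started and Completed would be called from the SpeakStarted and SpeakCompleted handlers.

```csharp
using System;
using System.Threading;

// Hypothetical tracker for the "are we speaking?" state. Started() would be
// called from the SpeakStarted handler and Completed() from SpeakCompleted.
sealed class SpeakingState
{
    private int speaking; // 0 = idle, 1 = speaking

    public bool IsSpeaking => Volatile.Read(ref speaking) == 1;

    public void Started() => Volatile.Write(ref speaking, 1);

    public void Completed() => Volatile.Write(ref speaking, 0);

    // Atomically switches from idle to speaking. Returns false if we were
    // already speaking, so two concurrent requests cannot both claim the voice.
    public bool TryBegin() => Interlocked.CompareExchange(ref speaking, 1, 0) == 0;
}
```

A request handler would call TryBegin() before SpeakAsync and reject (or queue) the request when it returns false; SpeakCompleted then resets the state through Completed().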
