A. Satapathi, A. Mishra, Developing Cloud-Native Solutions with Microsoft Azure and .NET, https://doi.org/10.1007/978-1-4842-9004-0_9

9. Build a Desktop Application for Speech-to-Text Conversation Using Azure Cognitive Services

Ashirwad Satapathi¹ and Abhishek Mishra²

(1) Gajapati, Odisha, India

(2) Navi Mumbai, India

Modern applications increasingly use artificial intelligence (AI). For example, you can build a healthcare application that lets doctors and other medical practitioners dictate a drug prescription for a patient; the application converts the verbal dictation into a text-based prescription that the patient can use to procure the drugs for treatment. Building an AI-based application from scratch can be challenging, because you need to train your own AI model on a huge amount of data. However, public cloud providers offer Platform-as-a-Service (PaaS)-based AI services that you can consume to build modern AI-based applications. The cloud provider takes care of the data and the model. You simply pay for what you use.

In this chapter we will explore Azure Cognitive Services and how to use its Speech service to convert speech to text.

Structure

In this chapter, we will explore the following topics related to Azure Cognitive Services:

Introduction to Azure Cognitive Services
Provision the Speech service
Build a .NET-based desktop application to convert speech to text

Objectives

After studying this chapter, you should be able to

Understand the fundamentals of Azure Cognitive Services
Work with the Speech service

Introduction to Azure Cognitive Services

Azure Cognitive Services provides PaaS-based artificial intelligence capabilities for developing AI-based applications. You do not need to prepare a dataset or train a model. You simply consume these services for your AI use cases. Under the hood, Azure has done the heavy lifting of training the models and exposing them as services that you can consume without concern for the underlying infrastructure. All these services are exposed as REST APIs that you can call directly, or you can use the SDKs available for popular languages and platforms like .NET, Java, and Python.
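To make the REST option concrete, the following is a minimal sketch, not part of the chapter's application, of how a short WAV file might be posted to the Speech service's speech-to-text REST endpoint for short audio. It assumes that a Speech resource, its key, and its region already exist (they are provisioned later in this chapter), and that the file is 16 kHz PCM audio. The class and method names are hypothetical and chosen only for illustration.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class SpeechRestSketch
{
    // Builds the regional speech-to-text endpoint for short audio.
    public static Uri BuildEndpoint(string region, string language) =>
        new Uri($"https://{region}.stt.speech.microsoft.com/speech/recognition/" +
                $"conversation/cognitiveservices/v1?language={Uri.EscapeDataString(language)}");

    // Posts the WAV file and returns the raw JSON response body.
    public static async Task<string> RecognizeAsync(string key, string region, string wavPath)
    {
        using var http = new HttpClient();
        using var request = new HttpRequestMessage(HttpMethod.Post, BuildEndpoint(region, "en-US"));

        // The resource key authenticates the call.
        request.Headers.Add("Ocp-Apim-Subscription-Key", key);

        request.Content = new ByteArrayContent(await File.ReadAllBytesAsync(wavPath));
        // Assumption: the file holds 16 kHz PCM audio. The header is added without
        // validation because the codecs parameter contains a '/' character.
        request.Content.Headers.TryAddWithoutValidation(
            "Content-Type", "audio/wav; codecs=audio/pcm; samplerate=16000");

        using var response = await http.SendAsync(request);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

The JSON returned by the service includes fields such as RecognitionStatus and DisplayText. The SDK used later in this chapter wraps this kind of call and is usually the more convenient choice in a .NET application.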

The following are the offerings from Azure Cognitive Services:

Vision
Speech
Language
Decision

Vision

Vision comprises the following services:

Computer Vision service helps you process images and videos and extract insights from them.
Custom Vision helps you build your own custom image classifiers and deploy them on Azure. You can apply labels to the images based on specific characteristics of the images.
Face service helps you in performing face recognition.

Speech

Speech service helps you build intelligent applications that can convert speech audio to text and vice versa.

Language

Language comprises the following services:

Language Understanding Intelligent Service (LUIS) performs natural language processing and helps applications understand human language.
Translator performs machine translation of text from one language to another.
Language Service helps you analyze text and derive insights like sentiments and key phrases from the text.
QnA Maker helps you build a question-and-answer database from your semi-structured data.

Decision

The following are the services offered by Decision APIs:

Anomaly Detector helps you infer anomalies in any time-series data.
Content Moderator helps you build applications that can moderate content that may be offensive or risky.
Personalizer helps you capture users' preferences in real time so that you can understand user behavior.

Provision Speech Service

Let’s spin up a Speech service that we can use in a .NET application to convert speech to text. Go to the Azure portal and click Create a resource as shown in Figure 9-1.

You are taken to the Azure Marketplace. Click AI + Machine Learning and then click Speech as shown in Figure 9-2.

Provide the basic details like name, subscription, resource group, pricing tier, and region for the Speech service, as shown in Figure 9-3, and then click Review + create.

Click Create as shown in Figure 9-4. This will spin up the Speech service.

Once the Speech service is created, go to the Keys and Endpoint section and click Show Keys as shown in Figure 9-5. Copy the value in the KEY 1 field. Also note the Location/Region value. We will use the key and the region while consuming the service from the .NET desktop application.
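The key is a secret, so you may prefer not to paste it into source code. One possible approach, sketched below under stated assumptions, is to read the key and region from environment variables. The variable names SPEECH_KEY and SPEECH_REGION and the SpeechSettings class are hypothetical; they are not defined by Azure or by this chapter.

```csharp
using System;

public static class SpeechSettings
{
    // Hypothetical variable names; pick whatever suits your environment.
    public const string KeyVariable = "SPEECH_KEY";
    public const string RegionVariable = "SPEECH_REGION";

    // Returns the key and region, failing early if either is missing.
    public static (string Key, string Region) Load()
    {
        return (Require(KeyVariable), Require(RegionVariable));
    }

    private static string Require(string name)
    {
        var value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Environment variable '{name}' is not set.");
        }
        return value.Trim();
    }
}
```

In the desktop application, the two values returned by SpeechSettings.Load() could then be passed to the SDK in place of hard-coded strings.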

Build a .NET-Based Desktop Application to Convert Speech to Text

Let’s build a .NET-based desktop application that will convert speech to text using the Speech service we created earlier. Open Visual Studio and click Create a new project as shown in Figure 9-6.

Search for Windows and click the Windows Forms App template as shown in Figure 9-7.

Figure 9-7. Select the first Windows Forms App template

Provide the details for the project as shown in Figure 9-8 and click Next.

Select the .NET framework version as shown in Figure 9-9 and click Create. This will create the Windows Forms application project.

Design a form that takes the full path, including the file name, of the WAV audio file. It should have a button to invoke the Speech service and convert the speech in the audio into text. It should also have a label to display the converted text. Figure 9-10 represents the form design.

Install the Microsoft.CognitiveServices.Speech NuGet package in the project; it provides the Speech SDK types used in the code. Then go to the Form1.cs file and add the code shown in Listing 9-1 for the button click event. The code invokes the Speech service to convert a WAV file to text and displays the text in the label you added.

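    // At the top of Form1.cs:
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;

    // Inside the Form1 class:
    private async void btnConvert_Click(object sender, EventArgs e)
    {
        // Key and region of the Speech service provisioned earlier.
        string key = "[Provide Speech service Key]";
        string region = "[Provide Speech service location]";

        var speechCfg = SpeechConfig.FromSubscription(key, region);
        speechCfg.SpeechRecognitionLanguage = "en-US";

        // Read the audio from the WAV file whose path is typed in the text box.
        using var audioToConvert = AudioConfig.FromWavFileInput(txtWMVFile.Text);
        using var speechRecognizer = new SpeechRecognizer(speechCfg, audioToConvert);

        // RecognizeOnceAsync returns after a single utterance has been recognized.
        var speechConversionResult = await speechRecognizer.RecognizeOnceAsync();

        // Append the recognized text to the output label.
        lblOutput.Text = lblOutput.Text + " " + speechConversionResult.Text;
    }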

Listing 9-1. Form1.cs
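The listing displays the Text property of the result without checking whether recognition succeeded. As a hedged sketch, the result's Reason property could be inspected so that the label shows something useful when the key is wrong or the audio contains no recognizable speech. The RecognitionResultFormatter class is a hypothetical helper, not part of the SDK or the original listing.

```csharp
using Microsoft.CognitiveServices.Speech;

public static class RecognitionResultFormatter
{
    // Turns a recognition result into text suitable for the output label.
    public static string Describe(SpeechRecognitionResult result)
    {
        switch (result.Reason)
        {
            case ResultReason.RecognizedSpeech:
                return result.Text;

            case ResultReason.NoMatch:
                return "No speech could be recognized in the audio.";

            case ResultReason.Canceled:
                // Cancellation details explain errors such as an invalid key or region.
                var details = CancellationDetails.FromResult(result);
                return details.Reason == CancellationReason.Error
                    ? $"Recognition failed ({details.ErrorCode}): {details.ErrorDetails}"
                    : $"Recognition canceled: {details.Reason}";

            default:
                return $"Unexpected result: {result.Reason}";
        }
    }
}
```

In the button click handler, the result of RecognizeOnceAsync could then be passed to RecognitionResultFormatter.Describe and the returned string assigned to lblOutput.Text.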

Run the code. Provide the wav file along with the fully qualified path and click Convert as shown in Figure 9-11.
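Because the path is typed by hand, it can help to validate it before calling the Speech service. The following is a minimal sketch of such a check; the WavPathValidator class is hypothetical, and it only inspects the extension and the file's existence, not the audio format inside the file.

```csharp
using System;
using System.IO;

public static class WavPathValidator
{
    // Returns true when the path points to an existing .wav file.
    // Otherwise returns false and sets a message that can be shown to the user.
    public static bool IsUsable(string path, out string problem)
    {
        problem = string.Empty;

        if (string.IsNullOrWhiteSpace(path))
        {
            problem = "Enter the full path of a WAV file.";
            return false;
        }

        var extension = Path.GetExtension(path);
        if (!string.Equals(extension, ".wav", StringComparison.OrdinalIgnoreCase))
        {
            problem = "The file must have a .wav extension.";
            return false;
        }

        if (!File.Exists(path))
        {
            problem = "The file does not exist.";
            return false;
        }

        return true;
    }
}
```

The click handler could call WavPathValidator.IsUsable with the text box value first and show the problem text in the label instead of invoking the service when the check fails.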

The converted text from the speech will get displayed as shown in Figure 9-12.
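RecognizeOnceAsync recognizes a single utterance, so a longer recording with several sentences and pauses may be only partially transcribed. The following sketch shows one way the SDK's continuous recognition could be used to transcribe a whole file. The ContinuousTranscriber class and its method name are hypothetical, and error handling is kept to a minimum.

```csharp
using System.Text;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

public static class ContinuousTranscriber
{
    public static async Task<string> TranscribeFileAsync(string key, string region, string wavPath)
    {
        var config = SpeechConfig.FromSubscription(key, region);
        config.SpeechRecognitionLanguage = "en-US";

        using var audio = AudioConfig.FromWavFileInput(wavPath);
        using var recognizer = new SpeechRecognizer(config, audio);

        var transcript = new StringBuilder();
        var finished = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        // Each recognized utterance arrives through the Recognized event.
        recognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                lock (transcript)
                {
                    transcript.Append(e.Result.Text).Append(' ');
                }
            }
        };

        // Canceled fires on errors and at the end of the file;
        // SessionStopped fires when the recognition session ends.
        recognizer.Canceled += (s, e) => finished.TrySetResult(false);
        recognizer.SessionStopped += (s, e) => finished.TrySetResult(true);

        await recognizer.StartContinuousRecognitionAsync();
        await finished.Task;
        await recognizer.StopContinuousRecognitionAsync();

        return transcript.ToString().Trim();
    }
}
```

The click handler could await ContinuousTranscriber.TranscribeFileAsync with the key, region, and file path and assign the returned string to lblOutput.Text.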

Summary

In this chapter, we explored the basic concepts of Azure Cognitive Services. Then we provisioned a Speech service and invoked it from a .NET-based desktop application to convert the speech in a WAV audio file to text. In the next chapter, you will learn how to build a multilanguage text translator using Azure Cognitive Services and .NET.
