© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
A. Satapathi, A. Mishra, Developing Cloud-Native Solutions with Microsoft Azure and .NET, https://doi.org/10.1007/978-1-4842-9004-0_9

9. Build a Desktop Application for Speech-to-Text Conversation Using Azure Cognitive Services

Ashirwad Satapathi (1) and Abhishek Mishra (2)
(1) Gajapati, Odisha, India
(2) Navi Mumbai, India

Modern applications increasingly use artificial intelligence (AI). For example, you can build a healthcare application that lets doctors and other medical practitioners dictate a drug prescription; the AI-based application converts the verbal dictation into a text-based prescription that the patient can use to procure the drugs for treatment. Building an AI-based application from scratch can be challenging, because you need to develop your AI model on top of a huge amount of data. However, public cloud providers offer Platform-as-a-Service (PaaS)-based AI services that you can consume to build modern AI-based applications. The cloud provider takes care of the data and the model. You simply pay for what you use.

In this chapter, we will explore Azure Cognitive Services and how to use its Speech service to convert speech to text.

Structure

In this chapter, we will explore the following topics related to Azure Cognitive Services:
  • Introduction to Azure Cognitive Services

  • Provision the Speech service

  • Build a .NET-based desktop application to convert speech to text

Objectives

After studying this chapter, you should be able to
  • Understand the fundamentals of Azure Cognitive Services

  • Work with the Speech service

Introduction to Azure Cognitive Services

Azure Cognitive Services provides PaaS-based artificial intelligence capabilities for developing AI-based applications. You need neither assemble a dataset nor train a model; you simply consume these services for your AI use cases. Under the hood, Azure has done the heavy lifting of training the models and exposing them as services that you can consume without concern for the underlying infrastructure. All these services are exposed as REST APIs, which you can call directly or through SDKs available for popular languages and platforms such as .NET, Java, and Python.

The following are the offerings from Azure Cognitive Services:
  • Vision

  • Speech

  • Language

  • Decision

Vision

Vision comprises the following services:
  • Computer Vision helps you process images and videos and extract insights from them.

  • Custom Vision helps you build your own custom image classifiers and deploy them on Azure. You can apply labels to the images based on specific characteristics of the images.

  • Face service helps you perform face recognition.

Speech

Speech service helps you build intelligent applications that can convert speech audio to text and vice versa.

Language

Language comprises the following services:
  • Language Understanding Intelligent Service (LUIS) performs natural language processing so that your applications can understand human language.

  • Translator performs machine translation of text from one language to another.

  • Language Service helps you analyze text and derive insights like sentiments and key phrases from the text.

  • QnA Maker helps you build a question-and-answer database from your semi-structured data.

Decision

The following are the services offered by Decision APIs:
  • Anomaly Detector helps you infer anomalies in any time-series data.

  • Content Moderator helps you build applications that can moderate content that may be offensive or risky.

  • Personalizer helps you capture users' preferences in real time, which in turn helps you understand user behavior.

Provision Speech Service

Let’s spin up a Speech service that we can use in a .NET application to convert speech to text. Go to the Azure portal and click Create a resource as shown in Figure 9-1.
Figure 9-1. Click Create a resource

You will be navigated to the Azure Marketplace. Click AI + Machine Learning and then click Speech as shown in Figure 9-2.
Figure 9-2. Click Speech

Provide the basic details such as name, subscription, resource group, pricing tier, and region for the Speech service, as shown in Figure 9-3, and then click Review + create.
Figure 9-3. Click Review + create

Click Create as shown in Figure 9-4. This will spin up the Speech service.
Figure 9-4. Click Create

Once the Speech service is created, go to its Keys and Endpoint section and click Show Keys as shown in Figure 9-5. Copy the value in the KEY 1 field, and note the Location/Region value shown on the same page. We will use the key and the region while consuming the service from the .NET desktop application.
Figure 9-5. Copy the key in the KEY 1 field
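Because the key grants access to a billable resource, it is usually better not to paste it into source code. The following is a minimal sketch, not part of the original chapter, of one way to load the key and region from environment variables. The class name SpeechSettings and the variable names SPEECH_KEY and SPEECH_REGION are assumptions chosen for illustration; Azure does not require these names.

```csharp
using System;

// Hypothetical helper (not from the chapter): holds the two values
// that the desktop application needs to reach the Speech service.
public sealed class SpeechSettings
{
    public string Key { get; }
    public string Region { get; }

    private SpeechSettings(string key, string region)
    {
        Key = key;
        Region = region;
    }

    // SPEECH_KEY and SPEECH_REGION are assumed variable names.
    public static SpeechSettings FromEnvironment()
    {
        return new SpeechSettings(Require("SPEECH_KEY"), Require("SPEECH_REGION"));
    }

    // Reads one variable and fails early with a clear message if it is missing.
    private static string Require(string name)
    {
        var value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Environment variable '{name}' is not set.");
        }
        return value.Trim();
    }
}
```

When the desktop application later calls SpeechConfig.FromSubscription, the Key and Region properties of this object can be passed in place of hard-coded strings.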

Build a .NET-Based Desktop Application to Convert Speech to Text

Let’s build a .NET-based desktop application that will convert speech to text using the Speech service we created earlier. Open Visual Studio and click Create a new project as shown in Figure 9-6.
Figure 9-6. Click Create a new project

Search for Windows and click the Windows Forms App template as shown in Figure 9-7.
Figure 9-7. Select the first Windows Forms App template

Provide the details for the project as shown in Figure 9-8 and click Next.
Figure 9-8. Provide project details

Select the .NET version as shown in Figure 9-9 and click Create. This will create the Windows Forms application project.
Figure 9-9. Select .NET version

Design a form with a text box that takes the full path, including the file name, of the wav audio file. The form should have a button to invoke the Speech service and convert the speech in the audio file into text. It should also have a label to display the converted text. Figure 9-10 represents the form design.
Figure 9-10. Form design

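Go to the Form1.cs file and add the code shown in Listing 9-1 for the button click event. The code uses the Speech SDK, so first add the Microsoft.CognitiveServices.Speech NuGet package to the project. The handler invokes the Speech service to convert a wav format file to text and displays the text in the label you added.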
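// Add these using directives at the top of Form1.cs.
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

private async void btnConvert_Click(object sender, EventArgs e)
{
    // Key and region of the Speech resource created earlier.
    string key = "[Provide Speech service Key]";
    string region = "[Provide Speech service location]";
    var speechCfg = SpeechConfig.FromSubscription(key, region);
    speechCfg.SpeechRecognitionLanguage = "en-US";

    // txtWMVFile holds the full path of the wav file to convert.
    using var audioToConvert = AudioConfig.FromWavFileInput(txtWMVFile.Text);
    using var speechRecognizer = new SpeechRecognizer(speechCfg, audioToConvert);

    // RecognizeOnceAsync returns after the first recognized utterance.
    var speechConversionResult = await speechRecognizer.RecognizeOnceAsync();
    lblOutput.Text = lblOutput.Text + " " + speechConversionResult.Text;
}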
Listing 9-1. Form1.cs
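Listing 9-1 displays the Text property of the result without checking whether recognition succeeded. The following is a minimal sketch, not the authors' code, of how the Reason property of the result could be inspected first. The class SpeechToTextSketch, the method RecognizeFileAsync, and the bracketed messages are hypothetical names introduced for illustration; the sketch assumes the Microsoft.CognitiveServices.Speech NuGet package is installed and that a valid key and region are supplied.

```csharp
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Hypothetical helper class (not from the chapter).
public static class SpeechToTextSketch
{
    // Recognizes a single utterance from a wav file and reports why
    // recognition failed when no text comes back.
    public static async Task<string> RecognizeFileAsync(
        string key, string region, string wavPath)
    {
        var config = SpeechConfig.FromSubscription(key, region);
        config.SpeechRecognitionLanguage = "en-US";

        using var audio = AudioConfig.FromWavFileInput(wavPath);
        using var recognizer = new SpeechRecognizer(config, audio);

        SpeechRecognitionResult result = await recognizer.RecognizeOnceAsync();

        switch (result.Reason)
        {
            case ResultReason.RecognizedSpeech:
                return result.Text;

            case ResultReason.NoMatch:
                // Audio was processed, but no speech was recognized.
                return "[No speech could be recognized in the file.]";

            case ResultReason.Canceled:
                // Typical causes: wrong key or region, or a network failure.
                var details = CancellationDetails.FromResult(result);
                return $"[Recognition canceled: {details.Reason}. {details.ErrorDetails}]";

            default:
                return $"[Unexpected result: {result.Reason}]";
        }
    }
}
```

In the button click handler, the returned string could be appended to lblOutput.Text in the same way as in Listing 9-1, so that a wrong key or an unreadable file produces a visible message instead of an empty label.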

Run the code. Provide the wav file along with the fully qualified path and click Convert as shown in Figure 9-11.
Figure 9-11. Provide wav file to convert

The converted text from the speech will get displayed as shown in Figure 9-12.
Figure 9-12. Converted output
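RecognizeOnceAsync, as used in Listing 9-1, returns after a single utterance, so a longer recording may be only partly converted. The following is a hedged sketch, not part of the original chapter, of how the Speech SDK's continuous recognition events could be used to transcribe a whole file. The class ContinuousSpeechToTextSketch and the method TranscribeFileAsync are hypothetical names; the sketch assumes the same NuGet package, key, and region as the listing.

```csharp
using System.Text;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Hypothetical helper class (not from the chapter).
public static class ContinuousSpeechToTextSketch
{
    // Collects every recognized utterance in the wav file into one string.
    public static async Task<string> TranscribeFileAsync(
        string key, string region, string wavPath)
    {
        var config = SpeechConfig.FromSubscription(key, region);
        config.SpeechRecognitionLanguage = "en-US";

        using var audio = AudioConfig.FromWavFileInput(wavPath);
        using var recognizer = new SpeechRecognizer(config, audio);

        var transcript = new StringBuilder();
        var finished = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        // Raised once for each utterance the service finishes recognizing.
        recognizer.Recognized += (sender, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                lock (transcript)
                {
                    transcript.Append(e.Result.Text).Append(' ');
                }
            }
        };

        // Either event signals that no more results will arrive
        // (end of the file, an error, or the session closing).
        recognizer.Canceled += (sender, e) => finished.TrySetResult(true);
        recognizer.SessionStopped += (sender, e) => finished.TrySetResult(true);

        await recognizer.StartContinuousRecognitionAsync();
        await finished.Task;
        await recognizer.StopContinuousRecognitionAsync();

        return transcript.ToString().Trim();
    }
}
```

The trade-off is a little more code in exchange for handling recordings with several sentences or pauses; for short, single-sentence clips, the simpler call in Listing 9-1 is sufficient.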

Summary

In this chapter, we explored the basic concepts of Azure Cognitive Services. We then created a Speech service and invoked it from a .NET-based desktop application to convert the speech in a wav audio file to text. In the next chapter, you will learn how to build a multilanguage text translator using Azure Cognitive Services and .NET.
