Applications

One of the major applications of the HMM is in the field of speech recognition. In this section, we will briefly describe the process of speech recognition.

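In speech recognition, our job is to find the most probable word given a speech signal, or acoustic observation. Applying Bayes' rule, our aim is to compute the following:

$$\hat{W} = \arg\max_{W \in \mathcal{W}} P(W \mid O) = \arg\max_{W \in \mathcal{W}} \frac{P(O \mid W)\,P(W)}{P(O)} = \arg\max_{W \in \mathcal{W}} P(O \mid W)\,P(W)$$

Here, $O$ is the acoustic observation and $\mathcal{W}$ is the set of all possible words. The denominator $P(O)$ is the same for every candidate $W$, so it can be dropped from the maximization. The likelihood $P(O \mid W)$ is determined by an acoustic model, and the prior $P(W)$ is determined by a language model.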
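A minimal Python sketch of choosing the word that maximizes $P(O \mid W)\,P(W)$, assuming the acoustic likelihoods and language-model priors have already been computed. The candidate words and every probability below are hypothetical. Scores are compared in log space to avoid numerical underflow, and the second function confirms that normalizing by $P(O)$ does not change the winner.

```python
import math

# Hypothetical scores for two acoustically similar candidate words.
LIKELIHOODS = {"bat": 0.20, "pat": 0.25}  # P(O | W) from an acoustic model
PRIORS = {"bat": 0.008, "pat": 0.002}     # P(W) from a language model


def decode(likelihoods, priors):
    """Return the word W maximizing log P(O | W) + log P(W)."""
    return max(likelihoods, key=lambda w: math.log(likelihoods[w]) + math.log(priors[w]))


def posteriors(likelihoods, priors):
    """Return P(W | O) for each candidate via Bayes' rule.

    P(O) = sum over W of P(O | W) P(W); dividing by it scales every
    candidate by the same constant, so the argmax is unchanged.
    """
    evidence = sum(likelihoods[w] * priors[w] for w in likelihoods)
    return {w: likelihoods[w] * priors[w] / evidence for w in likelihoods}


best = decode(LIKELIHOODS, PRIORS)
post = posteriors(LIKELIHOODS, PRIORS)
print(best, post)
```

Here "pat" has the higher acoustic likelihood, but the prior tips the decision toward "bat" (0.20 × 0.008 = 0.0016 versus 0.25 × 0.002 = 0.0005).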

Fig 7.14 shows the architecture of an HMM-based speech recognition system. There are three major components:

Acoustic model
Language model
Pronunciation dictionary

Fig 7.14: Architecture of an HMM-based speech recognition system

The acoustic model

The basic units of sound represented by the acoustic model are phones. For example, the word "bat" is composed of three phones: /b/ /ae/ /t/. About 40 such phones are required for English. Each spoken word W can be decomposed into a sequence of base phones; this sequence is called its pronunciation. Thus, a word can be represented by an HMM whose hidden states are its base phones. For example, the HMM for the word "bat" is as follows:

Fig 7.15: An HMM corresponding to the word "bat"
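The mapping from a word to the skeleton of its HMM can be sketched in Python as follows. This is an illustrative sketch, not the book's code: the dictionary entries, the one-state-per-phone layout, and the self-loop probability of 0.5 are hypothetical choices. Each state may either repeat (a phone usually spans several acoustic frames) or advance to the next phone, which gives a left-to-right transition matrix $A$; the emission probabilities would still have to be learned from data.

```python
# Hypothetical pronunciation dictionary: word -> sequence of base phones.
PRONUNCIATIONS = {
    "bat": ["b", "ae", "t"],
    "pat": ["p", "ae", "t"],
}


def left_to_right_transitions(n_states, stay_prob=0.5):
    """Build an n x n transition matrix A in which each state either loops
    on itself or advances to the next phone; the last state only loops."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i < n_states - 1:
            A[i][i] = stay_prob
            A[i][i + 1] = 1.0 - stay_prob
        else:
            A[i][i] = 1.0
    return A


def word_hmm_skeleton(word):
    """Return (states, pi, A) for a word, one hidden state per phone."""
    phones = PRONUNCIATIONS[word]
    pi = [1.0] + [0.0] * (len(phones) - 1)  # always start in the first phone
    return phones, pi, left_to_right_transitions(len(phones))


phones, pi, A = word_hmm_skeleton("bat")
print(phones, pi, A)
```

In this sketch the final phone's state only loops back to itself; a full recognizer would add an exit transition so that word models can be chained together.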

So, with a proper definition of the transition matrix $A$, the initial state probability distribution $\pi$, and the emission probabilities $B$, we can compute the likelihood $P(O \mid W)$ using the forward algorithm, as discussed in the previous sections.
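For reference, here is a minimal forward-algorithm sketch in Python for an HMM with discrete emissions, assuming the acoustic signal has already been quantized into symbols 0, 1, 2 (for example, by vector quantization). The parameters for "bat" are hypothetical. Real recognizers typically use continuous emission densities such as Gaussian mixtures and work with scaled or log probabilities; neither is shown here.

```python
def forward(obs, pi, A, B):
    """Return P(O | model) for a discrete-emission HMM.

    obs: observation symbols o_1..o_T (integers)
    pi:  pi[i] = P(q_1 = i)
    A:   A[i][j] = P(q_{t+1} = j | q_t = i)
    B:   B[i][k] = P(o_t = k | q_t = i)
    """
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    # Termination: P(O | model) = sum_i alpha_T(i)
    return sum(alpha)


# Hypothetical "bat" model: states /b/, /ae/, /t/; symbols 0, 1, 2 loosely
# correspond to b-like, ae-like, and t-like acoustic frames.
PI = [1.0, 0.0, 0.0]
A = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
B = [[0.8, 0.1, 0.1],
     [0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8]]

print(forward([0, 1, 2], PI, A, B))  # approximately 0.148
```

For this short input the result can be checked by hand: only four state paths have nonzero probability, (b, b, b), (b, b, ae), (b, ae, ae), and (b, ae, t), contributing 0.002 + 0.002 + 0.016 + 0.128 = 0.148.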

The language model

The language model provides context to distinguish between words and phrases that sound similar. For example, the phrases "recognize speech" and "wreck a nice beach" can sound almost identical but mean very different things. Such ambiguities are easier to resolve when evidence from the language model is combined with the pronunciation dictionary and the acoustic model. The language model also speeds up recognition by restricting the search space to the most probable words rather than all possible words. Most speech recognition applications use an N-gram language model, which approximates the prior probability of a word sequence $W = w_1, w_2, \dots, w_n$ by conditioning each word only on the preceding $N-1$ words:

$$P(W) = P(w_1, \dots, w_n) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-N+1}, \dots, w_{i-1})$$
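As an illustration, here is a bigram (N = 2) language model estimated by maximum likelihood from a tiny, hypothetical corpus. The `<s>`/`</s>` sentence-boundary tokens are a common convention rather than something the text specifies. Because this sketch applies no smoothing, any unseen bigram gets probability zero; practical language models use smoothing or backoff to avoid that.

```python
from collections import Counter


def train_bigram(sentences):
    """Return p(prev, w) = C(prev w) / C(prev), the MLE bigram estimate."""
    history_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        history_counts.update(tokens[:-1])  # every token that precedes another
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))

    def p(prev, w):
        if history_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, w)] / history_counts[prev]

    return p


def sequence_prob(p, sentence):
    """P(w_1..w_n) approximated as a product of bigram probabilities."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, w in zip(tokens[:-1], tokens[1:]):
        prob *= p(prev, w)
    return prob


# Hypothetical training corpus.
CORPUS = ["recognize speech", "recognize speech quickly", "a nice beach"]
p = train_bigram(CORPUS)
print(sequence_prob(p, "recognize speech"))    # 2/3 * 1 * 1/2 = 1/3
print(sequence_prob(p, "wreck a nice beach"))  # 0.0: "wreck" never follows <s>
```

Given this corpus, the model strongly prefers "recognize speech", which is exactly the kind of evidence that breaks the tie when the acoustic model finds two phrases equally plausible.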

Thus, to build a speech recognizer, we must perform the following steps:

For each word in the vocabulary, build an HMM by estimating the model parameters that maximize the likelihood of that word's acoustic observations in the training set.
Build a language model over the vocabulary.
For each acoustic observation $O$, compute $P(O \mid v)\,P(v)$ for every word $v$ in the vocabulary and select the word $\hat{v} = \arg\max_{v} P(O \mid v)\,P(v)$.
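The three steps above can be tied together in a small isolated-word recognizer sketch. The word models below are hand-set, hypothetical stand-ins for the HMMs that step 1 would estimate from training data (for example, with the Baum-Welch algorithm), and the `priors` dictionary stands in for the language model of step 2. Step 3 is the loop in the hypothetical `recognize` function, which scores each vocabulary word by $\log P(O \mid v) + \log P(v)$. Continuous speech, where word boundaries are unknown, requires a search over connected word models (typically Viterbi-based), which this sketch does not attempt.

```python
import math


def forward(obs, pi, A, B):
    """P(O | model) for a discrete-emission HMM (forward algorithm)."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(alpha)


def recognize(obs, word_models, priors):
    """Return argmax over v of log P(O | v) + log P(v); None if no word fits."""
    best_word, best_score = None, -math.inf
    for v, (pi, A, B) in word_models.items():
        likelihood = forward(obs, pi, A, B)
        if likelihood == 0.0 or priors.get(v, 0.0) == 0.0:
            continue  # log(0) is undefined; such a word cannot win
        score = math.log(likelihood) + math.log(priors[v])
        if score > best_score:
            best_word, best_score = v, score
    return best_word


# Hypothetical step-1 output: left-to-right 3-state word HMMs. Symbols
# 0..3 stand for quantized b-like, p-like, ae-like and t-like frames.
PI = [1.0, 0.0, 0.0]
A = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]
AE = [0.05, 0.05, 0.85, 0.05]
T = [0.05, 0.05, 0.05, 0.85]
WORD_MODELS = {
    "bat": (PI, A, [[0.6, 0.3, 0.05, 0.05], AE, T]),
    "pat": (PI, A, [[0.3, 0.6, 0.05, 0.05], AE, T]),
}

obs = [0, 2, 3]  # b-like, ae-like, t-like frames
print(recognize(obs, WORD_MODELS, {"bat": 0.5, "pat": 0.5}))  # bat
print(recognize(obs, WORD_MODELS, {"bat": 0.1, "pat": 0.9}))  # pat
```

The two models differ only in the first state's emissions, so "bat" is exactly twice as likely acoustically; with equal priors it wins, but a language model that favors "pat" by 9 to 1 overturns that decision.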
