Applications

One of the major applications of the HMM is in the field of speech recognition. In this section, we will briefly describe the process of speech recognition.

In speech recognition, our job is to compute the most probable word corresponding to a speech signal or acoustic observation. Our aim is to compute the following:

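\hat{W} = \arg\max_{W} P(W \mid O) = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)} = \arg\max_{W} P(O \mid W)\, P(W)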

Here, O corresponds to the acoustic observation and W ranges over the set of all possible words. The likelihood P(O | W) is determined by an acoustic model, and the prior P(W) is determined by a language model.
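As a small illustration of this decision rule, the following sketch picks the word that maximizes the product of likelihood and prior. The three candidate words and all the probability values are made up for the example; in a real system they would come from the acoustic model and the language model.

```python
import math

# Hypothetical scores for a single utterance O (illustrative numbers only).
acoustic_likelihood = {"bat": 1.0e-5, "pat": 3.0e-5, "bad": 1.2e-5}  # P(O | W)
language_prior = {"bat": 0.6, "pat": 0.1, "bad": 0.3}                # P(W)


def recognize(likelihood, prior):
    """Return the word W maximizing P(O | W) * P(W), scored in log space."""
    return max(
        likelihood,
        key=lambda w: math.log(likelihood[w]) + math.log(prior[w]),
    )


best_word = recognize(acoustic_likelihood, language_prior)
acoustic_only = max(acoustic_likelihood, key=acoustic_likelihood.get)
print(best_word, acoustic_only)
```

With these numbers, the acoustic score alone favors a different word than the combined score, which is why the prior matters. The scores are added as logarithms because products of many small probabilities underflow quickly.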

Fig 7.14 shows the architecture of an HMM-based speech recognition system. There are three major components:

  • Acoustic model
  • Language model
  • Pronunciation dictionary

Fig 7.14: Architecture of an HMM-based speech recognition system

The acoustic model

The basic units of sound represented by the acoustic model are phonemes. For example, the word "bat" is composed of three phonemes: /b/ /ae/ /t/. About 40 such phonemes are required for English. Each spoken word W can be decomposed into a sequence of base phonemes. This sequence is called its pronunciation. Thus, a word can be represented by an HMM whose hidden state variables are the base phonemes. For example, the HMM for the word "bat" is as follows:

Fig 7.15: An HMM corresponding to the word "bat"

So, with the proper definition of the transition matrix A, the initial state probability distribution π, and the emission probability B, we can compute the value of P(O | W) using the forward algorithm, as discussed in the previous sections.
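The following is a minimal sketch of that computation for the word "bat". The values of A, π, and B are assumptions chosen for illustration, and the observations are taken to be three discrete acoustic symbols (0, 1, 2), whereas real systems work with continuous acoustic features.

```python
# Hidden states of the word HMM for "bat": one state per phoneme.
states = ["b", "ae", "t"]

# Assumed left-to-right transition matrix A: a state either repeats or
# moves on to the next phoneme; the last state only repeats.
A = [
    [0.6, 0.4, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]

# Initial state distribution pi: the word always starts in /b/.
pi = [1.0, 0.0, 0.0]

# Assumed emission matrix B: B[i][o] is the probability that state i
# emits the discrete acoustic symbol o.
B = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]


def forward(obs, A, pi, B):
    """Return P(O | model) for a non-empty observation sequence."""
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    # Termination: sum over all states at the final time step.
    return sum(alpha)


p_match = forward([0, 1, 2], A, pi, B)     # symbols in the order b, ae, t
p_reversed = forward([2, 1, 0], A, pi, B)  # same symbols, reversed
print(p_match, p_reversed)
```

Because the model is left-to-right, an observation sequence that follows the phoneme order receives a much higher likelihood than the same symbols in reverse.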

The language model

The language model provides context to distinguish between words and phrases that sound similar. For example, the phrases "recognize speech" and "wreck a nice beach" may be pronounced almost the same but mean very different things. These ambiguities are easier to resolve when evidence from the language model is combined with the pronunciation dictionary and the acoustic model. The language model also speeds up recognition by restricting the search space to the most probable words rather than all possible words. Most speech recognition applications use an N-gram language model, in which each word depends only on the N-1 words before it, so the prior probability of a word sequence W = w_1, ..., w_K is computed as follows:

P(w_1, \dots, w_K) = \prod_{k=1}^{K} P(w_k \mid w_{k-N+1}, \dots, w_{k-1})
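To make the N-gram idea concrete, here is a small sketch for N = 2 (a bigram model) estimated by simple counting. The toy corpus, the "<s>" start marker, and the function names are assumptions for illustration; no smoothing is applied, so any unseen bigram gets probability zero.

```python
from collections import Counter

# Toy training corpus (made up for this example).
corpus = [
    "recognize speech with a model",
    "recognize speech from audio",
    "wreck a nice beach",
]


def train_bigram(sentences):
    """Count history words and bigrams, with "<s>" marking sentence start."""
    history_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return history_counts, bigram_counts


def sequence_prob(words, history_counts, bigram_counts):
    """Return the product over k of P(w_k | w_{k-1}) using count ratios."""
    tokens = ["<s>"] + list(words)
    prob = 1.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        if history_counts[prev] == 0:
            return 0.0  # history never seen in training
        prob *= bigram_counts[(prev, cur)] / history_counts[prev]
    return prob


history_counts, bigram_counts = train_bigram(corpus)
p_speech = sequence_prob(["recognize", "speech"], history_counts, bigram_counts)
p_beach = sequence_prob(["wreck", "a", "nice", "beach"], history_counts, bigram_counts)
print(p_speech, p_beach)
```

In this toy corpus "recognize speech" occurs more often than "wreck a nice beach", so it receives the higher prior, which is the kind of evidence the recognizer uses to separate similar-sounding phrases.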

Thus, to build a speech recognition system, we must perform the following steps:

  1. For each word v in the vocabulary, we must build an HMM λ_v by estimating the model parameters that optimize the likelihood of the training set acoustic observations for the v-th word.
  2. Build a language model corresponding to the vocabulary.
  3. For each acoustic observation O, we must compute the likelihood P(O | λ_v) for every word v and select the value of v that maximizes P(O | λ_v) P(v), where P(v) is the prior given by the language model.
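The recognition step (step 3) can be sketched as follows for a two-word vocabulary. The word HMMs here are written down by hand rather than trained, so step 1 is skipped; the phoneme emission table, the self-transition probability, the priors, and the helper names are all assumptions for illustration.

```python
# Assumed emission probabilities of each phoneme over three discrete
# acoustic symbols (0, 1, 2); in practice these are learned in step 1.
EMISSIONS = {
    "b": [0.8, 0.1, 0.1],
    "ae": [0.1, 0.8, 0.1],
    "t": [0.1, 0.1, 0.8],
}

# Hypothetical pronunciation dictionary and language-model priors.
PRONUNCIATIONS = {"bat": ["b", "ae", "t"], "at": ["ae", "t"]}
PRIOR = {"bat": 0.4, "at": 0.6}


def make_word_hmm(phonemes, stay=0.6):
    """Build a left-to-right HMM (A, pi, B) with one state per phoneme."""
    n = len(phonemes)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i + 1 < n:
            A[i][i] = stay
            A[i][i + 1] = 1.0 - stay
        else:
            A[i][i] = 1.0  # the last phoneme can only repeat
    pi = [1.0] + [0.0] * (n - 1)
    B = [EMISSIONS[p] for p in phonemes]
    return A, pi, B


def forward(obs, hmm):
    """Return P(O | hmm) using the forward algorithm."""
    A, pi, B = hmm
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    return sum(alpha)


WORD_HMMS = {w: make_word_hmm(p) for w, p in PRONUNCIATIONS.items()}


def recognize(obs):
    """Select the word v that maximizes P(O | lambda_v) * P(v)."""
    return max(WORD_HMMS, key=lambda v: forward(obs, WORD_HMMS[v]) * PRIOR[v])


print(recognize([0, 1, 1, 2]), recognize([1, 1, 2]))
```

Scoring every word in the vocabulary this way only scales to small vocabularies, which is one reason the language model is also used to restrict the search space.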