One of the major applications of the HMM is in the field of speech recognition. In this section, we will briefly describe the process of speech recognition.
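In speech recognition, our job is to find the most probable word given a speech signal, that is, an acoustic observation. Our aim is to compute the following:
$$\hat{W} = \underset{W}{\arg\max}\; P(W \mid O) = \underset{W}{\arg\max}\; P(O \mid W)\, P(W)$$
Here, O corresponds to the acoustic observation, and W ranges over the set of all possible words. The second equality follows from Bayes' rule, because P(O) does not depend on W and can be dropped from the maximization. The likelihood P(O|W) is determined by an acoustic model, and the prior P(W) is determined by a language model.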
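The decision rule can be sketched in a few lines of Python. This is a minimal illustration, not a real recognizer: the candidate words, the log-likelihood values, and the prior values are all hypothetical numbers chosen for the example, and the function name `most_probable_word` is our own. Scores are combined in log space, which is the usual way to avoid numerical underflow.

```python
import math

def most_probable_word(acoustic_loglik, prior):
    """Return the word W that maximizes log P(O|W) + log P(W), and its score."""
    best_word, best_score = None, -math.inf
    for word, loglik in acoustic_loglik.items():
        score = loglik + math.log(prior[word])
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score

# Hypothetical values, for illustration only.
acoustic_loglik = {"bat": -12.0, "pat": -11.5, "bad": -14.0}  # log P(O|W)
prior = {"bat": 0.6, "pat": 0.1, "bad": 0.3}                   # P(W)

word, score = most_probable_word(acoustic_loglik, prior)
print(word, score)
```

With these made-up numbers, "pat" has the highest acoustic likelihood, but the prior favors "bat" strongly enough that "bat" wins once both terms are combined.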
Fig 7.14 shows the architecture of an HMM-based speech recognition system. There are three major components: the acoustic model, the pronunciation dictionary, and the language model.
The basic units of sound represented by the acoustic model are the phonemes. For example, the word "bat" is composed of three phonemes, /b/ /ae/ /t/. About 40 such phonemes are required for English. Each spoken word W can be decomposed into a sequence of base phonemes. This sequence is called its pronunciation. Thus, a word can be represented by an HMM, with the hidden state variables being the base phonemes. For example, the HMM for the word bat is as follows:
So, with the proper definition of the transition matrix A, the initial state probability distribution π, and the emission probability B, we can compute the value of P(O|W) using the forward algorithm, as discussed in the previous sections.
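As a rough sketch of this computation, the following pure-Python code runs the forward algorithm on a three-state, left-to-right HMM for the word bat. Several things here are assumptions made for illustration: the observations are taken to be symbols from a hypothetical three-entry codebook (0, 1, 2) rather than real acoustic feature vectors, and every number in `A`, `B`, and `pi` is invented rather than estimated from data.

```python
def forward_likelihood(obs, A, B, pi):
    """Forward algorithm: return P(O | model) for a discrete-emission HMM."""
    n_states = len(pi)
    # Initialization: alpha_1(j) = pi_j * b_j(o_1)
    alpha = [pi[j] * B[j][obs[0]] for j in range(n_states)]
    # Recursion: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][o]
            for j in range(n_states)
        ]
    # Termination: sum the forward variables over all states
    return sum(alpha)

# Hidden states: 0 = /b/, 1 = /ae/, 2 = /t/ (left-to-right with self-loops).
# All probabilities below are hypothetical.
A = [
    [0.6, 0.4, 0.0],  # /b/ stays or moves on to /ae/
    [0.0, 0.7, 0.3],  # /ae/ stays or moves on to /t/
    [0.0, 0.0, 1.0],  # /t/ is the final state
]
B = [
    [0.8, 0.1, 0.1],  # /b/ mostly emits symbol 0
    [0.1, 0.8, 0.1],  # /ae/ mostly emits symbol 1
    [0.1, 0.1, 0.8],  # /t/ mostly emits symbol 2
]
pi = [1.0, 0.0, 0.0]  # a word always starts in its first phoneme

obs = [0, 1, 1, 2]    # hypothetical quantized acoustic observation
likelihood = forward_likelihood(obs, A, B, pi)
print(likelihood)
```

The zeros below the diagonal of `A` encode the left-to-right structure: once the model leaves a phoneme, it cannot return to it. For long observation sequences, the products shrink quickly, so a practical version would scale the forward variables or work in log space.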
The language model provides context to distinguish between words and phrases that sound similar. For example, the phrases "recognize speech" and "wreck a nice beach" sound nearly identical when spoken but mean very different things. These ambiguities are easier to resolve when evidence from the language model is combined with the pronunciation dictionary and the acoustic model. Language models also speed up recognition by restricting the search space to the most probable words rather than all possible words. The N-gram language model is used in most speech recognition applications. In it, the prior probability of a word sequence W = w_1, ..., w_K is computed as follows:
$$P(W) = \prod_{k=1}^{K} P(w_k \mid w_{k-N+1}, \ldots, w_{k-1})$$
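For N = 2 (a bigram model), each word is conditioned only on the word before it. The sketch below estimates these conditional probabilities by simple counting over a tiny, made-up corpus and then scores a word sequence. The corpus, the `<s>` start-of-sentence marker, and the function names are assumptions made for this example, and no smoothing is applied, so any unseen word pair gets probability zero.

```python
from collections import Counter

def train_bigram(sentences):
    """Count contexts and word pairs for a maximum likelihood bigram model."""
    context_counts, pair_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        context_counts.update(tokens[:-1])             # words used as a context
        pair_counts.update(zip(tokens[:-1], tokens[1:]))
    return context_counts, pair_counts

def sequence_probability(sentence, context_counts, pair_counts):
    """Return P(w_1, ..., w_K) as the product of P(w_k | w_{k-1})."""
    tokens = ["<s>"] + sentence.split()
    prob = 1.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        if context_counts[prev] == 0:
            return 0.0  # context never seen in training
        prob *= pair_counts[(prev, cur)] / context_counts[prev]
    return prob

# Hypothetical training corpus, for illustration only.
corpus = [
    "recognize speech",
    "recognize speech quickly",
    "wreck a nice beach",
]
context_counts, pair_counts = train_bigram(corpus)

p_speech = sequence_probability("recognize speech", context_counts, pair_counts)
p_beach = sequence_probability("wreck a nice beach", context_counts, pair_counts)
print(p_speech, p_beach)
```

In this toy corpus, "recognize speech" receives a higher prior than "wreck a nice beach", which is the kind of evidence a recognizer can use to choose between acoustically similar hypotheses. A practical model would be trained on a large corpus and would use smoothing or backoff to handle unseen word pairs.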
Thus, to build a speech recognition system, we must perform the following steps: